
Generator configuration

avroSchemaHint

Commentary

added in 0.9.6

local configuration

Explicitly defines the Avro schema for generated data. Takes a map of generator keys to Avro schemas.

Note that this configuration doesn't yet work with Kafka Avro serialization. For Kafka, use kafkaKeyAvroSchemaHint and kafkaValueAvroSchemaHint instead. This was an API design mistake and will get smoothed over soon.


Examples

Overriding the Avro schema

Each key in the map names a field in the generator, and its value is the Avro schema to use for that field. In this example, a schema is defined for the data field of events written to Google Cloud Storage.

{
  "generators": [
    {
      "bucket": "sandbox",
      "bucketConfigs": {
        "format": "parquet",
        "blobPrefix": "part-"
      },
      "data": {
        "x": {
          "_gen": "oneOf",
          "choices": [
            1,
            2,
            3
          ]
        }
      },
      "localConfigs": {
        "maxEvents": 5,
        "avroSchemaHint": {
          "data": {
            "type": "record",
            "name": "MyRecord",
            "fields": [
              {
                "name": "x",
                "type": "int"
              }
            ]
          }
        }
      }
    }
  ],
  "connections": {
    "gcs": {
      "kind": "googleCloudStorage",
      "connectionConfigs": {
        "projectId": "myProject"
      }
    }
  }
}
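
Because the hint is keyed by generator field, a misspelled key is an easy mistake to make. The following is a minimal sketch of a pre-flight check, assuming that every top-level key of a generator other than localConfigs counts as a candidate field. The helper name unmatched_hint_keys is hypothetical and is not part of the tool itself.

```python
# Hypothetical helper (not part of the tool): lists avroSchemaHint keys
# that do not name a top-level field on the same generator.
def unmatched_hint_keys(generator):
    hints = generator.get("localConfigs", {}).get("avroSchemaHint", {})
    # Assumption: any top-level key except localConfigs may carry a hint.
    fields = set(generator) - {"localConfigs"}
    return sorted(key for key in hints if key not in fields)


# A trimmed copy of the generator from the example above.
generator = {
    "bucket": "sandbox",
    "data": {"x": {"_gen": "oneOf", "choices": [1, 2, 3]}},
    "localConfigs": {
        "maxEvents": 5,
        "avroSchemaHint": {
            "data": {
                "type": "record",
                "name": "MyRecord",
                "fields": [{"name": "x", "type": "int"}],
            }
        },
    },
}

print(unmatched_hint_keys(generator))  # prints []
```

An empty list means every hint key lines up with a field on the generator. A key such as "dta" would be returned in the list.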

Specification

JSON schema

{
  "type": "object"
}