Generator configuration

avroSchemaHint

Commentary

added in 0.9.6

local configuration

Explicitly defines the Avro schema for generated data. Takes a map of generator keys to Avro schemas.

Note that when writing data to Kafka, this configuration takes precedence over the deprecated kafkaKeyAvroSchemaHint and kafkaValueAvroSchemaHint configs.


Examples

Providing the Avro schema

Each key in the map names a field in the generator, and its value is the Avro schema for that field. This example defines the schema for the data field of events written to Google Cloud Storage.

{
  "generators": [
    {
      "bucket": "sandbox",
      "bucketConfigs": {
        "format": "parquet",
        "blobPrefix": "part-"
      },
      "data": {
        "x": {
          "_gen": "oneOf",
          "choices": [
            1,
            2,
            3
          ]
        }
      },
      "localConfigs": {
        "maxEvents": 5,
        "avroSchemaHint": {
          "data": {
            "type": "record",
            "name": "MyRecord",
            "fields": [
              {
                "name": "x",
                "type": "int"
              }
            ]
          }
        }
      }
    }
  ],
  "connections": {
    "gcs": {
      "kind": "googleCloudStorage",
      "connectionConfigs": {
        "projectId": "myProject"
      }
    }
  }
}
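
The relationship between the hint keys and the generator fields can be sketched in Python. This is an illustration only: unmatched_hint_keys is a hypothetical helper, not part of the tool, and the tool's own validation may behave differently. It reports any avroSchemaHint key that does not name a top-level field of the generator.

```python
def unmatched_hint_keys(generator: dict) -> list[str]:
    """Return avroSchemaHint keys that are not top-level fields of the generator.

    Hypothetical helper for illustration; not the tool's own validation.
    """
    hints = generator.get("localConfigs", {}).get("avroSchemaHint", {})
    return sorted(key for key in hints if key not in generator)


# A trimmed copy of the generator from the example above.
generator = {
    "bucket": "sandbox",
    "data": {"x": {"_gen": "oneOf", "choices": [1, 2, 3]}},
    "localConfigs": {
        "avroSchemaHint": {
            "data": {
                "type": "record",
                "name": "MyRecord",
                "fields": [{"name": "x", "type": "int"}],
            }
        }
    },
}

# The only hint key is "data", which is also a generator field.
print(unmatched_hint_keys(generator))
```

A hint keyed on a field the generator does not define, such as value in a generator that only has data, would show up in the returned list.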

Using logical dates

To use a logical date type, use the standard Avro schema syntax. Be aware that an Avro date value is the number of days since the UNIX epoch, not a timestamp.

To compute this value from now, which is a timestamp in milliseconds:

  • divide it by 1000 to convert it to seconds
  • divide the result by the number of seconds in a day (60 * 60 * 24)
  • truncate any remaining decimals

{
  "topic": "sandbox",
  "value": {
    "_gen": "math",
    "expr": "(now / 1000) / (60 * 60 * 24)",
    "names": {
      "now": {
        "_gen": "now"
      }
    },
    "decimals": 0
  },
  "localConfigs": {
    "avroSchemaHint": {
      "value": {
        "type": "int",
        "logicalType": "date"
      }
    }
  }
}
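
The arithmetic in the expression above can be checked outside the generator with a short Python sketch. The function names are hypothetical, and the sketch assumes the input is milliseconds since the UNIX epoch, as the division by 1000 implies. It mirrors the same steps rather than reproducing how the math generator is implemented.

```python
import datetime
import math

MS_PER_SECOND = 1000
SECONDS_PER_DAY = 60 * 60 * 24


def epoch_days_from_millis(now_ms: int) -> int:
    """Convert milliseconds since the UNIX epoch to whole days since the epoch."""
    seconds = now_ms / MS_PER_SECOND   # milliseconds to seconds
    days = seconds / SECONDS_PER_DAY   # seconds to fractional days
    return math.trunc(days)            # drop the fractional part


def date_from_epoch_days(days: int) -> datetime.date:
    """Interpret an Avro date value (days since 1970-01-01) as a calendar date."""
    return datetime.date(1970, 1, 1) + datetime.timedelta(days=days)


# 1_700_000_000_000 ms is 2023-11-14 22:13:20 UTC.
days = epoch_days_from_millis(1_700_000_000_000)
print(days)                        # 19675
print(date_from_epoch_days(days))  # 2023-11-14
```

Converting the integer back to a calendar date is a quick way to confirm that the value written under the date logical type lands on the expected day.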

Specification

JSON schema

{
  "type": "object"
}