Connections
filesystem
Commentary
added in 0.5.10
Writes to the local file system.
Specify the directory to write to in the generator. If any of the parent directories don't exist, they will be created automatically.
ShadowTraffic will write a series of files to that directory, periodically rolling a new one. Specify the base file name with filePrefix, and files will be generated in the pattern filePrefix-n.suffix. The value of n always increases, so the latest file generated has the highest n (see "Writing to files").
A new file will be created following the default batch rate, which can be overridden by time, elements, or serialized bytes (see "Set the batch rate").
In addition, setting any of the batch rates to 0 will cause ShadowTraffic to flush each file instantly (see "Instantly flush files").
You can choose from a range of serialization formats and compression types (see "Set the format and compression").
If you wish, instead of writing just one file, you can also write multiple files in one shot (see "Writing multiple files").
Caveats
Conflicting generators
Do not write multiple generators that output to the same file name in the same directory, since the output writers will contend with each other and overwrite previously written content.
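One way to catch this before running is to scan the configuration for generators that claim the same output target. The sketch below is a hypothetical helper, not part of ShadowTraffic: find_conflicts is an illustrative name, and it assumes a target is identified by the directory, subdir, and filePrefix values when they are literal strings. Values computed by a function (objects with _gen) are skipped because they are only known at run time.

```python
from collections import defaultdict


def find_conflicts(config):
    """Return output targets claimed by more than one generator.

    A target is the (directory, subdir, filePrefix) triple. This is a
    conservative check: it ignores format and compression.
    """
    claims = defaultdict(list)
    for index, gen in enumerate(config.get("generators", [])):
        file_configs = gen.get("fileConfigs", {})
        # multiFile holds one entry per output; otherwise there is one output.
        entries = file_configs.get("multiFile", {"": file_configs}).values()
        for entry in entries:
            prefix = entry.get("filePrefix")
            subdir = entry.get("subdir", file_configs.get("subdir", ""))
            # Skip values produced by functions such as {"_gen": ...}.
            if not isinstance(prefix, str) or not isinstance(subdir, str):
                continue
            claims[(gen.get("directory"), subdir, prefix)].append(index)
    return {target: idxs for target, idxs in claims.items() if len(idxs) > 1}


config = {
    "generators": [
        {"directory": "/tmp/data", "fileConfigs": {"filePrefix": "foo-", "format": "json"}},
        {"directory": "/tmp/data", "fileConfigs": {"filePrefix": "foo-", "format": "json"}},
    ]
}
print(find_conflicts(config))
```

The result maps each contested target to the positions of the generators that write to it, so an empty result means no literal conflicts were found.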
Examples
Writing to files
Use filePrefix to set the base file name to be written. In this example, files named transactions-0.json, ..., transactions-n.json will be created. Use data to set the data to be written to the file.
{
  "generators": [
    {
      "directory": "/tmp/data",
      "fileConfigs": {
        "filePrefix": "transactions-",
        "format": "json"
      },
      "data": {
        "amount": {
          "_gen": "normalDistribution",
          "mean": 100,
          "sd": 2
        }
      }
    }
  ],
  "connections": {
    "localFs": {
      "kind": "fileSystem"
    }
  }
}
Set the batch rate
By default, a new file will be rolled every 500 ms or 5000 elements, whichever happens first. You can also optionally roll a new file after a certain amount of serialized bytes have been accumulated.
To override these:
- use lingerMs to set the limit on time
- use batchElements to set it on number of events
- use batchBytes to set it on size
{
  "connections": {
    "localFs": {
      "kind": "fileSystem",
      "batchConfigs": {
        "lingerMs": 5000,
        "batchElements": 50000,
        "batchBytes": 5242880
      }
    }
  }
}
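To reason about when a file rolls, the "whichever happens first" rule can be modeled as a small predicate. This is a sketch of the documented behavior, not ShadowTraffic's implementation: should_roll and its parameter names are hypothetical, it assumes a limit counts as reached when the current value is at or past it, and it assumes limits you don't override keep their defaults.

```python
def should_roll(elapsed_ms, elements, size_bytes,
                linger_ms=500, batch_elements=5000, batch_bytes=None):
    """Return True once any configured limit has been reached.

    Defaults mirror the documented 500 ms / 5000 elements. batch_bytes is
    None by default, meaning no size limit is applied.
    """
    limits = [
        (elapsed_ms, linger_ms),
        (elements, batch_elements),
        (size_bytes, batch_bytes),
    ]
    # A limit of 0 is always reached, which models the instant-flush case.
    return any(limit is not None and current >= limit
               for current, limit in limits)


# With the settings from the example above (5000 ms, 50000 elements, 5 MiB):
print(should_roll(1200, 800, 4096,
                  linger_ms=5000, batch_elements=50000, batch_bytes=5242880))
print(should_roll(5000, 800, 4096,
                  linger_ms=5000, batch_elements=50000, batch_bytes=5242880))
```

The first call stays below every limit, so no roll happens; the second reaches the time limit, so a new file would be rolled even though few events have accumulated.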
Instantly flush files
To disable file buffering, set any of lingerMs, batchElements, or batchBytes to 0. In other words, this will force each event to be immediately written to a file, with a new one rolled right after.
{
  "connections": {
    "localFs": {
      "kind": "fileSystem",
      "batchConfigs": {
        "batchElements": 0
      }
    }
  }
}
Set the format and compression
format can be any of json, jsonl, and parquet.
Additionally:
- pretty set to true will cause json to pretty print.
- explodeJsonlArrays set to true will cause jsonl arrays to span one element per line.
- compression can optionally be set to gzip.
{
  "generators": [
    {
      "directory": "/tmp/data",
      "fileConfigs": {
        "filePrefix": "ipAddresses-",
        "format": "json",
        "pretty": true
      },
      "data": {
        "ip": {
          "_gen": "string",
          "expr": "#{Internet.ipV4Address}"
        },
        "timestamp": {
          "_now": "now"
        }
      }
    }
  ],
  "connections": {
    "localFs": {
      "kind": "fileSystem"
    }
  }
}
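To check what was written, the generated files can be read back with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: read_records is a hypothetical helper, it assumes each json file holds a single JSON document and each jsonl file holds one document per line, and it detects gzip by the two-byte gzip header rather than by file name. Parquet is not covered because it needs a third-party reader.

```python
import gzip
import json
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_records(path, jsonl=False):
    """Load a generated file, transparently handling gzip compression."""
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    if jsonl:
        # jsonl: one JSON document per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # json: the whole file is one document, pretty-printed or not
    return json.loads(text)
```

Pretty-printed and compact json parse to the same value, so the pretty setting does not change how the file is read.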
Writing to subdirectories
When a generator writes to the file system, by default it writes files directly under its directory parameter. Sometimes, though, you may want a generator to write to multiple subdirectories within that directory.
To do that, set subdir in the fileConfigs key. Typically this will be a variable that changes over time.
In this example, 3 forks are launched which write the following files:
/tmp/data/a/foo-<n>.json
/tmp/data/b/foo-<n>.json
/tmp/data/c/foo-<n>.json
Note that each unique value for subdir will get its own file-roll milestones. In other words, subdirectory a will roll new files based on time/size/events at a rate independent from b and c.
{
  "generators": [
    {
      "directory": "/tmp/data",
      "fork": {
        "key": [
          "a",
          "b",
          "c"
        ]
      },
      "fileConfigs": {
        "filePrefix": "foo-",
        "format": "json",
        "subdir": {
          "_gen": "var",
          "var": "forkKey"
        }
      },
      "data": {
        "x": {
          "_gen": "oneOf",
          "choices": [
            1,
            2,
            3
          ]
        }
      }
    }
  ],
  "connections": {
    "localFs": {
      "kind": "fileSystem"
    }
  }
}
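The way the directory, subdir, and filePrefix combine into a file path can be sketched as follows. This only mirrors the paths listed in the example above; output_path is a hypothetical helper, and it assumes that the suffix equals the format name and that numbering starts at 0, as the transactions-0.json example suggests.

```python
from pathlib import PurePosixPath


def output_path(directory, file_prefix, n, fmt, subdir=None):
    """Build the path of the n-th rolled file for one output target."""
    base = PurePosixPath(directory)
    if subdir:
        # Each subdir value gets its own sequence of rolled files.
        base = base / subdir
    return base / f"{file_prefix}{n}.{fmt}"


for key in ["a", "b", "c"]:
    print(output_path("/tmp/data", "foo-", 0, "json", subdir=key))
```

Because each subdirectory rolls independently, the value of n in one subdirectory says nothing about the value of n in another.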
Writing multiple files
Sometimes, you might want to write to multiple files in multiple subdirectories on each generator iteration.
To do that, specify multiFile: a map of string to file configuration overrides. You must specify each filePrefix, and you can optionally specify individual subdir values.
When multiFile is enabled, data must be a map whose keys match those in multiFile. The values under each key are written according to the spec in multiFile.
For example, the configuration below will write 10 files: 5 to /tmp/data/a/foo-*.json, and 5 to /tmp/data/b/bar-*.json.
{
  "generators": [
    {
      "directory": "/tmp/data",
      "fileConfigs": {
        "format": "json",
        "multiFile": {
          "a": {
            "filePrefix": "foo-",
            "subdir": "a"
          },
          "b": {
            "filePrefix": "bar-",
            "subdir": "b"
          }
        }
      },
      "data": {
        "a": {
          "_gen": "oneOf",
          "choices": [
            1,
            2,
            3
          ]
        },
        "b": {
          "_gen": "oneOf",
          "choices": [
            4,
            5,
            6
          ]
        }
      },
      "localConfigs": {
        "maxEvents": 5
      }
    }
  ],
  "connections": {
    "fs": {
      "kind": "fileSystem"
    }
  }
}
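The key-matching rule between data and multiFile is easy to get wrong when editing by hand, so it can help to check it before a run. The sketch below is a hypothetical pre-flight check, not something ShadowTraffic provides: check_multifile is an illustrative name, and it assumes data is written as a literal map rather than produced by a function.

```python
def check_multifile(generator):
    """Return a list of problems with a generator's multiFile setup."""
    file_configs = generator.get("fileConfigs", {})
    multi = file_configs.get("multiFile")
    if multi is None:
        return []  # not a multiFile generator, nothing to check
    data = generator.get("data")
    if not isinstance(data, dict):
        return ["data must be a map when multiFile is set"]
    problems = []
    # Keys must line up in both directions.
    for key in sorted(set(multi) - set(data)):
        problems.append(f"multiFile key {key!r} has no entry in data")
    for key in sorted(set(data) - set(multi)):
        problems.append(f"data key {key!r} has no entry in multiFile")
    # filePrefix is required for every multiFile entry; subdir is optional.
    for key, spec in multi.items():
        if "filePrefix" not in spec:
            problems.append(f"multiFile key {key!r} is missing filePrefix")
    return problems
```

An empty list means the generator satisfies the rules described above; each string otherwise names one mismatch to fix.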
Specification
Connection JSON schema
{
"type": "object",
"properties": {
"kind": {
"type": "string",
"const": "fileSystem"
},
"batchConfigs": {
"type": "object",
"properties": {
"lingerMs": {
"type": "integer",
"minimum": 0
},
"batchElements": {
"type": "integer",
"minimum": 0
},
"batchBytes": {
"type": "integer",
"minimum": 0
}
}
}
}
}
Generator JSON schema
{
"type": "object",
"properties": {
"connection": {
"type": "string"
},
"name": {
"type": "string"
},
"directory": {
"type": "string"
},
"data": {},
"localConfigs": {
"type": "object",
"properties": {
"throttleMs": {
"oneOf": [
{
"type": "number",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"maxEvents": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"kafkaKeyProtobufHint": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
},
"jsonSchemaHint": {
"type": "object"
},
"maxBytes": {
"type": "integer",
"minimum": 1
},
"discard": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"retainHistory": {
"type": "boolean"
}
},
"required": [
"rate"
]
},
"repeat": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"times": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"rate",
"times"
]
},
"protobufSchemaHint": {
"type": "object",
"patternProperties": {
"^.*$": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
}
}
},
"maxHistoryEvents": {
"type": "integer",
"minimum": 0
},
"maxMs": {
"type": "integer",
"minimum": 0
},
"time": {
"type": "integer"
},
"events": {
"type": "object",
"properties": {
"exactly": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
}
},
"delay": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"ms": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"rate",
"ms"
]
},
"history": {
"type": "object",
"properties": {
"events": {
"type": "object",
"properties": {
"max": {
"type": "integer",
"minimum": 0
}
}
}
}
},
"avroSchemaHint": {
"type": "object"
},
"throttle": {
"type": "object",
"properties": {
"ms": {
"oneOf": [
{
"type": "number",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
}
},
"throughput": {
"oneOf": [
{
"type": "integer",
"minimum": 1
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"timeMultiplier": {
"oneOf": [
{
"type": "number"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"kafkaValueProtobufHint": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
}
}
},
"fileConfigs": {
"oneOf": [
{
"type": "object",
"properties": {
"filePrefix": {
"oneOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"format": {
"type": "string",
"enum": [
"json",
"jsonl",
"parquet"
]
},
"pretty": {
"type": "boolean"
},
"compression": {
"type": "string",
"enum": [
"gzip"
]
},
"subdir": {
"oneOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"filePrefix",
"format"
]
},
{
"type": "object",
"properties": {
"format": {
"type": "string",
"enum": [
"json",
"jsonl",
"parquet"
]
},
"pretty": {
"type": "boolean"
},
"compression": {
"type": "string",
"enum": [
"gzip"
]
},
"subdir": {
"oneOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"multiFile": {
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"filePrefix": {
"oneOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"subdir": {
"oneOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"filePrefix"
]
}
}
},
"required": [
"format",
"multiFile"
]
}
]
}
},
"required": [
"directory",
"data",
"fileConfigs"
]
}