filesystem

Commentary

added in 0.5.10

Writes to the local file system.

Specify the directory to write to with the generator's directory field. Any parent directories that don't exist are created automatically.

ShadowTraffic writes a series of files to that directory, periodically rolling to a new one. Set the base file name with filePrefix; files are named <filePrefix><n>.<suffix>, with the prefix used verbatim. For example, a filePrefix of transactions- produces transactions-0.json, transactions-1.json, and so on. The value of n always increases with each newly rolled file.
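The naming rule can be modeled in a few lines of Python. This is a sketch, not ShadowTraffic's code: file_names is a hypothetical helper, and it assumes n starts at 0 and grows by exactly 1 per rolled file, which matches the transactions- example further down but is not otherwise stated. What it shows is that the prefix is concatenated as-is, so a separator such as - must be part of filePrefix.

```python
from itertools import count, islice
from typing import Iterator


def file_names(prefix: str, suffix: str) -> Iterator[str]:
    """Yield <prefix><n>.<suffix> for n = 0, 1, 2, ... (hypothetical helper).

    The prefix is concatenated verbatim, so a separator such as "-"
    has to be included in filePrefix itself.
    """
    for n in count():
        yield f"{prefix}{n}.{suffix}"


print(list(islice(file_names("transactions-", "json"), 3)))
# ['transactions-0.json', 'transactions-1.json', 'transactions-2.json']
```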

A new file is rolled according to the default batch rate, which can be overridden by time, number of elements, or serialized bytes.

In addition, setting any of the batch limits to 0 causes ShadowTraffic to flush each event to its own file immediately.

You can choose from several serialization formats and compression types.

Instead of writing to a single file, a generator can also write to multiple files in one shot.

Caveats

Conflicting generators

Do not configure multiple generators to write files with the same name to the same directory: their writers will contend with each other and overwrite previously written content.


Examples

Writing to files

Use filePrefix to set the base file name. In this example, files named transactions-0.json, transactions-1.json, ... are created. Use data to set the data written to each file.

{
  "generators": [
    {
      "directory": "/tmp/data",
      "fileConfigs": {
        "filePrefix": "transactions-",
        "format": "json"
      },
      "data": {
        "amount": {
          "_gen": "normalDistribution",
          "mean": 100,
          "sd": 2
        }
      }
    }
  ],
  "connections": {
    "localFs": {
      "kind": "fileSystem"
    }
  }
}

Set the batch rate

By default, a new file will be rolled every 500 ms or 5000 elements, whichever happens first. You can also optionally roll a new file after a certain amount of serialized bytes have been accumulated.

To override these:

  • use lingerMs to set the limit on time
  • use batchElements to set it on number of events
  • use batchBytes to set it on size
{
  "connections": {
    "localFs": {
      "kind": "fileSystem",
      "batchConfigs": {
        "lingerMs": 5000,
        "batchElements": 50000,
        "batchBytes": 5242880
      }
    }
  }
}
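To make the "whichever happens first" rule concrete, here is a simplified Python model of how a stream of events might be cut into files. RollPolicy and batch_events are hypothetical names; linger_ms, batch_elements, and batch_bytes stand in for lingerMs, batchElements, and batchBytes. The sketch assumes sizes are measured on the JSON-serialized event and that limits are only checked when an event arrives, whereas the real writer presumably also rolls on a timer while idle.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RollPolicy:
    """Close the current file as soon as ANY limit is reached.

    Defaults mirror the documented defaults (500 ms or 5000 elements);
    batch_bytes=None means no size-based limit.
    """
    linger_ms: int = 500
    batch_elements: int = 5000
    batch_bytes: Optional[int] = None

    def should_roll(self, elapsed_ms: int, elements: int, nbytes: int) -> bool:
        if elapsed_ms >= self.linger_ms:
            return True
        if elements >= self.batch_elements:
            return True
        return self.batch_bytes is not None and nbytes >= self.batch_bytes


def batch_events(events: list, arrival_ms: List[int], policy: RollPolicy) -> List[list]:
    """Group events into files; arrival_ms[i] is when events[i] arrived.

    Simplification: limits are only checked when an event arrives, so a
    file never rolls on a timer while the generator is idle.
    """
    files: List[list] = []
    current: list = []
    opened_at = 0
    size = 0
    for event, now in zip(events, arrival_ms):
        if not current:
            opened_at = now  # a new file starts with its first event
        current.append(event)
        size += len(json.dumps(event).encode("utf-8"))  # serialized size
        if policy.should_roll(now - opened_at, len(current), size):
            files.append(current)
            current, size = [], 0
    if current:
        files.append(current)  # flush whatever is left at shutdown
    return files


# Two elements per file: 5 events become files of sizes 2, 2 and 1.
print([len(f) for f in batch_events([{"a": 1}] * 5, [0] * 5, RollPolicy(batch_elements=2))])
# [2, 2, 1]
```

Because every check uses >=, a limit of 0 is always satisfied, so the model reproduces the documented behavior that setting any batch limit to 0 gives each event its own file.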

Instantly flush files

To disable file buffering, set any of lingerMs, batchElements, or batchBytes to 0. This forces each event to be written to a file immediately, with a new file rolled right after.

{
  "connections": {
    "localFs": {
      "kind": "fileSystem",
      "batchConfigs": {
        "batchElements": 0
      }
    }
  }
}

Set the format and compression

format can be one of json, jsonl, or parquet.

Additionally:

  • pretty set to true causes json output to be pretty-printed.
  • explodeJsonlArrays set to true causes each element of a jsonl array to be written on its own line.
  • compression can optionally be set to gzip.
{
  "generators": [
    {
      "directory": "/tmp/data",
      "fileConfigs": {
        "filePrefix": "ipAddresses-",
        "format": "json",
        "pretty": true
      },
      "data": {
        "ip": {
          "_gen": "string",
          "expr": "#{Internet.ipV4Address}"
        },
        "timestamp": {
          "_gen": "now"
        }
      }
    }
  ],
  "connections": {
    "localFs": {
      "kind": "fileSystem"
    }
  }
}
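The difference between plain jsonl and explodeJsonlArrays, plus the optional gzip step, can be sketched with Python's standard json and gzip modules. This is an illustration only: to_jsonl and encode are hypothetical helpers, and the sketch assumes "arrays" means events whose generated value is itself a top-level array. ShadowTraffic's exact byte output (spacing, trailing newline) may differ.

```python
import gzip
import json
from typing import Optional


def to_jsonl(events: list, explode_arrays: bool = False) -> str:
    """Serialize events as JSON Lines: one JSON document per line.

    With explode_arrays=True, an event that is itself a list is written
    one element per line instead of as a single line (assumed semantics).
    """
    lines = []
    for event in events:
        if explode_arrays and isinstance(event, list):
            lines.extend(json.dumps(item) for item in event)
        else:
            lines.append(json.dumps(event))
    return "".join(line + "\n" for line in lines)


def encode(text: str, compression: Optional[str] = None) -> bytes:
    """Return the file bytes, gzip-compressed when compression == "gzip"."""
    data = text.encode("utf-8")
    return gzip.compress(data) if compression == "gzip" else data


events = [[1, 2], {"ip": "10.0.0.1"}]
print(to_jsonl(events), end="")
# [1, 2]
# {"ip": "10.0.0.1"}
print(to_jsonl(events, explode_arrays=True), end="")
# 1
# 2
# {"ip": "10.0.0.1"}
```

Calling gzip.decompress on the output of encode(text, "gzip") returns the original bytes, which is a quick way to inspect a compressed file by hand.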

Writing to subdirectories

When a generator writes to the file system, by default it writes files directly under its directory. Sometimes, though, you may want a generator to write to multiple subdirectories within that directory.

To do that, set subdir in fileConfigs. Typically this is a generated value, such as a variable, that changes over time.

In this example, 3 forks are launched, each writing to its own subdirectory:

  • /tmp/data/a/foo-<n>.json
  • /tmp/data/b/foo-<n>.json
  • /tmp/data/c/foo-<n>.json

Each unique value of subdir gets its own file-roll milestones. In other words, subdirectory a rolls new files based on time, size, or event count independently of b and c.

{
  "generators": [
    {
      "directory": "/tmp/data",
      "fork": {
        "key": ["a", "b", "c"]
      },
      "fileConfigs": {
        "filePrefix": "foo-",
        "format": "json",
        "subdir": {
          "_gen": "var",
          "var": "forkKey"
        }
      },
      "data": {
        "x": {
          "_gen": "oneOf",
          "choices": [1, 2, 3]
        }
      }
    }
  ],
  "connections": {
    "localFs": {
      "kind": "fileSystem"
    }
  }
}
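Independent roll milestones can be pictured as separate counters per subdir value. The Python sketch below models only the element limit (batchElements) and assumes each subdirectory also keeps its own file index n; SubdirRoller is a hypothetical name, not part of ShadowTraffic.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class SubdirRoller:
    """Hypothetical model of per-subdir rolling (element limit only).

    Each subdir keeps its own element count and file index, so activity
    in one subdirectory never causes another one to roll.
    """

    def __init__(self, directory: str, prefix: str, suffix: str, batch_elements: int):
        self.directory = directory
        self.prefix = prefix
        self.suffix = suffix
        self.batch_elements = batch_elements
        self.file_index = defaultdict(int)  # subdir -> current n
        self.elements = defaultdict(int)    # subdir -> events in its current file

    def write(self, subdir: str, event) -> str:
        """Record one event (payload not stored here) and return its file path."""
        name = f"{self.prefix}{self.file_index[subdir]}.{self.suffix}"
        path = str(PurePosixPath(self.directory, subdir, name))
        self.elements[subdir] += 1
        if self.elements[subdir] >= self.batch_elements:
            self.file_index[subdir] += 1  # roll only this subdir
            self.elements[subdir] = 0
        return path


roller = SubdirRoller("/tmp/data", "foo-", "json", batch_elements=2)
for subdir in ["a", "a", "a", "b"]:
    print(roller.write(subdir, {"x": 1}))
# /tmp/data/a/foo-0.json
# /tmp/data/a/foo-0.json
# /tmp/data/a/foo-1.json
# /tmp/data/b/foo-0.json
```

Subdirectory a has already rolled to foo-1.json by the time b writes its first event, yet b still starts at foo-0.json.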

Writing multiple files

Sometimes, you might want to write to multiple files in multiple subdirectories on each generator iteration.

To do that, specify multiFile: a map from keys to file configuration overrides. Each entry must specify a filePrefix and can optionally specify its own subdir.

When multiFile is used, data must be a map whose keys match those in multiFile. The value under each key is written according to the matching entry in multiFile.

For example, the configuration below will write 10 files: 5 to /tmp/data/a/foo-*.json, and 5 to /tmp/data/b/bar-*.json.

{
  "generators": [
    {
      "directory": "/tmp/data",
      "fileConfigs": {
        "format": "json",
        "multiFile": {
          "a": {
            "filePrefix": "foo-",
            "subdir": "a"
          },
          "b": {
            "filePrefix": "bar-",
            "subdir": "b"
          }
        }
      },
      "data": {
        "a": {
          "_gen": "oneOf",
          "choices": [1, 2, 3]
        },
        "b": {
          "_gen": "oneOf",
          "choices": [4, 5, 6]
        }
      },
      "localConfigs": {
        "maxEvents": 5
      }
    }
  ],
  "connections": {
    "fs": {
      "kind": "fileSystem"
    }
  }
}
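The multiFile contract (data keys must match multiFile keys, each entry needs a filePrefix, subdir is optional) can be expressed as a small routing function. route_multi_file is a hypothetical helper written for illustration; the ValueError it raises on a mismatch is this sketch's choice, and ShadowTraffic may report such errors differently.

```python
from pathlib import PurePosixPath


def route_multi_file(data: dict, multi_file: dict, directory: str) -> dict:
    """Map each top-level value in data to the prefix path it is written under.

    Follows the documented rules: data keys must match multiFile keys, every
    multiFile entry needs a filePrefix, and subdir is optional.
    """
    if set(data) != set(multi_file):
        raise ValueError(f"data keys {sorted(data)} do not match multiFile keys {sorted(multi_file)}")
    routed = {}
    for key, value in data.items():
        spec = multi_file[key]
        if "filePrefix" not in spec:
            raise ValueError(f"multiFile entry {key!r} is missing filePrefix")
        # An absent subdir becomes "", which PurePosixPath ignores.
        base = PurePosixPath(directory, spec.get("subdir", ""))
        routed[str(base / spec["filePrefix"])] = value
    return routed


multi_file = {
    "a": {"filePrefix": "foo-", "subdir": "a"},
    "b": {"filePrefix": "bar-", "subdir": "b"},
}
print(route_multi_file({"a": 2, "b": 5}, multi_file, "/tmp/data"))
# {'/tmp/data/a/foo-': 2, '/tmp/data/b/bar-': 5}
```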

Specification

Connection JSON schema

{
  "type": "object",
  "properties": {
    "kind": {
      "type": "string",
      "const": "fileSystem"
    },
    "batchConfigs": {
      "type": "object",
      "properties": {
        "lingerMs": {
          "type": "integer",
          "minimum": 0
        },
        "batchElements": {
          "type": "integer",
          "minimum": 0
        },
        "batchBytes": {
          "type": "integer",
          "minimum": 0
        }
      }
    }
  }
}

Generator JSON schema

{
  "type": "object",
  "properties": {
    "connection": {
      "type": "string"
    },
    "name": {
      "type": "string"
    },
    "directory": {
      "type": "string"
    },
    "data": {},
    "localConfigs": {
      "type": "object",
      "properties": {
        "throttleMs": {
          "oneOf": [
            {
              "type": "number",
              "minimum": 0
            },
            {
              "type": "object",
              "properties": {
                "_gen": {
                  "type": "string"
                }
              },
              "required": ["_gen"]
            }
          ]
        },
        "maxEvents": {
          "oneOf": [
            {
              "type": "integer",
              "minimum": 0
            },
            {
              "type": "object",
              "properties": {
                "_gen": {
                  "type": "string"
                }
              },
              "required": ["_gen"]
            }
          ]
        },
        "kafkaKeyProtobufHint": {
          "type": "object",
          "properties": {
            "schemaFile": {
              "type": "string"
            },
            "message": {
              "type": "string"
            }
          },
          "required": ["schemaFile", "message"]
        },
        "jsonSchemaHint": {
          "type": "object"
        },
        "maxBytes": {
          "type": "integer",
          "minimum": 1
        },
        "discard": {
          "type": "object",
          "properties": {
            "rate": {
              "type": "number",
              "minimum": 0,
              "maximum": 1
            },
            "retainHistory": {
              "type": "boolean"
            }
          },
          "required": ["rate"]
        },
        "repeat": {
          "type": "object",
          "properties": {
            "rate": {
              "type": "number",
              "minimum": 0,
              "maximum": 1
            },
            "times": {
              "oneOf": [
                {
                  "type": "integer",
                  "minimum": 0
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            }
          },
          "required": ["rate", "times"]
        },
        "protobufSchemaHint": {
          "type": "object",
          "patternProperties": {
            "^.*$": {
              "type": "object",
              "properties": {
                "schemaFile": {
                  "type": "string"
                },
                "message": {
                  "type": "string"
                }
              },
              "required": ["schemaFile", "message"]
            }
          }
        },
        "maxHistoryEvents": {
          "type": "integer",
          "minimum": 0
        },
        "maxMs": {
          "type": "integer",
          "minimum": 0
        },
        "time": {
          "type": "integer"
        },
        "events": {
          "type": "object",
          "properties": {
            "exactly": {
              "oneOf": [
                {
                  "type": "integer",
                  "minimum": 0
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            }
          }
        },
        "delay": {
          "type": "object",
          "properties": {
            "rate": {
              "type": "number",
              "minimum": 0,
              "maximum": 1
            },
            "ms": {
              "oneOf": [
                {
                  "type": "integer",
                  "minimum": 0
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            }
          },
          "required": ["rate", "ms"]
        },
        "history": {
          "type": "object",
          "properties": {
            "events": {
              "type": "object",
              "properties": {
                "max": {
                  "type": "integer",
                  "minimum": 0
                }
              }
            }
          }
        },
        "avroSchemaHint": {
          "type": "object"
        },
        "throttle": {
          "type": "object",
          "properties": {
            "ms": {
              "oneOf": [
                {
                  "type": "number",
                  "minimum": 0
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            }
          }
        },
        "throughput": {
          "oneOf": [
            {
              "type": "integer",
              "minimum": 1
            },
            {
              "type": "object",
              "properties": {
                "_gen": {
                  "type": "string"
                }
              },
              "required": ["_gen"]
            }
          ]
        },
        "timeMultiplier": {
          "oneOf": [
            {
              "type": "number"
            },
            {
              "type": "object",
              "properties": {
                "_gen": {
                  "type": "string"
                }
              },
              "required": ["_gen"]
            }
          ]
        },
        "kafkaValueProtobufHint": {
          "type": "object",
          "properties": {
            "schemaFile": {
              "type": "string"
            },
            "message": {
              "type": "string"
            }
          },
          "required": ["schemaFile", "message"]
        }
      }
    },
    "fileConfigs": {
      "oneOf": [
        {
          "type": "object",
          "properties": {
            "filePrefix": {
              "oneOf": [
                {
                  "type": "string"
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            },
            "format": {
              "type": "string",
              "enum": ["json", "jsonl", "parquet"]
            },
            "pretty": {
              "type": "boolean"
            },
            "compression": {
              "type": "string",
              "enum": ["gzip"]
            },
            "subdir": {
              "oneOf": [
                {
                  "type": "string"
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            }
          },
          "required": ["filePrefix", "format"]
        },
        {
          "type": "object",
          "properties": {
            "format": {
              "type": "string",
              "enum": ["json", "jsonl", "parquet"]
            },
            "pretty": {
              "type": "boolean"
            },
            "compression": {
              "type": "string",
              "enum": ["gzip"]
            },
            "subdir": {
              "oneOf": [
                {
                  "type": "string"
                },
                {
                  "type": "object",
                  "properties": {
                    "_gen": {
                      "type": "string"
                    }
                  },
                  "required": ["_gen"]
                }
              ]
            },
            "multiFile": {
              "type": "object",
              "additionalProperties": {
                "type": "object",
                "properties": {
                  "filePrefix": {
                    "oneOf": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "object",
                        "properties": {
                          "_gen": {
                            "type": "string"
                          }
                        },
                        "required": ["_gen"]
                      }
                    ]
                  },
                  "subdir": {
                    "oneOf": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "object",
                        "properties": {
                          "_gen": {
                            "type": "string"
                          }
                        },
                        "required": ["_gen"]
                      }
                    ]
                  }
                },
                "required": ["filePrefix"]
              }
            }
          },
          "required": ["format", "multiFile"]
        }
      ]
    }
  },
  "required": ["directory", "data", "fileConfigs"]
}