Overview
So you need some synthetic data? Maybe ShadowTraffic can help. This page explains in detail what ShadowTraffic is and how it works. We'll progressively build up a simple example to introduce each of the major features.
Let's dive in: ShadowTraffic is a product that helps you rapidly simulate production traffic to your backend—primarily to Apache Kafka, Postgres, S3, webhooks, and a few others.
Specifically, it's a container that you deploy on your own infrastructure. You give it a single JSON configuration file which tells it what you want your data to look like, and it uses that to produce streams of data to your backend. You don't write any code to make this work.
The API
Functions
So what does that JSON configuration file look like? Let's illustrate it with an example.
Imagine that you want to generate a stream of sensor readings to a Kafka topic. Perhaps the data is as simple as this:
{
"sensorId": "a94729b9-c375-45a3-8a1b-58b96f6b77dc",
"reading": 60.53,
"timestamp": 1716321759
}
The guiding principle of ShadowTraffic is that you replace concrete values with functions. A function is a map with a key named _gen.
With that in mind, here's how you'd generate more data like the above:
{
"sensorId": { "_gen": "uuid" },
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"timestamp": { "_gen": "now" }
}
When you run that (with a fully assembled configuration, which we'll get to in a second), it will produce data like this:
[
{
"sensorId" : "c67473fb-46bf-92f3-1323-691ff7307e98",
"reading" : 54.62383104172507,
"timestamp" : 1716322323316
},
{
"sensorId" : "18883008-9cc7-318a-5f59-d964cc3ba36e",
"reading" : 61.93954024611097,
"timestamp" : 1716322323323
},
{
"sensorId" : "7f25b88f-71c0-1bcc-7f9d-c2069b9bd54e",
"reading" : 58.074918200513636,
"timestamp" : 1716322323342
}
]
What is really going on here?
ShadowTraffic scans your configuration file and looks for any maps with a key named _gen. If it finds one, it looks at the function name and compiles code at that location. Then at runtime, ShadowTraffic repeatedly invokes those functions to produce synthetic data.
This approach is powerful because the API is a mirror image of your data. You can put functions anywhere you like, and ShadowTraffic will mimic the structure. For example, if the sensor data was more nested like:
{
"sensorPayload": {
"identifiers": {
"v4": "a94729b9-c375-45a3-8a1b-58b96f6b77dc"
},
"value": [60.53, "degrees"],
"time": {
"recordedAt": 1716321759
}
}
}
Then all you'd need to do is shape your function calls in the same way:
{
"sensorPayload": {
"identifiers": {
"v4": { "_gen": "uuid" }
},
"value": [
{ "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"degrees"
],
"time": {
"recordedAt": { "_gen": "now" }
}
}
}
Funny looking, isn't it? Just remember: replace concrete values with functions. That is all that's happening here.
Before we move on, one thing you might notice is that each event represents a brand new sensor. Those UUIDs will almost never overlap, so each sensor appears to emit only one reading. Ignore this for a moment if you can: we'll show a little later how to model individual sensors over time.
Function modifiers
Did you notice something a little off about the example above? When we started, we wanted sensor readings that looked neat with two decimal places, like 60.53, but our generator spits out long values like 54.62383104172507. How can you control that?
Many functions take specific parameters to control their behavior, but there are a few general parameters that can be used on almost any function. These are called function modifiers.
For instance, on any numeric function, you can use the decimals function modifier to trim the number of decimal places. Using decimals, we adjust our function call like so:
{
"_gen": "normalDistribution",
"mean": 60,
"sd": 5,
"decimals": 2
}
If we rerun ShadowTraffic, we'll now get reading values in exactly the shape we want them:
[
{
"sensorId" : "5a5be9a6-6245-e6c3-9d9d-a67a0121607e"
"reading" : 65.0,
"timestamp" : 1716393799097
},
{
"sensorId" : "1ef26028-4e5d-e57f-171a-ae256a1c498f"
"reading" : 69.44,
"timestamp" : 1716393799106
},
{
"sensorId" : "5a6798a9-3792-6b1f-b888-74c87170d00a"
"reading" : 56.41,
"timestamp" : 1716393799107
}
]
decimals works with other functions like uniformDistribution, divide, and anything else that returns numbers.
Other useful function modifiers include path, selectKeys, and keyNames. You might want to spend a minute or two browsing these since they come in handy so often.
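For example, here's a hedged sketch using the path modifier on a constant value (constant and path also appear in the ordering example below). Assuming path takes an array of keys to drill into nested data, this call would return 42:

{
  "_gen": "constant",
  "x": { "a": { "b": 42 } },
  "path": ["a", "b"]
}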
Function modifier order
As a reference, modifiers execute in the following order:
If you need to change the order (perhaps you need selectKeys to run after path), you can use the constant function to wrap the result and apply further function modifiers, like so:
{
"_gen": "constant",
"x": {
"_gen": "constant",
"x": {
"a": {
"b": c
}
},
"path": ["a"]
},
"selectKeys": ["b"]
}
Generators
So far, we've introduced just enough abstraction to make new data. But we haven't actually run ShadowTraffic to see it producing that data to a particular backend like Kafka or Postgres. To do that, we'll need another concept: generators.
A generator describes the backend-specific attributes of the data. Continuing our example, we want to get our sensor data into a Kafka topic. When you send an event to Kafka, you need to supply a topic, key, value, and perhaps other information.
Slightly reshaping our function calls from above, the generator for Kafka looks like this:
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"timestamp": { "_gen": "now" }
}
}
Where did this JSON structure come from? Each backend type has a schema for what its generator must look like. Here's the Kafka generator schema, and here's the schema for Postgres if that's what you're working with.
Moving our example along, a ShadowTraffic configuration file requires an array of generators, so the nearly-complete configuration file looks like:
{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"timestamp": { "_gen": "now" }
}
}
]
}
When you have more than one generator, ShadowTraffic executes them in a round-robin fashion unless otherwise specified. But we'll get to that soon!
Connections
By now, you can probably guess the last concept we need to tie this all together: connections. Connections describe where, specifically, the data will go. They're a top-level construct in the configuration file, mapping a connection name to its details.
To send our sensor data to a local Kafka cluster, our connection will look something like this:
{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
}
}
],
"connections": {
"dev-kafka": {
"kind": "kafka",
"producerConfigs": {
"bootstrap.servers": "localhost:9092",
"key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
"value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
}
}
}
}
A few things to call out:
- Each connection needs a name, dev-kafka in this case. When you have just one connection, ShadowTraffic automatically understands that your generators should use that connection, so you don't need to tell it anything else - like in this example.
- If you have multiple connections, each generator has to specify a connection field to tell it which connection name it should bind to.
- The kind field specifies which connection type this is, and therefore what fields are required for a valid connection. For instance, if you set it to postgres, ShadowTraffic would require fields like host and port instead of bootstrap.servers.
- For you Kafka readers with a keen eye, you'll have noticed the JSON serializers with the ShadowTraffic package name. ShadowTraffic ships these out of the box for your convenience. You're welcome.
See the reference pages for Kafka, Postgres, and S3 to learn more about each connection type.
Now, with that much configuration, you can run ShadowTraffic to send sensor data to Kafka. Read on to the next section.
Dry run
Before you actually send data off to Kafka, you might want to check that the data looks how you'd expect.
ShadowTraffic lets you perform a dry run: you can see exactly what it's going to do, but instead of sending the data to your connection, it prints the data to standard output on your terminal. Invoke ShadowTraffic with this command, noting --stdout and --sample.
docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json --stdout --sample 10
You should see it print 10 sensor readings similar to the ones below, and then exit.
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "ddc789f7-c9be-7883-da97-ce5759e526f6"
},
"value" : {
"reading" : 64.14,
"timestamp" : 1716909643832
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "b46da8cb-6cf4-c10a-2426-70e1a10e366d"
},
"value" : {
"reading" : 57.65,
"timestamp" : 1716909643840
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "e4ce8f14-6b42-36db-c467-6428173f49ff"
},
"value" : {
"reading" : 65.36,
"timestamp" : 1716909643841
}
}
Even more useful during development is the --watch flag. Run ShadowTraffic with --stdout --sample 10 --watch and it will print 10 sample events to standard output every time your configuration file changes, which makes iterating on your configuration much faster.
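For example, combining all three development flags:

docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json --stdout --sample 10 --watch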
Runtime
When all looks well, you can drop the development flags and run ShadowTraffic like so:
docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json
It should print some logs about connecting to Kafka and then block, producing data indefinitely. You can stop it with Control-C or stop the container with the Docker CLI.
If you want a quick way to verify how ShadowTraffic is behaving, you can peek at its Prometheus metrics.
Generator configuration
If you checked Kafka after you ran ShadowTraffic, the first thing you probably noticed is that it produced quite a lot of messages in a short span of time. How can you slow it down? Or better yet, how can you generally change an entire generator's behavior?
There are two ways to configure a generator: either locally for a specific generator, or globally for all of them.
For example, to make a generator produce an event at most every 200 milliseconds, you could configure its local throttleMs parameter:
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
},
"localConfigs": {
"throttleMs": 200
}
}
If you had many generators and you wanted all of them to produce events at most once every 500 milliseconds, you could instead use the top-level global configuration field:
{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
}
}
],
"globalConfigs": {
"throttleMs": 500
},
"connections": {
"dev-kafka": { ... }
}
}
If you supply the same configuration in both global and local, the local parameter takes precedence.
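For example, in this sketch the sensorReadings generator fires every 200 milliseconds, not every 500, because its local setting wins:

{
  "generators": [
    {
      "topic": "sensorReadings",
      "value": { ... },
      "localConfigs": { "throttleMs": 200 }
    }
  ],
  "globalConfigs": { "throttleMs": 500 },
  "connections": { ... }
}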
Similarly, you can use generator configuration to cap the number of generated events (maxEvents), delay a portion of events from being written to the connection (delay), and more.
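For instance, here's a sketch that stops a generator after 1,000 events using maxEvents:

{
  "topic": "sensorReadings",
  "value": { ... },
  "localConfigs": {
    "maxEvents": 1000
  }
}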
Notably, you can use functions inside the configuration. If you wanted the delay between events to vary anywhere between 100 and 200 milliseconds, you could use a function to create that variance:
{
"globalConfigs": {
"throttleMs": {
"_gen": "uniformDistribution",
"bounds": [100, 200]
}
}
}
Lookups
Until now, we've generated data to only one Kafka topic. But it's often the case that you have multiple streams of data which share a common identifier - in other words, a join key.
Imagine, for example, customer and order data. Both data sets usually share a common customerId field that links rows in both together.
How would you do that with ShadowTraffic? There's a function just for this purpose: lookup.
When ShadowTraffic runs a generator, it retains a window of history about the events it recently produced. lookup is a function that queries that history, picking out a random event to use.
Let's continue our sensor example and imagine there's another stream of data for maintenance requests. Every ~10 seconds, an event is generated which requests that a random sensor get checked for repairs:
{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
},
"localConfigs": {
"throttleMs": 200
}
},
{
"topic": "maintainenceNotifications",
"value": {
"sensorId": {
"_gen": "lookup",
"topic": "sensorReadings",
"path": [ "key", "sensorId" ]
},
"status": "needs repair"
},
"localConfigs": {
"throttleMs": 10000
}
}
],
"connections": {
"dev-kafka": { ... }
}
}
The generator for maintainenceNotifications calls the lookup function, asking it for events previously generated to the sensorReadings topic. Notice the path function modifier. lookup returns an entire event that was previously generated, but we only want the sensorId. Using path lets us drill directly to the value we want.
When you run ShadowTraffic, you'll see common identifiers link up, like the following. Notice how the id a2c57eea-0589-70f8-b557-25e7ebb399c4 is shared in both events.
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "a2c57eea-0589-70f8-b557-25e7ebb399c4"
},
"value" : {
"reading" : 57.91,
"timestamp" : 1716330021188
}
}
{
"topic" : "maintainenceNotifications",
"value" : {
"sensorId" : "a2c57eea-0589-70f8-b557-25e7ebb399c4",
"status" : "needs repair"
}
}
By default, the last 1,000,000 generated events are available for lookups before being purged from memory, but you can raise or lower this value with the history generator configuration.
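For instance, here's a sketch that shrinks the window to the last 5,000 events. This assumes history takes the event count directly; check the generator configuration reference for its exact shape:

{
  "topic": "sensorReadings",
  "value": { ... },
  "localConfigs": {
    "history": 5000
  }
}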
Variables
When your generators get complex enough, you'll probably want to share data across multiple fields. Variables are an obvious abstraction for this.
To show how that works, let's extend our example and imagine that each sensor event also contains a URL to see a chart about its recent activity. Perhaps that URL contains the sensor ID itself, so we'll need to reference it in two places.
{
"topic": "sensorReadings",
"vars": {
"sensorId": { "_gen": "uuid" }
},
"key": {
"sensorId": { "_gen": "var", "var": "sensorId" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" },
"url": {
"_gen": "string",
"expr": "http://mydomain.com/charts/#{sensorId}"
}
}
}
You declare variables in a top-level vars field, mapping names to expressions. To reference a variable, you simply use the var function with the name you want to bind. Notice further how the string function can reference variables through #{} templating.
When you run it, each event will share the same identifier in its sensorId and url fields:
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "6d95332c-fcc2-ff43-d48f-67053ecb5609"
},
"value" : {
"reading" : 60.02,
"timestamp" : 1716330583941,
"url" : "http://mydomain.com/charts/6d95332c-fcc2-ff43-d48f-67053ecb5609"
}
}
There might be some cases where you want to randomly generate a variable, but only do so once and lock its value for the lifetime of the generator. To do that, you can use the top-level varsOnce field, which works exactly like vars, but only evaluates once:
{
"topic": "sensorReadings",
"varsOnce": {
"originalHardware": {
"_gen": "boolean"
}
},
"vars": {
"sensorId": { "_gen": "uuid" }
},
"key": {
"sensorId": { "_gen": "var", "var": "sensorId" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" },
"url": {
"_gen": "string",
"expr": "http://mydomain.com/charts/#{sensorId}"
},
"original": {
"_gen": "var",
"var": "originalHardware"
}
}
}
In the output data, original will always be true or always be false for a given run, depending on when you run ShadowTraffic.
And yes, variables can reference other variables.
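For instance, here's a sketch where one variable builds on another, assuming #{} templating resolves variables inside vars the same way it does in generator fields:

{
  "vars": {
    "sensorId": { "_gen": "uuid" },
    "chartUrl": {
      "_gen": "string",
      "expr": "http://mydomain.com/charts/#{sensorId}"
    }
  }
}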
Seeding
There are many cases, like correctness testing, where you'll want to generate the exact same data every time. ShadowTraffic can do this with the --seed parameter:
docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json --seed 42
When you set a seed, ShadowTraffic will not only generate the same data, but use the same throttle values, delay values, or any other configuration you've provided.
If you want to replicate a particular run, look at the top of ShadowTraffic's logs to see what the seed was:
✔ Running with seed 720838425. You can repeat this run by setting --seed 720838425.
For example, in the previous section, we discussed how varsOnce can lock a boolean value for the lifetime of one ShadowTraffic run, though it may be true or false the next time you run it. Using --seed will lock it to one or the other forever.
State machines
Earlier, we mentioned something a little weird about this data: every event comes from a new sensor ID. That isn't very realistic. What we want to create is a set of sensors, each of which sends updates over time. In the following two sections, we'll introduce the constructs you need to do it, starting with state machines.
To complete our example, let's imagine that we want 10 sensors sending updates, each at 1 second intervals. Each sensor's reading will start with a value of about 60, and each subsequent reading will be the previous reading plus a random value between -1 and 1. This will create a nice drift effect for each sensor.
A state machine is the perfect construct for modeling this. It's what it sounds like: you have a set of states and transitions. Each state describes how to override the base generator.
The best way to understand this is to just dive into the example. Notice how sensorId has been temporarily set to a specific UUID. We'll come back and fix this in the next section, but for now this makes sense: we're modeling the lifecycle of a single sensor.
{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": "2d9549cc-ac0f-b899-0b8b-cca3fc0691d3"
},
"value": {
"timestamp": { "_gen": "now" }
},
"stateMachine": {
"_gen": "stateMachine",
"initial": "start",
"transitions": {
"start": "update",
"update": "update"
},
"states": {
"start": {
"value": {
"reading": {"_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 }
}
},
"update": {
"value": {
"reading": {
"_gen": "add",
"args": [
{ "_gen": "uniformDistribution", "bounds": [-1, 1] },
{ "_gen": "previousEvent", "path": [ "value", "reading" ] }
],
"decimals": 2
}
}
}
}
},
"localConfigs": { "throttleMs": 1000 }
}
],
"connections": {
"dev-kafka": { ... }
}
}
Notice a few things:
- If you run it, you'll see one event written per second. The first event will be from the start state, and subsequent events will be from the update state.
- The reading field has been moved from the top-level value field into each state. Each state merges, or overrides, its configuration into the base generator. You can observe this by adding more fields besides reading to the update state. All events but the first will contain your new field.
- previousEvent is a function that grabs the latest event from this generator's history. The path function modifier drills into the last sensor reading so it can be added to a random value as described.
We're almost there. In the next section, we'll remove that hardcoded sensor ID, 2d9549cc-ac0f-b899-0b8b-cca3fc0691d3, and generate data for 10 distinct sensors.
Forks
If you think about how you could generalize the previous example to 10 sensors, one thing you could do is copy and paste that generator 9 more times, altering sensorId each time. That could work, but at best it's clumsy. What if you need 1,000 sensors?
Fork is a construct that dynamically clones a generator, running it in parallel many times. You create a top-level field, fork, and provide at least a key field. key describes the "identity" of each fork: is this sensor 1, sensor 2, or sensor 3? This is easiest to understand by looking at the configuration:
{
"generators": [
{
"topic": "sensorReadings",
"fork": {
"key": { "_gen": "uuid" },
"maxForks": 10
},
"key": {
"sensorId": { "_gen": "var", "var": "forkKey" }
},
"value": {
"timestamp": { "_gen": "now" }
},
"stateMachine": { ... },
"localConfigs": { "throttleMs": 1000 }
}
],
"connections": {
"dev-kafka": { ... }
}
}
Notice how:
- The key field in fork has been set to the uuid function, which is what it originally was at the start of this page. This will create random sensor IDs for us.
- There's also a maxForks field that's been set to 10. When unset, fork will generate as many instances as possible. Since we want only 10 sensors, maxForks puts an upper bound on it.
- The hardcoded sensor ID has been removed and replaced with a reference to a variable called forkKey. When you use fork, you can use this variable to identify which fork this is.
If you run this, you'll now see 10 different UUIDs, each of which updates every 1 second.
You might also notice that all the sensors appear to update at nearly the same time: a burst of 10 updates, then nothing for 1 second, and repeat. This is because by default, forks are spawned as fast as possible. You can stagger how quickly forks start with the aptly named stagger field.
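For example, here's a hedged sketch that spaces fork startup apart. It assumes stagger takes a millisecond value under an ms key; check the fork reference for its exact shape:

{
  "fork": {
    "key": { "_gen": "uuid" },
    "maxForks": 10,
    "stagger": { "ms": 100 }
  }
}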
Take care to either put an upper bound on the number of forks you start, with maxForks, or ensure that forks eventually stop running through either maxEvents or a state machine with a terminal state. Each additional fork consumes memory, so if you start an unbounded number of forks, you'll eventually run out of memory.
Intervals
You may want to make your synthetic streams a little more realistic by controlling how they behave depending on the day or time. ShadowTraffic provides a construct for this: intervals.
intervals is a function that maps Cron strings to expressions. When the current wallclock time overlaps with one of the Cron strings, the mapped expression is used in its place.
For example, imagine that you wanted sensors to normally emit updates once every 4 seconds. But on every 5th minute of the hour, you want updates every 50 milliseconds, and on every 2nd minute of the hour, you want updates every 1000 milliseconds. You can adjust your throttleMs to look like this:
{
"topic": "sensorReadings",
"value": { ... },
"localConfigs": {
"throttleMs": {
"_gen": "intervals",
"intervals": [
[ "*/5 * * * *", 50 ],
[ "*/2 * * * *", 1000 ]
],
"defaultValue": 4000
}
}
}
Stateful functions
As we get towards the end of this overview, let's revisit what we started with: functions. Throughout this tour, we've used stateless functions like uuid and normalDistribution. Each time you invoke them, they return a completely random value.
But sometimes, you may find yourself in situations where you need to generate a series of values, where each one is a progression of the last. ShadowTraffic ships a few functions that behave this way. They're called stateful functions because they retain state between calls. You'll know you're working with a stateful function because it's indicated on the function's reference page.
Let's build on our example using the stateful sequentialInteger and sequentialString functions:
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "sequentialString", "expr": "sensor-~d" }
},
"value": {
"i": { "_gen": "sequentialInteger", "startingFrom": 50 },
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
}
}
Each time this generator runs, the internal state for these functions advances, and the values automatically progress in the output:
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "sensor-0"
},
"value" : {
"i" : 50,
"reading" : 63.31,
"timestamp" : 1716909120807
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "sensor-1"
},
"value" : {
"i" : 51,
"reading" : 61.9,
"timestamp" : 1716909120816
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "sensor-2"
},
"value" : {
"i" : 52,
"reading" : 64.75,
"timestamp" : 1716909120816
}
}
Preprocessors
This next concept becomes useful the more you use ShadowTraffic: preprocessors. As your configuration files grow, you'll probably want some modularity, and even the ability to parameterize them at launch-time.
Preprocessors are special functions that help you do these things. As their name suggests, they run first before all other functions and transform your configuration.
You can use them, for example, to put your connection information in a file that can be shared across other ShadowTraffic configurations. If you were to make a file called connections.json:
{
"dev-kafka": {
"kind": "kafka",
"producerConfigs": {
"bootstrap.servers": "localhost:9092",
"key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
"value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
}
}
}
You could include the connection block in your main configuration like so:
{
...
"connections": {
"_gen": "loadJsonFile",
"file": "connections.json"
}
}
ShadowTraffic will expand the contents of connections.json and inline them into the spot where you called loadJsonFile, and then proceed with normal validation.
Another thing you might want to do is inject variables from your environment. You can use the env function to do that, perhaps to parameterize your bootstrap server URL:
{
"_gen": "env",
"var": "BOOTSTRAP_SERVERS"
}
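For instance, you might inline it into the Kafka connection, assuming BOOTSTRAP_SERVERS is set in the container's environment:

{
  "connections": {
    "dev-kafka": {
      "kind": "kafka",
      "producerConfigs": {
        "bootstrap.servers": { "_gen": "env", "var": "BOOTSTRAP_SERVERS" },
        "key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
        "value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
      }
    }
  }
}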
While this example showed using preprocessors in your connection settings, you can use them anywhere in your configuration file.
Schedules
We'll round the overview out by looking at how you can change the way ShadowTraffic picks what generators to run. By default, ShadowTraffic round-robin executes your generators, backing off when you ask it to throttle between events.
But sometimes, you'll want to execute generators in a series based on time or number of events. In other words, you'll want to schedule generators.
ShadowTraffic provides a top-level construct, schedule, to do just that. You specify stages, which are an array of generators to run to completion. ShadowTraffic advances to the next stage when all the generators in the current stage terminate.
One common use case for this is seeding data, especially across multiple systems. Imagine that you want your sensor IDs stored in a Postgres table before any data is generated to Kafka. Here's how you'd do that:
{
"generators": [
{
"name": "sensors",
"connection": "pg",
"table": "sensorCatalog",
"row": {
"id": {
"_gen": "uuid"
}
}
},
{
"name": "readings",
"connection": "kafka",
"topic": "sensorReadings",
"value": {
"id": {
"_gen": "lookup",
"connection": "pg",
"table": "sensorCatalog",
"path": ["row", "id"]
},
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 }
}
}
],
"schedule": {
"stages": [
{
"generators": ["sensors"],
"overrides": {
"sensors": {
"localConfigs": {
"maxEvents": 5
}
}
}
},
{
"generators": ["readings"]
}
]
},
"connections": {
"pg": {
"kind": "postgres",
"tablePolicy": "dropAndCreate",
"connectionConfigs": {
"host": "localhost",
"port": 5432,
"username": "postgres",
"password": "postgres",
"db": "mydb"
}
},
"kafka": {
"kind": "kafka",
"producerConfigs": {
"bootstrap.servers": "localhost:9092",
"key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
"value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
}
}
}
}
Notice a few things here:
- Generators are given name fields. This is what's referenced in the stages array.
- You can specify overrides to change how a generator behaves. In this instance, the schedule specifies that sensors should only produce 5 events and then stop.
- Explicit connection names are supplied in the generator definitions because there are multiple connections.
When you run this simplified example, you'll see 5 sensors emitted to Postgres, followed by an unlimited number of readings to Kafka:
{ "table": "sensorCatalog", "row": { "id":"01e13f58-4e9a-f9d4-7b1f-b12de230013c" } }
{ "table": "sensorCatalog", "row": { "id":"3ad10b49-97a9-4a6b-dda5-10aedc476a88" } }
{ "table": "sensorCatalog", "row": { "id":"79139536-f960-016b-1f43-26e6a2b680e3" } }
{ "table": "sensorCatalog", "row": { "id":"72cdf45a-7f4a-a086-73f0-3f720c87efad" } }
{ "table": "sensorCatalog", "row": { "id":"5d5cb0f9-24d8-b392-1dc2-88d7884bf5a4" } }
{ "topic": "sensorReadings", "value": { "id":"01e13f58-4e9a-f9d4-7b1f-b12de230013c","reading":67.53 } }
{ "topic": "sensorReadings", "value": { "id":"5d5cb0f9-24d8-b392-1dc2-88d7884bf5a4","reading":60.69 } }
{ "topic": "sensorReadings", "value": { "id":"3ad10b49-97a9-4a6b-dda5-10aedc476a88","reading":66.86 } }
...
And with that, you have a solid understanding of the main constructs in ShadowTraffic.
Want to learn more? Try the video guides or the cheat sheet.