Overview

So you need some synthetic data? Maybe ShadowTraffic can help. This page explains in detail what ShadowTraffic is and how it works. We'll progressively build up a simple example to introduce each of the major features.


Let's dive in: ShadowTraffic is a product that helps you rapidly simulate production traffic to your backend—primarily to Apache Kafka, Postgres, S3, webhooks, and a few others.

Specifically, it's a container that you deploy on your own infrastructure. You give it a single JSON configuration file which tells it what you want your data to look like, and it uses that to produce streams of data to your backend. You don't write any code to make this work.

Architectural diagram


The API

Functions

So what does that JSON configuration file look like? Let's illustrate it with an example.

Imagine that you want to generate a stream of sensor readings to a Kafka topic. Perhaps the data is as simple as this:

{
"sensorId": "a94729b9-c375-45a3-8a1b-58b96f6b77dc",
"reading": 60.53,
"timestamp": 1716321759
}

The guiding principle of ShadowTraffic is that you replace concrete values with functions. A function is a map with a key named _gen.

With that in mind, here's how you'd generate more data like the above:

{
"sensorId": { "_gen": "uuid" },
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"timestamp": { "_gen": "now" }
}

When you run that (with a fully assembled configuration, which we'll get to in a second), it will produce data like this:

[
{
"sensorId" : "c67473fb-46bf-92f3-1323-691ff7307e98",
"reading" : 54.62383104172507,
"timestamp" : 1716322323316
},
{
"sensorId" : "18883008-9cc7-318a-5f59-d964cc3ba36e",
"reading" : 61.93954024611097,
"timestamp" : 1716322323323
},
{
"sensorId" : "7f25b88f-71c0-1bcc-7f9d-c2069b9bd54e",
"reading" : 58.074918200513636,
"timestamp" : 1716322323342
}
]

What is really going on here?

ShadowTraffic scans your configuration file and looks for any maps with a key named _gen. If it finds one, it looks at the function name and compiles code at that location. Then at runtime, ShadowTraffic repeatedly invokes those functions to produce synthetic data.

This approach is powerful because the API is a mirror image of your data. You can put functions anywhere you like, and ShadowTraffic will mimic the structure. For example, if the sensor data was more nested like:

{
"sensorPayload": {
"identifiers": {
"v4": "a94729b9-c375-45a3-8a1b-58b96f6b77dc"
},
"value": [60.53, "degrees"],
"time": {
"recordedAt": 1716321759
}
}
}

Then all you'd need to do is shape your function calls in the same way:

{
"sensorPayload": {
"identifiers": {
"v4": { "_gen": "uuid" }
},
"value": [
{ "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"degrees"
],
"time": {
"recordedAt": { "_gen": "now" }
}
}
}

Funny looking, isn't it? Just remember: replace concrete values with functions. That is all that's happening here.

Before we move on, one thing you might notice is that each event represents a brand new sensor. Those UUIDs will almost never repeat, so each sensor appears to emit only a single reading. Bear with this for a moment: we'll show a little later how to model individual sensors over time.

Function modifiers

Did you notice something a little off about the example above? When we started, we wanted sensor readings that looked neat with two decimal places, like 60.53, but our generator spits out long values like 54.62383104172507. How can you control that?

Many functions take specific parameters to control their behavior, but there are a few general parameters that can be used on almost any function. These are called function modifiers.

For instance, on any numeric function, you can use the decimals function modifier to trim the number of decimal places. Using decimals, we adjust our function call like so:

{
"_gen": "normalDistribution",
"mean": 60,
"sd": 5,
"decimals": 2
}

If we rerun ShadowTraffic, we'll now get reading values in exactly the shape we want them:

[
{
"sensorId" : "5a5be9a6-6245-e6c3-9d9d-a67a0121607e",
"reading" : 65.0,
"timestamp" : 1716393799097
},
{
"sensorId" : "1ef26028-4e5d-e57f-171a-ae256a1c498f",
"reading" : 69.44,
"timestamp" : 1716393799106
},
{
"sensorId" : "5a6798a9-3792-6b1f-b888-74c87170d00a",
"reading" : 56.41,
"timestamp" : 1716393799107
}
]

decimals works with other functions like uniformDistribution, divide, and anything else that returns numbers.
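
For instance, here's the same modifier applied to a uniformDistribution call (the bounds are illustrative):

```json
{
  "_gen": "uniformDistribution",
  "bounds": [50, 70],
  "decimals": 2
}
```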

Other useful function modifiers include path, selectKeys, and keyNames. You might want to spend a minute or two browsing these since they come in handy so often.

Generators

So far, we've introduced just enough abstraction to make new data. But we haven't actually run ShadowTraffic to see it producing that data to a particular backend like Kafka or Postgres. To do that, we'll need another concept: generators.

A generator describes the backend-specific attributes of the data. Continuing our example, we want to get our sensor data into a Kafka topic. When you send an event to Kafka, you need to supply a topic, key, value, and perhaps other information.

Slightly reshaping our function calls from above, the generator for Kafka looks like this:

{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"timestamp": { "_gen": "now" }
}
}

Where did this JSON structure come from? Each backend type has a schema for what its generator must look like. Here's the Kafka generator schema, and here's the schema for Postgres if that's what you're working with.

Moving our example along, a ShadowTraffic configuration file requires an array of generators, so the nearly-complete configuration file looks like:

{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 },
"timestamp": { "_gen": "now" }
}
}
]
}

When you have more than one generator, ShadowTraffic executes them in a round-robin fashion unless otherwise specified. But we'll get to that soon!
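
To make round-robin concrete, here's a sketch with two generators; the second topic, sensorHeartbeats, is purely illustrative. ShadowTraffic will alternate between them, producing one event from each in turn:

```json
{
  "generators": [
    {
      "topic": "sensorReadings",
      "value": {
        "reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 }
      }
    },
    {
      "topic": "sensorHeartbeats",
      "value": { "status": "ok" }
    }
  ]
}
```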

Connections

By now, you can probably guess the last concept we need to tie this all together: connections. Connections describe where, specifically, the data will go. They're a top-level construct in the configuration file, mapping a connection name to its details.

To send our sensor data to a local Kafka cluster, our connection will look something like this:

{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
}
}
],
"connections": {
"dev-kafka": {
"kind": "kafka",
"producerConfigs": {
"bootstrap.servers": "localhost:9092",
"key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
"value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
}
}
}
}

A few things to call out:

  1. Each connection needs a name, dev-kafka in this case. When you have just one connection, ShadowTraffic automatically binds your generators to it, so you don't need to specify anything else, as in this example.
  2. If you have multiple connections, each generator has to specify a connection field to tell it what connection name it should bind to.
  3. The kind field specifies which connection type this is, and therefore what fields are required for a valid connection. For instance, if you set it to postgres, ShadowTraffic would require fields like host and port instead of bootstrap.servers.
  4. For you Kafka readers with a keen eye, you'll have noticed the JSON serializers with the ShadowTraffic package name. ShadowTraffic ships this out of the box for your convenience. You're welcome.
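
For example, with a second connection defined (the Postgres connection here is illustrative, with its details elided), each generator names its target via the connection field:

```json
{
  "generators": [
    {
      "connection": "dev-kafka",
      "topic": "sensorReadings",
      "value": {
        "reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5 }
      }
    }
  ],
  "connections": {
    "dev-kafka": { "kind": "kafka", "producerConfigs": { ... } },
    "dev-postgres": { "kind": "postgres", ... }
  }
}
```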

See the reference pages for Kafka, Postgres, and S3 to learn more about each connection type.

Now, with that much configuration, you can run ShadowTraffic to send sensor data to Kafka. Read on to the next section.

Dry run

Before you actually send data off to Kafka, you might want to check that the data looks how you'd expect.

ShadowTraffic lets you perform a dry run: you can see exactly what it's going to do, but instead of having the data sent to your connection, it gets printed to standard output on your terminal. Invoke ShadowTraffic with this command, noting --stdout and --sample.

docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json --stdout --sample 10

You should see it print 10 sensor readings similar to the ones below, and then exit.

{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "ddc789f7-c9be-7883-da97-ce5759e526f6"
},
"value" : {
"reading" : 64.14,
"timestamp" : 1716909643832
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "b46da8cb-6cf4-c10a-2426-70e1a10e366d"
},
"value" : {
"reading" : 57.65,
"timestamp" : 1716909643840
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "e4ce8f14-6b42-36db-c467-6428173f49ff"
},
"value" : {
"reading" : 65.36,
"timestamp" : 1716909643841
}
}

What's even more useful during development is the --watch flag. By running ShadowTraffic with --stdout --sample 10 --watch, ShadowTraffic will print 10 sample events to standard output every time your configuration file changes. This is incredibly useful for iterating on your configuration.

Runtime

When all looks well, you can drop the development flags and run ShadowTraffic like so:

docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json

It should print some logs about connecting to Kafka and block, producing data indefinitely. You can stop it with Control-C or stop the container with the Docker CLI.

If you want a quick way to verify how ShadowTraffic is behaving, you can peek at its Prometheus metrics.

Generator configuration

If you checked Kafka after running ShadowTraffic, the first thing you probably noticed is that it produced quite a lot of messages in a short span of time. How can you slow it down? Or better yet, how can you change an entire generator's behavior in general?

There are two ways to configure a generator: either locally for a specific generator, or globally for all of them.

For example, to make a generator wait at least 200 milliseconds between events, you could configure its local throttleMs parameter:

{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
},
"localConfigs": {
"throttleMs": 200
}
}

If you had many generators and you wanted all of them to wait at least 500 milliseconds between events, you could instead use the top-level global configuration field:

{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
}
}
],
"globalConfigs": {
"throttleMs": 500
},
"connections": {
"dev-kafka": { ... }
}
}

If you supply the same configuration in both global and local, the local parameter takes precedence.

Similarly, you can use generator configuration to cap the number of generated events (maxEvents), delay a portion of events from being written to the connection (delay), among other things.
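
For example, here's a sketch that caps this generator at 1,000 events. It assumes maxEvents sits alongside throttleMs in localConfigs; check the generator configuration reference for the exact placement:

```json
{
  "topic": "sensorReadings",
  "value": {
    "reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 }
  },
  "localConfigs": {
    "throttleMs": 200,
    "maxEvents": 1000
  }
}
```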

Notably, you can use functions inside the configuration. If you wanted each event to be generated anywhere between 100 and 200 milliseconds apart, you could use a function to create that variance:

{
"globalConfigs": {
"throttleMs": {
"_gen": "uniformDistribution",
"bounds": [100, 200]
}
}
}

Lookups

Until now, we've generated data to only one Kafka topic. But it's often the case that you have multiple streams of data which share a common identifier - in other words, a join key.

Imagine, for example, customer and order data. Both data sets usually share a common customerId field that links rows in both together.

How would you do that with ShadowTraffic? There's a function just for this purpose: lookup.

When ShadowTraffic runs a generator, it retains a window of history about the events it recently produced. lookup is a function that queries that history, picking out a random event to use.

Let's continue our sensor example and imagine there's another stream of data for maintenance requests. Every ~10 seconds, an event is generated which requests that a random sensor get checked for repairs:

{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "uuid" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
},
"localConfigs": {
"throttle": {
"ms": 200
}
}
},
{
"topic": "maintainenceNotifications",
"value": {
"sensorId": {
"_gen": "lookup",
"topic": "sensorReadings",
"path": [ "key", "sensorId" ]
},
"status": "needs repair"
},
"localConfigs": {
"throttle": {
"ms": 10000
}
}
}
],
"connections": {
"dev-kafka": { ... }
}
}

The generator for maintainenceNotifications calls the lookup function, asking it for events previously generated to the sensorReadings topic. Notice the path function modifier. lookup returns an entire event that was previously generated, but we only want the sensorId. Using path lets us drill directly to the value we want.

When you run ShadowTraffic, you'll see common identifiers link up, like the following. Notice how the id a2c57eea-0589-70f8-b557-25e7ebb399c4 is shared in both events.

{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "a2c57eea-0589-70f8-b557-25e7ebb399c4"
},
"value" : {
"reading" : 57.91,
"timestamp" : 1716330021188
}
}
{
"topic" : "maintainenceNotifications",
"value" : {
"sensorId" : "a2c57eea-0589-70f8-b557-25e7ebb399c4",
"status" : "needs repair"
}
}

By default, the last 1,000,000 generated events are available for lookups before being purged from memory, but you can raise or lower this limit with the history generator configuration.

Variables

When your generators get complex enough, you'll probably want to share data across multiple fields. Variables are an obvious abstraction for this.

To show how that works, let's extend our example and imagine that each sensor event also contains a URL to see a chart about its recent activity. Perhaps that URL contains the sensor ID itself, so we'll need to reference it in two places.

{
"topic": "sensorReadings",
"vars": {
"sensorId": { "_gen": "uuid" }
},
"key": {
"sensorId": { "_gen": "var", "var": "sensorId" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" },
"url": {
"_gen": "string",
"expr": "http://mydomain.com/charts/#{sensorId}"
}
}
}

You declare variables in a top-level vars field, mapping names to expressions. To reference a variable, you simply use the var function with the name you want to bind. Notice further how the string function can reference variables through #{} templating.

When you run it, each event will share the same identifier in its sensorId and url fields:

{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "6d95332c-fcc2-ff43-d48f-67053ecb5609"
},
"value" : {
"reading" : 60.02,
"timestamp" : 1716330583941,
"url" : "http://mydomain.com/charts/6d95332c-fcc2-ff43-d48f-67053ecb5609"
}
}

There might be some cases where you want to randomly generate a variable, but only do so once and lock its value for the lifetime of the generator. To do that, you can use the top-level varsOnce field, which works exactly like vars, but only evaluates once:

{
"topic": "sensorReadings",
"varsOnce": {
"originalHardware": {
"_gen": "boolean"
}
},
"vars": {
"sensorId": { "_gen": "uuid" }
},
"key": {
"sensorId": { "_gen": "var", "var": "sensorId" }
},
"value": {
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" },
"url": {
"_gen": "string",
"expr": "http://mydomain.com/charts/#{sensorId}"
},
"original": {
"_gen": "var",
"var": "originalHardware"
}
}
}

In the output data, original will be locked to either true or false for the lifetime of the run, varying only from one run to the next.

And yes, variables can reference other variables.
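
As a sketch of that, a derived variable can be built from another with string templating; here chartUrl is a hypothetical variable derived from sensorId:

```json
{
  "vars": {
    "sensorId": { "_gen": "uuid" },
    "chartUrl": {
      "_gen": "string",
      "expr": "http://mydomain.com/charts/#{sensorId}"
    }
  },
  "value": {
    "url": { "_gen": "var", "var": "chartUrl" }
  }
}
```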

Seeding

There are many cases, like correctness testing, where you'll want to generate the exact same data every time. ShadowTraffic can do this with the --seed parameter:

docker run --env-file license.env -v $(pwd)/your-config-file.json:/home/config.json shadowtraffic/shadowtraffic:latest --config /home/config.json --seed 42

When you set a seed, ShadowTraffic will not only generate the same data, but use the same throttle values, delay values, or any other configuration you've provided.

If you want to replicate a particular run, look at the top of ShadowTraffic's logs to see what the seed was:

✔ Running with seed 720838425. You can repeat this run by setting --seed 720838425.

For example, in the previous section, we discussed how varsOnce can lock a boolean value for the lifetime of one ShadowTraffic run, though it may be true or false the next time you run it. Using --seed will lock it to one or the other forever.

State machines

Earlier, we mentioned something a little weird about this data: every event comes from a new sensor ID. That isn't very realistic. What we want to create is a set of sensors, each of which sends updates over time. In the following two sections, we'll introduce the constructs you need to do it, starting with state machines.

To complete our example, let's imagine that we want 10 sensors sending updates, each at 1 second intervals. Each sensor's reading will start with a value of about 60, and each subsequent reading will be the previous reading plus a random value between -1 and 1. This will create a nice drift effect for each sensor.

A state machine is the perfect construct for modeling this. It's what it sounds like: you have a set of states and transitions. Each state describes how to override the base generator.

The best way to understand this is to just dive into the example. Notice how sensorId has been temporarily set to a specific UUID. We'll come back and fix this in the next section, but for now this makes sense: we're modeling the lifecycle of a single sensor.

{
"generators": [
{
"topic": "sensorReadings",
"key": {
"sensorId": "2d9549cc-ac0f-b899-0b8b-cca3fc0691d3"
},
"value": {
"timestamp": { "_gen": "now" }
},
"stateMachine": {
"_gen": "stateMachine",
"initial": "start",
"transitions": {
"start": "update",
"update": "update"
},
"states": {
"start": {
"value": {
"reading": {"_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 }
}
},
"update": {
"value": {
"reading": {
"_gen": "add",
"args": [
{ "_gen": "uniformDistribution", "bounds": [-1, 1] },
{ "_gen": "previousEvent", "path": [ "value", "reading" ] }
],
"decimals": 2
}
}
}
}
},
"localConfigs": { "throttle": { "ms": 1000 } }
}
],
"connections": {
"dev-kafka": { ... }
}
}

Notice a few things:

  1. If you run it, you'll see one event written per second. The first event will be from the start state, and subsequent events will be from the update state.
  2. The reading field has been moved from the top-level value field into each state. The states merge, or override, its configuration into the base generator. You can observe this by adding more fields besides reading to the update state. All events but the first will contain your new field.
  3. previousEvent is a function that grabs the latest event from this generator's history. The path function modifier drills into the last sensor reading so it can be added to a random value as described.

We're almost there. In the next section, we'll remove that hardcoded sensor ID, 2d9549cc-ac0f-b899-0b8b-cca3fc0691d3, and generate data for 10 distinct sensors.

Forks

If you think about the ways you could take the previous example and generalize it to 10 sensors, one thing you could do is copy and paste that generator 9 more times, altering sensorId each time. That could work, but at best it's clumsy. What if you need 1,000 sensors?

Fork is a construct that dynamically clones a generator, running many copies of it in parallel. You create a top-level field, fork, and provide at least a key field. key describes the "identity" of each fork: is this sensor 1, sensor 2, or sensor 3? This is easiest to understand by looking at the configuration:

{
"generators": [
{
"topic": "sensorReadings",
"fork": {
"key": { "_gen": "uuid" },
"maxForks": 10
},
"key": {
"sensorId": { "_gen": "var", "var": "forkKey" }
},
"value": {
"timestamp": { "_gen": "now" }
},
"stateMachine": { ... },
"localConfigs": { "throttle": { "ms": 1000 } }
}
],
"connections": {
"dev-kafka": { ... }
}
}

Notice how:

  1. The key field in fork has been set to the uuid function, which is what it originally was at the start of this page. This will create random sensor IDs for us.
  2. There's also a maxForks field that's been set to 10. When unset, fork will generate as many instances as possible. Since we want only 10 sensors, maxForks puts an upper bound on it.
  3. The hardcoded sensor ID has been removed and replaced with a reference to a variable called forkKey. When you use fork, you can use this variable to identify which fork this is.

If you run this, you'll now see 10 different UUIDs, each of which updates every 1 second.

You might also notice that all the sensors appear to update at nearly the same time: a burst of 10 updates, then nothing for 1 second, and repeat. This is because by default, forks are spawned as fast as possible. You can stagger how quickly forks start with the aptly named stagger field.
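
A sketch of what staggered startup might look like, assuming stagger takes a millisecond value inside fork (see the fork reference for the exact shape):

```json
{
  "fork": {
    "key": { "_gen": "uuid" },
    "maxForks": 10,
    "stagger": { "ms": 100 }
  }
}
```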

Take care to either put an upper bound on the number of forks you start, with maxForks, or ensure that forks eventually stop running through either maxEvents or a state machine with a terminal state. Each additional fork consumes memory, so if you start an unbounded number of forks, you'll eventually run out of memory.
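
For instance, here's a sketch that bounds both the number of forks and each fork's lifetime, assuming maxEvents applies per fork rather than to the generator as a whole:

```json
{
  "topic": "sensorReadings",
  "fork": {
    "key": { "_gen": "uuid" },
    "maxForks": 10
  },
  "value": { ... },
  "localConfigs": {
    "throttleMs": 1000,
    "maxEvents": 500
  }
}
```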

Intervals

You may want to make your synthetic streams a little more realistic by controlling how they behave depending on the day or time. ShadowTraffic provides a construct for this: intervals.

intervals is a function that maps Cron strings to expressions. When the current wallclock time overlaps with one of the Cron strings, the mapped expression is used in its place.

For example, imagine that you wanted sensors to normally emit updates once per 4 seconds. But on every 5th minute of the hour, you want updates every 50 milliseconds, and on every 2nd minute of the hour, you want updates every 1000 milliseconds. You can adjust your throttle to look like this:

{
"topic": "sensorReadings",
"value": { ... },
"localConfigs": {
"throttle": {
"ms": {
"_gen": "intervals",
"intervals": [
[ "*/5 * * * *", 50 ],
[ "*/2 * * * *", 1000 ]
],
"defaultValue": 4000
}
}
}
}

Stateful functions

As we get towards the end of this overview, let's revisit what we started with: functions. Throughout this tour, we've used stateless functions like uuid and normalDistribution. Each time you invoke them, they return a completely random value.

But sometimes, you may find yourself in situations where you need to generate a series of values, where each one is a progression of the last. ShadowTraffic ships a few functions that behave this way. They're called stateful functions because they retain state between calls. You'll know you're working with a stateful function because it's indicated on the function's reference page.

Let's build on our example using the stateful sequentialInteger and sequentialString functions:

{
"topic": "sensorReadings",
"key": {
"sensorId": { "_gen": "sequentialString", "expr": "sensor-~d" }
},
"value": {
"i": { "_gen": "sequentialInteger", "startingFrom": 50 },
"reading": { "_gen": "normalDistribution", "mean": 60, "sd": 5, "decimals": 2 },
"timestamp": { "_gen": "now" }
}
}

Each time this generator runs, the internal state for these functions advances, and the values automatically progress in the output:

{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "sensor-0"
},
"value" : {
"i" : 50,
"reading" : 63.31,
"timestamp" : 1716909120807
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "sensor-1"
},
"value" : {
"i" : 51,
"reading" : 61.9,
"timestamp" : 1716909120816
}
}
{
"topic" : "sensorReadings",
"key" : {
"sensorId" : "sensor-2"
},
"value" : {
"i" : 52,
"reading" : 64.75,
"timestamp" : 1716909120816
}
}

Preprocessors

We'll round the overview out by focusing on a concept that becomes useful the more you use ShadowTraffic: preprocessors. As your configuration files grow, you'll probably want some modularity, and even the ability to parameterize them at launch-time.

Preprocessors are special functions that help you do these things. As their name suggests, they run before all other functions, transforming your configuration first.

We can use them, for example, to put your connection information in a file that can be shared across other ShadowTraffic configurations. If you were to make a file called connections.json:

{
"dev-kafka": {
"kind": "kafka",
"producerConfigs": {
"bootstrap.servers": "localhost:9092",
"key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
"value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
}
}
}

You could include the connection block in your main configuration like so:

{
...
"connections": {
"_gen": "loadJsonFile",
"file": "connections.json"
}
}

ShadowTraffic will expand the contents of connections.json and inline them into the spot where you called loadJsonFile, and then proceed with normal validation.

Another thing you might want to do is inject variables from your environment. You can use the env function to do that, perhaps to parameterize your bootstrap server URL:

{
"_gen": "env",
"var": "BOOTSTRAP_SERVERS"
}

While this example showed using preprocessors in your connection settings, you can use them anywhere in your configuration file.
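
Combining the two, here's a sketch of connections.json with the bootstrap servers pulled from the environment:

```json
{
  "dev-kafka": {
    "kind": "kafka",
    "producerConfigs": {
      "bootstrap.servers": { "_gen": "env", "var": "BOOTSTRAP_SERVERS" },
      "key.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer",
      "value.serializer": "io.shadowtraffic.kafka.serdes.JsonSerializer"
    }
  }
}
```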


And with that, you have a solid understanding of the main constructs in ShadowTraffic.

Want to learn more? Try the video guides or the cheat sheet.