Skip to main content

Connections

motherduck

Commentary

added in 1.10.0

Connects to a Motherduck database.

You can connect by either supplying a MotherDuck token and database name 1 or a JDBC url 2.

The MotherDuck connection writes data asynchronously. A new transaction will be executed every 1000 ms or 10000 rows, whichever happens first. You can override row size/write timing with batchConfigs. 3

Automatic table creation

By default for convenience, any tables ShadowTraffic writes to will be automatically created. This makes it easier to iterate on your generators without flipping back and forth between ShadowTraffic and MotherDuck.

ShadowTraffic does this by scanning the structure of your generators and creating a suitable DDL, then executing it on your behalf.

If ShadowTraffic doesn't create the table exactly as you'd want it, you can override each column using the sqlHint function modifier. 4

If the table already exists but you'd like to clear out any existing data, set tablePolicy to dropAndCreate. 5

Manual table control

If you don't ShadowTraffic to control your tables, you can turn this behavior off by setting tablePolicy to manual in the connection map. It'll then be up to you to make sure your tables exist before trying to write to them. 6

Generating updates and deletes

By default, ShadowTraffic only issues INSERT statements to your tables. But sometimes, it might be useful to issue UPDATE and DELETE statements too.

You can control the statement type by setting the op key on your generator. op must be one of insert, update, or delete.

For the latter two, you must also include a where key on your generator, which contains a map of column name to value. You need this because when you issue an update or a delete, you need to tell MotherDuck what rows you want to change. The where map is used to check equality and select rows that match. 7

Generating reader traffic

In addition to mutating events (inserts, updates, and deletes), this connection also has the ability to issue read traffic. In other words, it's able to send arbitrary SQL to MotherDuck with the intent of running ordinary SELECT queries. Query results are realized in memory and discard. 8


Examples

Connecting with a token

You can authenticate with MotherDuck in two different ways. The first, shown here, is to supply a token and db. ShadowTraffic will figure out the right connection endpoint based on these two parameters. You can read about how to obtain a token from MotherDuck's docs.

It's advisable to keep your token in an environment variable (shown here) to avoid hardcoding it in your ShadowTraffic file.

{
"connections": {
"md": {
"kind": "motherduck",
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
}
}
}
}

Connecting with a JDBC URL

The second way you can authenticate is to directly pass a JDBC url. This is useful if you want to control the entire connection string, or perhaps transparently redirect the data to DuckDB.

{
"connections": {
"md": {
"kind": "motherduck",
"connectionConfigs": {
"jdbcUrl": "jdbc:duckdb:md:my_db?motherduck_token"
}
}
}
}

Setting batch parameters

Set batchConfigs to control how frequently transactions are written. In this example, a transaction is executed whenever 500 milliseconds pass or 5000 rows are accumulated—whichever comes first.

{
"connections": {
"md": {
"kind": "motherduck",
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
},
"batchConfigs": {
"lingerMs": 500,
"batchRows": 5000
}
}
}
}

Automatic table creation

By default, table sandbox doesn't need to be defined. ShadowTraffic will automatically create it for you.

{
"generators": [
{
"table": "sandbox",
"row": {
"id": {
"_gen": "uuid"
},
"level": {
"_gen": "uniformDistribution",
"bounds": [
1,
10
]
},
"active": {
"_gen": "boolean"
}
}
}
],
"connections": {
"md": {
"kind": "motherduck",
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
}
}
}
}

In this case, sandbox will be created like so:

┌─────────────┬─────────────┬──────┬─────┬─────────┬───────┐
│ column_name ┆ column_type ┆ null ┆ key ┆ default ┆ extra │
╞═════════════╪═════════════╪══════╪═════╪═════════╪═══════╡
│ id ┆ VARCHAR ┆ YES ┆ ┆ ┆ │
│ level ┆ DOUBLE ┆ YES ┆ ┆ ┆ │
│ active ┆ BOOLEAN ┆ YES ┆ ┆ ┆ │
└─────────────┴─────────────┴──────┴─────┴─────────┴───────┘

Automatic table truncation

Use dropAndCreate to automatically clear out any rows in an existing target table. This will executing a cascading drop, deleting any dependent objects too.

{
"kind": "motherduck",
"tablePolicy": "dropAndCreate",
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
}
}

Overriding column types

Use the optional sqlHint function modifier on any generator to override how its column is defined.

{
"table": "sandbox",
"row": {
"id": {
"_gen": "uuid",
"sqlHint": "TEXT NOT NULL"
},
"level": {
"_gen": "uniformDistribution",
"bounds": [
1,
10
],
"sqlHint": "INTEGER"
},
"active": {
"_gen": "boolean"
}
}
}

Now, sandbox will like this:

┌─────────────┬─────────────┬──────┬─────┬─────────┬───────┐
│ column_name ┆ column_type ┆ null ┆ key ┆ default ┆ extra │
╞═════════════╪═════════════╪══════╪═════╪═════════╪═══════╡
│ id ┆ VARCHAR ┆ NO ┆ ┆ ┆ │
│ level ┆ INTEGER ┆ YES ┆ ┆ ┆ │
│ active ┆ BOOLEAN ┆ YES ┆ ┆ ┆ │
└─────────────┴─────────────┴──────┴─────┴─────────┴───────┘

Manual table creation

Use manual to prevent ShadowTraffic from automatically creating tables for you. When you do this, your tables must already exist before ShadowTraffic tries to write to them.

{
"kind": "motherduck",
"tablePolicy": "manual",
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
}
}

Upserts and deletes

Use op and where to issue upserts and deletes. It's up to you to make sure that you don't try to update or delete rows that don't exist.

This example uses fork and stateMachine to generate many simultaneous rows being inserted, updated, and deleted.

In the first state, insertIt, a random value of about 50 is generated. In the second state, updateIt, the value gets updated to one of 1, 5, or 10. Finally, in deleteIt, the entire row is deleted.

id is generated once as a fork key and used as the basis for equality to perform updates and deletes.

{
"table": "sandbox",
"fork": {
"key": {
"_gen": "uuid"
}
},
"row": {
"id": {
"_gen": "var",
"var": "forkKey"
}
},
"stateMachine": {
"_gen": "stateMachine",
"initial": "insertIt",
"transitions": {
"insertIt": "updateIt",
"updateIt": "deleteIt"
},
"states": {
"insertIt": {
"row": {
"value": {
"_gen": "normalDistribution",
"mean": 50,
"sd": 15
}
}
},
"updateIt": {
"op": "update",
"where": {
"id": {
"_gen": "var",
"var": "forkKey"
}
},
"row": {
"value": {
"_gen": "oneOf",
"choices": [
1,
5,
10
]
}
}
},
"deleteIt": {
"op": "delete",
"where": {
"id": {
"_gen": "var",
"var": "forkKey"
}
}
}
}
}
}

A random snapshot of the table might look like:

duckdb> select * from sandbox limit 10;
┌───────────────────────────────────────┬───────┐
│ id ┆ value │
╞═══════════════════════════════════════╪═══════╡
│ 07f217c6-1c82-e1cb-6dd8-fc614ff1a6e3 ┆ 49 │
│ 83d35207-ac3f-b8a7-fd7e-c82e1bdc3b05 ┆ 10 │
│ 184a0e26-237f-915b-4d97-5dc1808d91df ┆ 37 │
└───────────────────────────────────────┴───────┘

Timestamp types

If your schema uses a timestamp type (timestamp, timestamp with time zone and friends), use serialize around your timestamp (Unix ms) to write it with a compatible type. This works because MotherDuck is wire-compatible with Postgres.

The specific format of the timestamp will defer to the schema set in MotherDuck.

{
"table": "sandbox",
"row": {
"createdAt": {
"_gen": "now",
"serialize": {
"type": "postgresTimestamp"
}
}
}
}

Generating reads

To generate read events, mark the top-level generator with kind set to reader (generator kind is default set to writer, but it is explicitly shown here). Then set query to the SQL query you want to execute.

In this example, there are two generators.

  1. The first writes data to the table foo.
  2. The second looks up data previously written to foo and uses to run targeted SELECT queries against that same table. It uses interpolate to build the SQL strings.

These two generators interleave, issuing a mix of read and write traffic to MotherDuck.

{
"generators": [
{
"kind": "writer",
"table": "foo",
"row": {
"bar": {
"_gen": "uuid"
}
}
},
{
"kind": "reader",
"query": {
"_gen": "interpolate",
"template": "SELECT * FROM foo WHERE bar=':id';",
"params": {
"id": {
"_gen": "lookup",
"table": "foo",
"path": [
"row",
"bar"
]
}
}
}
}
],
"connections": {
"md": {
"kind": "motherduck",
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
}
}
}
}

Connection pooling

By default, each MotherDuck connection instance establishes one network connection to the database. You can increase this by setting poolSize, which will establish n unique connections (and sessions). ShadowTraffic will then emit events evenly across each connection in the pool. This is useful if you want stripe traffic across many nodes in a MotherDuck cluster from a single ShadowTraffic generator.

Note that ShadowTraffic makes no gaurantees about which events will sent on which connection. If your workload requires atomicity, you probably don't want to enable this.

{
"connections": {
"md": {
"kind": "motherduck",
"poolSize": 3,
"connectionConfigs": {
"token": {
"_gen": "env",
"var": "MOTHERDUCK_TOKEN"
},
"db": "mydb"
}
}
}
}

Specification

Connection JSON schema

{
"type": "object",
"properties": {
"kind": {
"type": "string",
"const": "motherduck"
},
"connectionConfigs": {
"type": "object",
"properties": {
"jdbcUrl": {
"type": "string"
},
"token": {
"type": "string"
},
"db ": {
"type": "string"
}
},
"oneOf": [
{
"required": [
"jdbcUrl"
]
},
{
"required": [
"token",
"db"
]
}
]
},
"batchConfigs": {
"type": "object",
"properties": {
"lingerMs": {
"type": "integer",
"minimum": 0
},
"batchRows": {
"type": "integer",
"minimum": 0
}
}
},
"poolSize": {
"type": "integer",
"minimum": 1
},
"tablePolicy": {
"type": "string",
"enum": [
"manual",
"create",
"dropAndCreate"
]
}
},
"required": [
"connectionConfigs"
]
}

Writer generator JSON schema

{
"type": "object",
"properties": {
"connection": {
"type": "string"
},
"name": {
"type": "string"
},
"table": {
"type": "string"
},
"row": {
"type": "object"
},
"op": {
"type": "string",
"enum": [
"insert",
"update",
"delete"
]
},
"where": {
"type": "object"
},
"localConfigs": {
"type": "object",
"properties": {
"throttleMs": {
"oneOf": [
{
"type": "number",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"maxEvents": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"kafkaKeyProtobufHint": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
},
"jsonSchemaHint": {
"type": "object"
},
"maxBytes": {
"type": "integer",
"minimum": 1
},
"discard": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"retainHistory": {
"type": "boolean"
}
},
"required": [
"rate"
]
},
"repeat": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"times": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"rate",
"times"
]
},
"protobufSchemaHint": {
"type": "object",
"patternProperties": {
"^.*$": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
}
}
},
"maxHistoryEvents": {
"type": "integer",
"minimum": 0
},
"maxMs": {
"type": "integer",
"minimum": 0
},
"time": {
"type": "integer"
},
"events": {
"type": "object",
"properties": {
"exactly": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
}
},
"delay": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"ms": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"rate",
"ms"
]
},
"history": {
"type": "object",
"properties": {
"events": {
"type": "object",
"properties": {
"max": {
"type": "integer",
"minimum": 0
}
}
}
}
},
"avroSchemaHint": {
"type": "object"
},
"throttle": {
"type": "object",
"properties": {
"ms": {
"oneOf": [
{
"type": "number",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
}
},
"throughput": {
"oneOf": [
{
"type": "integer",
"minimum": 1
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"timeMultiplier": {
"oneOf": [
{
"type": "number"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"kafkaValueProtobufHint": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
}
}
}
},
"required": [
"table",
"row"
],
"allOf": [
{
"if": {
"properties": {
"op": {
"const": "update"
}
}
},
"then": {
"required": [
"where"
]
}
},
{
"if": {
"properties": {
"op": {
"const": "delete"
}
}
},
"then": {
"required": [
"where"
]
}
}
]
}

Reader generator JSON schema

{
"type": "object",
"properties": {
"connection": {
"type": "string"
},
"name": {
"type": "string"
},
"query": {
"oneOf": [
{
"type": "string"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"localConfigs": {
"type": "object",
"properties": {
"throttleMs": {
"oneOf": [
{
"type": "number",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"maxEvents": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"kafkaKeyProtobufHint": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
},
"jsonSchemaHint": {
"type": "object"
},
"maxBytes": {
"type": "integer",
"minimum": 1
},
"discard": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"retainHistory": {
"type": "boolean"
}
},
"required": [
"rate"
]
},
"repeat": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"times": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"rate",
"times"
]
},
"protobufSchemaHint": {
"type": "object",
"patternProperties": {
"^.*$": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
}
}
},
"maxHistoryEvents": {
"type": "integer",
"minimum": 0
},
"maxMs": {
"type": "integer",
"minimum": 0
},
"time": {
"type": "integer"
},
"events": {
"type": "object",
"properties": {
"exactly": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
}
},
"delay": {
"type": "object",
"properties": {
"rate": {
"type": "number",
"minimum": 0,
"maximum": 1
},
"ms": {
"oneOf": [
{
"type": "integer",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
},
"required": [
"rate",
"ms"
]
},
"history": {
"type": "object",
"properties": {
"events": {
"type": "object",
"properties": {
"max": {
"type": "integer",
"minimum": 0
}
}
}
}
},
"avroSchemaHint": {
"type": "object"
},
"throttle": {
"type": "object",
"properties": {
"ms": {
"oneOf": [
{
"type": "number",
"minimum": 0
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
}
}
},
"throughput": {
"oneOf": [
{
"type": "integer",
"minimum": 1
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"timeMultiplier": {
"oneOf": [
{
"type": "number"
},
{
"type": "object",
"properties": {
"_gen": {
"type": "string"
}
},
"required": [
"_gen"
]
}
]
},
"kafkaValueProtobufHint": {
"type": "object",
"properties": {
"schemaFile": {
"type": "string"
},
"message": {
"type": "string"
}
},
"required": [
"schemaFile",
"message"
]
}
}
}
},
"required": [
"query"
]
}