
Geo functions

geolocation

Commentary

added in 0.2.5

Generates real geolocations in a variety of formats. This generator requires downloading an external data set. Please see below.

Getting the data sets

This generator is powered by public domain data. For each country you want to generate locations in, you need the corresponding data set. Download it to your machine and follow the rest of the instructions.

Configuring ShadowTraffic

To give ShadowTraffic access to the geolocation data, you need to do two things.

First, volume mount the data set into the ShadowTraffic container. You can put it in whatever path you wish. For example:

docker run -v $(pwd)/YourDataSet.csv:/home/YourDataSet.csv ...

Second, add the file to your globalConfigs in your ShadowTraffic file, under the corresponding country name. Be sure to use the path you mounted to inside the ShadowTraffic container. For example, to add geolocation data for the United States:

{
  "generators": [],
  "globalConfigs": {
    "geolocation": {
      "countryFiles": {
        "United States": "/home/YourDataSet.csv"
      }
    }
  },
  "connections": {}
}

Invoking the generator

By default, when you execute this generator, it will choose geolocations anywhere within the specified country.

You can, however, set a number of other narrowing criteria depending on the chosen country. For example, to narrow locations within the United States, you can set the state, city, and zipCode parameters.

All narrowing parameters take the form of an array, and the search ORs the elements together. For example, supplying the parameter "state": ["TX", "NY"] searches for locations in Texas or New York.

If you supply multiple narrowing parameters, like state and city, the search ANDs the parameters together. If you added "city": ["Austin", "New York City"] to the previous example, it would only return locations in Austin, Texas or New York City, New York.

All search criteria must match the underlying data set exactly. ShadowTraffic doesn't alter its capitalization, whitespace, etc.
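The matching rules above can be sketched in a few lines of Python. This is only an illustration of the described semantics (values within one parameter are ORed, separate parameters are ANDed, and strings are compared exactly), not ShadowTraffic's implementation. The matches function and the sample rows are hypothetical.

```python
def matches(row, criteria):
    """Return True if the row satisfies every narrowing parameter.

    Values inside one parameter are ORed; separate parameters are ANDed.
    Comparison is exact: no case folding and no whitespace trimming.
    """
    return all(row.get(field) in allowed for field, allowed in criteria.items())


# Hypothetical rows standing in for entries of a real data set.
rows = [
    {"state": "TX", "city": "Austin"},
    {"state": "TX", "city": "Houston"},
    {"state": "NY", "city": "New York City"},
    {"state": "NY", "city": "Rochester"},
]

criteria = {"state": ["TX", "NY"], "city": ["Austin", "New York City"]}
selected = [row for row in rows if matches(row, criteria)]
# selected keeps only the Austin (TX) and New York City (NY) rows.

# Exact matching: a lowercase "austin" does not match "Austin".
lowercase_hits = [row for row in rows if matches(row, {"city": ["austin"]})]
```

Because nothing is normalized, a value such as "austin" or "Austin " (with a trailing space) selects nothing, so it is worth copying criteria straight from the data set.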

Output formats

This generator can output geolocation data in a variety of formats by setting the format parameter.

Address

If not explicitly set, the address format is used, which generates a complete address string according to the country's mailing convention. For example, in the United States, that might be: 72 Oak St, Rochester NY 14602.

Coordinates

Setting format to coordinates generates maps of the form:

{
  "latitude": 30.1853900100001,
  "longitude": -97.888961035
}

Object

Setting format to object generates maps of structured data. The specification of this structure depends on the country you're generating data for.

Multiples

Setting format to an array of formats returns a map whose keys are the format names and whose values are the formatted locations.
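The difference between a single format and an array of formats can be sketched as follows. This is a hedged illustration of the output shape only: the render function and the location dictionary are hypothetical, and the sample values are borrowed from the separate examples above, so they do not describe one real place.

```python
def render(location, fmt):
    """Shape the output for one location.

    A single format name yields that format's value directly; an array of
    names yields a map keyed by format name.
    """
    if isinstance(fmt, str):
        return location[fmt]
    return {name: location[name] for name in fmt}


# Hypothetical, pre-formatted views of one location (illustrative values).
location = {
    "address": "72 Oak St, Rochester NY 14602",
    "coordinates": {"latitude": 30.1853900100001, "longitude": -97.888961035},
}

single = render(location, "address")
multiple = render(location, ["address", "coordinates"])
```

Here single is a plain string, while multiple is a map with the keys address and coordinates, each holding the corresponding view of the same location.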

Caching

When you run ShadowTraffic with a particular geolocation generator for the first time, the supporting data is loaded from scratch. Depending on the size of the data set, this can take some time.

After that, the loaded data is cached in an embedded, on-disk database and reused by subsequent runs of that generator.

Changing any search criteria, like city, will force a reload of the data.

To improve development cycles, it's recommended that you preserve that cache by mounting another volume from your host into the container at the /tmp/geolocations path. For example, you might run ShadowTraffic like this:

docker run -v $(pwd)/geolocations:/tmp/geolocations ...

If for some reason you need to wipe out the cache, just delete your local mount.


Examples

Generating US addresses

Generate string addresses in the United States.

{
  "_gen": "geolocation",
  "country": "United States"
}

Generating Texas addresses

Use state, city, and other parameters to narrow the location candidates. This example generates geolocations only in Austin and Houston in the state of Texas.

{
  "_gen": "geolocation",
  "country": "United States",
  "state": [
    "TX"
  ],
  "city": [
    "Austin",
    "Houston"
  ]
}

Generating multiple formats

If you set format to an array, geolocation returns a map from each format name to that format's data for the location. For example, if you request object and address, each event contains a map with two keys: an object version and a string address version of the same location.

{
  "topic": "locations",
  "value": {
    "_gen": "geolocation",
    "country": "United States",
    "format": [
      "object",
      "address"
    ]
  }
}

Specification

JSON schema

{
  "type": "object",
  "properties": {
    "country": {
      "type": "string",
      "enum": [
        "United States"
      ]
    },
    "state": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "string"
      }
    },
    "city": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "string"
      }
    },
    "zipCode": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "string"
      }
    },
    "format": {
      "oneOf": [
        {
          "type": "string",
          "enum": [
            "address",
            "coordinates",
            "object"
          ]
        },
        {
          "type": "array",
          "items": {
            "type": "string",
            "enum": [
              "address",
              "coordinates",
              "object"
            ]
          },
          "minItems": 1
        }
      ]
    }
  },
  "required": [
    "country"
  ]
}
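As a reading aid, the constraints in the schema above can be mirrored by a small hand-written check. This is a minimal sketch under the assumption that you want to catch mistakes before starting ShadowTraffic; the check_geolocation function is hypothetical and is not part of ShadowTraffic, which performs its own validation.

```python
FORMATS = {"address", "coordinates", "object"}
COUNTRIES = {"United States"}
NARROWING = ("state", "city", "zipCode")


def check_geolocation(config):
    """Return a list of problems found in a geolocation generator config."""
    problems = []

    # "country" is the only required property and must be a known value.
    country = config.get("country")
    if not isinstance(country, str) or country not in COUNTRIES:
        problems.append("country is required and must be a supported country")

    # Each narrowing parameter, if present, is a non-empty array of strings.
    for key in NARROWING:
        if key in config:
            value = config[key]
            ok = (
                isinstance(value, list)
                and len(value) >= 1
                and all(isinstance(item, str) for item in value)
            )
            if not ok:
                problems.append(f"{key} must be a non-empty array of strings")

    # "format" is either one format name or a non-empty array of names.
    if "format" in config:
        fmt = config["format"]
        names = [fmt] if isinstance(fmt, str) else fmt
        ok = (
            isinstance(names, list)
            and len(names) >= 1
            and all(isinstance(n, str) and n in FORMATS for n in names)
        )
        if not ok:
            problems.append("format must be a format name or a non-empty array of format names")

    return problems
```

An empty list means the config satisfies the constraints listed in the schema. Keys the schema does not mention, such as _gen, are ignored by this sketch.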