
Misc functions

profile

Commentary

added in 1.15.0

Generates demographically realistic profiles of people.

When you create synthetic data, you'll often need to represent a table or data stream about users or customers. Sometimes their particular details, like names, email addresses, and company names, don't matter much. It's enough to generate a profile for, say, Fred Smith, who apparently works at Acme and has the fictitious email address purple_tuba@gmail.com. For these situations, you can use the string function, which is underpinned by canned values from the Java Faker library.

Other times, it's important that the profiles you generate look much more realistic. For instance, you might want:

  • A full name that follows demographic probabilities, like Edgar Rivera
  • A real, operating company, like Payscale
  • A matching email address, like erivera@payscale.com

Names follow demographically probable distributions drawn from a 2023 Scientific Data paper based on United States voter registrations. Company names and domain suffixes come from the Open Data 500.

The first time this function is invoked in a ShadowTraffic instance, expect a delay of a few seconds while the backing data loads into memory. After that, the data is cached and generating profiles is fast.


Examples

Generating traits

At minimum, specify an array of traits to generate for the profile. Choices are:

  • firstName
  • lastName
  • email
  • company
  • website

{
  "_gen": "profile",
  "traits": [
    "firstName",
    "lastName",
    "email",
    "company",
    "website"
  ]
}

[
  {
    "email": "aliawarren@overturecorp.com",
    "website": "www.overturecorp.com",
    "company": "Overture Technologies",
    "lastName": "Warren",
    "firstName": "Alia"
  },
  {
    "email": "s.deleon@scaleunlimited.com",
    "website": "www.scaleunlimited.com",
    "company": "Scale Unlimited",
    "lastName": "Deleon",
    "firstName": "Sandra"
  },
  {
    "email": "klansberry@childcaredesk.com",
    "website": "childcaredesk.com",
    "company": "Child Care Desk",
    "lastName": "Lansberry",
    "firstName": "Katelyn"
  },
  {
    "email": "a.carter@computerpackages.com",
    "website": "www.computerpackages.com",
    "company": "Computer Packages Inc",
    "lastName": "Carter",
    "firstName": "Andrew"
  },
  {
    "email": "chen@blr.com",
    "website": "www.blr.com",
    "company": "Business and Legal Resources",
    "lastName": "Chen",
    "firstName": "Shelly"
  }
]
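
The traits array does not have to list every choice. A minimal sketch, assuming any subset of the listed traits is accepted (the schema below places no minimum on the array), that requests only company details:

```json
{
  "_gen": "profile",
  "traits": [
    "company",
    "website"
  ]
}
```

Under that assumption, each generated object would carry only the company and website keys.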

Changing email formats

By default, each email is formatted with a randomly chosen strategy for combining the person's first and last name. To pin a particular strategy, set format inside the email key. Choices are:

  • first
  • last
  • firstLast
  • first.last
  • firstInitialLast
  • firstInitial.last

{
  "_gen": "profile",
  "traits": [
    "firstName",
    "lastName",
    "email"
  ],
  "email": {
    "format": "firstInitialLast"
  }
}

[
  {
    "email": "awarren@overturecorp.com",
    "lastName": "Warren",
    "firstName": "Alia"
  },
  {
    "email": "mmcmillan@rezolvegroup.com",
    "lastName": "Mcmillan",
    "firstName": "Maisha"
  },
  {
    "email": "lchetram@panjiva.com",
    "lastName": "Chetram",
    "firstName": "Lakeisha"
  },
  {
    "email": "jcandell@energypoints.com",
    "lastName": "Candell",
    "firstName": "Jorge"
  },
  {
    "email": "ccollins@lumesis.com",
    "lastName": "Collins",
    "firstName": "Carolyn"
  }
]
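
The other formats are selected the same way. A minimal sketch that pins first.last instead; judging only by the format name, this presumably joins the full first and last names with a dot, but that reading is an assumption and no output was recorded for this variant:

```json
{
  "_gen": "profile",
  "traits": [
    "firstName",
    "lastName",
    "email"
  ],
  "email": {
    "format": "first.last"
  }
}
```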

Email formats as a function

To choose between email formats dynamically, set format to a function that returns a valid format.

{
  "_gen": "profile",
  "traits": [
    "firstName",
    "lastName",
    "email"
  ],
  "email": {
    "format": {
      "_gen": "oneOf",
      "choices": [
        "first",
        "last"
      ]
    }
  }
}

[
  {
    "email": "perez@accuweather.com",
    "lastName": "Perez",
    "firstName": "Cristina"
  },
  {
    "email": "linda@capitalcube.com",
    "lastName": "Hucks",
    "firstName": "Linda"
  },
  {
    "email": "candell@energypoints.com",
    "lastName": "Candell",
    "firstName": "Jorge"
  },
  {
    "email": "jessica@teradata.com",
    "lastName": "Perry",
    "firstName": "Jessica"
  },
  {
    "email": "cui@yelp.com",
    "lastName": "Cui",
    "firstName": "Ravi"
  }
]
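
Since the schema accepts any object with a _gen key as the format, other functions should work here too. As a sketch, the config below skews the mix toward one format. It assumes a weightedOneOf function is available in your ShadowTraffic version and that it takes choices as weight/value pairs; neither detail is confirmed by this page, so check the function reference before relying on it.

```json
{
  "_gen": "profile",
  "traits": [
    "firstName",
    "lastName",
    "email"
  ],
  "email": {
    "format": {
      "_gen": "weightedOneOf",
      "choices": [
        { "weight": 8, "value": "first.last" },
        { "weight": 2, "value": "firstInitialLast" }
      ]
    }
  }
}
```

If those assumptions hold, roughly four out of five generated emails would use first.last and the rest firstInitialLast.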

Specification

JSON schema

{
  "type": "object",
  "properties": {
    "traits": {
      "type": "array",
      "items": {
        "type": "string",
        "enum": [
          "firstName",
          "lastName",
          "company",
          "email",
          "website"
        ]
      }
    },
    "email": {
      "type": "object",
      "properties": {
        "format": {
          "oneOf": [
            {
              "type": "string",
              "enum": [
                "firstInitial.last",
                "firstInitialLast",
                "first.last",
                "firstLast",
                "first",
                "last"
              ]
            },
            {
              "type": "object",
              "properties": {
                "_gen": {
                  "type": "string"
                }
              },
              "required": [
                "_gen"
              ]
            }
          ]
        }
      },
      "required": [
        "format"
      ]
    }
  },
  "required": [
    "traits"
  ]
}