What is a Synthetic Population?
Definition
A synthetic population is a dataset that represents individuals, places, and networks (such as social networks) within a specific location, but contains no personally identifiable information. Epistemix’s simulation intelligence platform features a prebuilt, realistic, and statistically accurate synthetic population for the entire United States, derived from multiple data sources. The synthetic population is geospatially accurate and is represented at the country, state, county, and metropolitan area levels. Epistemix thus creates digital twins (accurate virtual representations of physical individuals, places, and networks) without revealing any personally identifiable information.
With such realistic and comprehensive coverage of the United States population, Epistemix’s simulation intelligence platform allows researchers, policymakers, and businesses to simulate the behaviors and interactions of synthetic people within their environment.
Statistical Accuracy
The goal of constructing a synthetic population is to create a dataset of individuals who, at aggregate levels, share certain characteristics with the real population being represented. One way to think about success relative to this goal is the following: if you conducted a census on the synthetic population analogous to the census of the real population that produced the data used to generate it, you would not be able to distinguish between the two sets of results. When we describe a synthetic population as statistically accurate, we are referring to this property.
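The "census on the synthetic population" idea can be made concrete with a small sketch. The example below is illustrative only and is not how Epistemix validates its data: the age bands, the counts, and the choice of total variation distance as the comparison metric are all assumptions made for this example. It tallies one attribute over a toy set of synthetic individuals and compares the resulting shares with a toy table of published counts.

```python
from collections import Counter

# Hypothetical published census counts per age band for one area (toy numbers).
census_counts = {"0-17": 220, "18-64": 610, "65+": 170}

# Hypothetical synthetic individuals: one record per synthetic person, no PII.
synthetic_people = (
    [{"age_band": "0-17"}] * 23
    + [{"age_band": "18-64"}] * 60
    + [{"age_band": "65+"}] * 17
)


def shares(counts):
    """Convert raw counts per category into population shares."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute share differences; 0 means identical marginals."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# "Conduct a census" on the synthetic population by tallying the attribute.
synthetic_counts = Counter(person["age_band"] for person in synthetic_people)

tvd = total_variation_distance(shares(census_counts), shares(synthetic_counts))
print(f"Total variation distance on age bands: {tvd:.3f}")
```

A distance near zero means the two tabulations are hard to tell apart on this attribute. A fuller check would repeat the comparison across many attributes and geographic levels, since matching a single marginal says little about the rest.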
A Dynamic Synthetic Population
One distinctive aspect of the Epistemix synthetic population, in contrast to the concept of a synthetic population more generally, is that it is dynamic. Beyond the connections between people and places that are included in the static synthetic population data records, there is a default model available in every simulation that brings the synthetic population to life (so to speak) by instructing agents to move around their environment and "attend" the places to which they are connected. This gives agents the opportunity to interact with their neighbors in any simulation: to transmit an infectious disease in an epidemiological model, to share an opinion in a political polarization model, to communicate their perspective about a recent purchase in a product adoption model, and so on.
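To illustrate how attending shared places creates opportunities for interaction, here is a minimal sketch in Python. It is not the Epistemix default model or its modeling language: the agent records, the `co_attendees` and `step` helpers, and the `share_prob` parameter are all hypothetical names invented for this example. Each agent is linked to a home and a workplace, and in each attendance period an agent carrying some state (an infection, an opinion, a product recommendation) may pass it to others at the same place.

```python
import random
from collections import defaultdict

# Hypothetical static records: each agent is linked to the places it attends.
agents = {
    "a1": {"home": "h1", "work": "w1"},
    "a2": {"home": "h1", "work": "w2"},
    "a3": {"home": "h2", "work": "w1"},
    "a4": {"home": "h2", "work": "w2"},
}


def co_attendees(agents, place_type):
    """Group agent ids by the place of the given type that they attend."""
    groups = defaultdict(set)
    for agent_id, places in agents.items():
        groups[places[place_type]].add(agent_id)
    return groups


def step(carriers, agents, place_type, share_prob, rng):
    """One attendance period: carriers may pass their state to co-attendees."""
    new_carriers = set()
    for members in co_attendees(agents, place_type).values():
        if members & carriers:  # at least one carrier is present at this place
            for agent_id in sorted(members - carriers):
                if rng.random() < share_prob:
                    new_carriers.add(agent_id)
    return carriers | new_carriers


rng = random.Random(0)  # seeded so that runs are repeatable
carriers = {"a1"}       # one agent starts with the state being spread
for day in range(3):
    carriers = step(carriers, agents, "work", share_prob=1.0, rng=rng)
    carriers = step(carriers, agents, "home", share_prob=1.0, rng=rng)

print(sorted(carriers))
```

With `share_prob=1.0` the state reaches every agent connected to the first carrier through a chain of shared places; lowering it makes the spread stochastic. The same loop applies whether the state stands for a disease, an opinion, or a product recommendation, which is why a single attendance model can support many kinds of simulation.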