Quickstart Guide Part 3: Agents
A major strength of agent-based modeling with Epistemix is that our modeling tools are set up to incorporate a pre-built population of agents that is representative of the population of the United States. This synthetic population is based on data from the 2010 census. It provides both demographic information - their age, sex, and race, for example - and geographic information - the places where they live, work, and go to school - about each agent. As a result of this feature, instead of having to spend hours gathering basic data on people and formatting it correctly to be read into your simulation, you can simply identify a location in the United States and the FRED simulation engine will load a statistically representative population ready for you to work with.
In this lesson, you will take a look at some of the properties that are defined for the agents in the synthetic population. To do this, we introduce an important new component of the FRED modeling language, the variables block, and examine a method for recording information from a simulation as output.
By the end of this lesson, you should be able to:
- Use the variables block type to (1) load agent properties from the synthetic population into a simulation and (2) define new variables.
- Use the output property of a shared table variable to instruct the FRED simulation engine to record agent information in a CSV file.
- Make use of the built-in Excluded state that the FRED simulation engine includes in every condition to specify a terminal state for agents in a condition.
- Access the output CSV files from each shared table using built-in methods of JobResults objects and convert them to pandas DataFrame objects for visualization.
As usual, let's begin by importing the set of required Python packages. This time you will also import the plotly express package, which you will use to make the visualizations.
import pandas as pd
import plotly.express as px
import time
# Epistemix package for managing simulations and simulation results in Python
from epx import Job, ModelConfig, SynthPop
# local Python module that encapsulates a useful helper function
from analysis import create_agent_dataframe
# Formatting plotly visualizations using the Epistemix template
import plotly.io as pio
import plotly.graph_objects as go
import requests
# Use the Epistemix default plotly template
r = requests.get("https://gist.githubusercontent.com/daniel-epistemix/8009ad31ebfa96ac97b7be038c014c0d/raw/320c3b0ca3dfbf7946e49c97254fa65d4753aeac/epx_plotly_theme.json")
if r.status_code == 200:
    pio.templates["epistemix"] = go.layout.Template(r.json())
    pio.templates.default = "epistemix"
3.1 Variables and the Variables Block
Open the agent_info.fred file in the file browser to the left. Below the familiar comment block, and before the first condition is defined, notice that there is now a variables block in the model. This new block type allows you to define variables that can be accessed and updated by the agents during the simulation.
variables {
    agent numeric race
    agent numeric sex
    shared table agent_age
    agent_age.output = 1
    shared table agent_race
    agent_race.output = 1
    shared table agent_sex
    agent_sex.output = 1
}
Several types of variable are available in the FRED modeling language: numerics, i.e., individual numbers; lists of numbers; tables, where numeric "keys" are used to access associated numeric "values"; and list_tables, where numeric keys are used to access associated list values.
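As a rough mental model only, the four variable types behave much like the Python structures below. This is an analogy written in Python, not FRED syntax, and every name and value in it is made up for illustration.

```python
# Python analogy for the four FRED variable types (illustrative values only).

# numeric: a single number
price = 3.5

# list: an ordered collection of numbers
daily_counts = [4.0, 7.0, 2.0]

# table: numeric keys mapped to numeric values
age_by_id = {101: 34.0, 102: 61.0}

# list_table: numeric keys mapped to lists of numbers
contacts_by_id = {101: [102.0, 103.0], 102: [101.0]}

# Looking up a key returns the associated value (or list of values).
print(age_by_id[101])
print(contacts_by_id[101])
```

The key lookup in the last two lines is the behavior that matters for the rest of this lesson, where agent IDs serve as table keys.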
For more details about variables, please consult the FRED Modeling Language docs, including the entries for the variables block and individual variable types in the FRED Language Reference.
The variables block displayed above defines two numeric variables, race and sex, alongside three table variables called agent_age, agent_race, and agent_sex. Each variable is also instantiated with a scope that defines whether agents see the same or different values for a given variable.
Variable scope
Variables in FRED models exist in one of two scopes: agent, where a version of the variable is instantiated for each individual agent, or shared, where one instance of the variable (containing a single value) is shared among all of the agents.
Think of agent variables as things that you want to define for each agent individually. These could be immutable properties, like eye color, or properties that you expect to change over time, like current bank account balance or the number of movies streamed in the last month.
In contrast, shared variables represent information that all agents will share during the simulation. The minimum wage set for a community or the price of a unit of electricity for a power company are possible examples of shared variables.
Shared variables can be changed by agents over time in response to things that are happening in the simulation. For example, a shared variable called counter that tracks the number of agents who have taken a specific action during the run of the simulation can be updated over time by asking agents to add 1 to the counter variable after they take the relevant action. The important thing to remember is that at any given moment in time in the simulation, all agents will "see" the same value for the variable.
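The difference between the two scopes can be sketched with a Python analogy, in which a class attribute plays the role of a shared variable and an instance attribute plays the role of an agent variable. This is not how the FRED engine is implemented; the Agent class, the balance property, and the values are hypothetical.

```python
class Agent:
    # "shared" scope: one value seen by every agent (class attribute)
    counter = 0

    def __init__(self, agent_id, balance):
        # "agent" scope: each agent holds its own copy
        self.agent_id = agent_id
        self.balance = balance

    def take_action(self):
        # every agent updates the same shared counter
        Agent.counter += 1


agents = [Agent(1, 120.0), Agent(2, 45.5), Agent(3, 300.0)]
for agent in agents[:2]:
    agent.take_action()

# All agents see the same counter value, but keep distinct balances.
print([a.counter for a in agents])   # [2, 2, 2]
print([a.balance for a in agents])   # [120.0, 45.5, 300.0]
```

Only two of the three agents act, yet all three read the same counter value, which mirrors the "all agents see the same value" rule for shared variables.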
Tables and list_tables are only defined with shared scope. These variable types are useful for tracking information about multiple agents during a simulation, because they:
1. Allow multiple rows of information.
2. Have a built-in output property that instructs the FRED simulation engine to record the final version of the table/list_table in a file at the end of the simulation. We'll discuss this feature in more detail below.
Given this behavior, tables and list_tables can function as "community records", allowing agents to access information about other agents or places during the simulation. You'll see examples of tables and list_tables being used in this way in later tutorials.
3.2 Gathering the Demographic Data of Synthetic Agents
In the remainder of this lesson, you will explore a short FRED model that asks agents to update three tables with their age, race, and sex. The model consists of a single condition with three states. Click on the agent_info.fred file in the file browser to the left to take a look at the model.
The Epistemix synthetic population defines properties (sometimes also called factors) of the agents and their environment, largely derived from U.S. census data. These properties are available to use in a simulation once they are declared as agent variables in the variables block.
The following lines in the variables block instruct the simulation engine to load the values for race and sex that are defined for each agent in the synthetic population:
agent numeric race
agent numeric sex
After this, the model code declares three table variables that will be used to store the age, race, and sex of each agent in the simulation. Take a look at the first table defined in the agent_info.fred file:
shared table agent_age
agent_age.output = 1
The first line instantiates a (shared) table called agent_age. The second line turns on output for the agent_age table, specifying a frequency (in days) for the simulation engine to record the values in the table in a CSV file. Note that for any non-zero output interval, the final version of the table is recorded at the end of the simulation.
The REPORT_DEMOGRAPHICS condition consists of three states. In each state, agents record one of their properties (age, race, or sex) in the appropriate table. The line of code of this form in each state
variables
block, is accessed using the
3.3 The Excluded State
Notice that the transition rule in the ReportSex state is:
The FRED simulation engine includes a built-in state for each condition called Excluded that is functionally equivalent to "wait forever." Since states must have a default transition rule, the Excluded state can serve as a terminal state that agents are sent to when nothing else in the condition applies to them. Here, we send the agents to the Excluded state once they have reported the three properties of interest. They will wait there until the end of the simulation.
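The role of Excluded as a terminal state can be illustrated with a toy state machine in Python. This is only an analogy for the flow through the condition, not the FRED engine's behavior or syntax. The state name ReportAge, the order of the states, and the sample agent are assumptions made for the sketch.

```python
EXCLUDED = "Excluded"

# state -> (property the agent reports, next state).
# "ReportAge" and the ordering of the states are assumptions.
TRANSITIONS = {
    "ReportAge": ("age", "ReportRace"),
    "ReportRace": ("race", "ReportSex"),
    "ReportSex": ("sex", EXCLUDED),
}


def run_condition(agent, tables, start="ReportAge"):
    """Walk one agent through the states until it reaches Excluded."""
    state = start
    path = [state]
    while state != EXCLUDED:
        prop, next_state = TRANSITIONS[state]
        # record this agent's value under its ID, as in a keyed table
        tables["agent_" + prop][agent["id"]] = agent[prop]
        state = next_state
        path.append(state)
    return path


tables = {"agent_age": {}, "agent_race": {}, "agent_sex": {}}
agent = {"id": 7, "age": 42, "race": 1, "sex": 0}  # made-up agent
path = run_condition(agent, tables)
print(path)    # ['ReportAge', 'ReportRace', 'ReportSex', 'Excluded']
print(tables)
```

Once the agent reaches Excluded the loop ends and nothing further happens to it, which is the "wait forever" behavior described above.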
3.4 Exploring the Demographics of Loving County, TX
Let's start by running the simulation using the tools from the epx package:
# create a ModelConfig object
info_config = ModelConfig(
    synth_pop=SynthPop("US_2010.v5", ["Loving_County_TX"]),
    start_date="2022-05-10",
    end_date="2022-05-10",
)
# create a Job object using the ModelConfig
info_job = Job(
    "agent_info.fred",
    config=[info_config],
    key="info_job",
    fred_version="11.0.1",
    results_dir="/home/epx/qsg-results"
)
# call the `Job.execute()` method
info_job.execute()
# the following loop idles while we wait for the simulation job to finish
start = time.time()
timeout = 300 # timeout in seconds
idle_time = 3 # time to wait (in seconds) before checking status again
while str(info_job.status) != 'DONE':
    if time.time() > start + timeout:
        msg = f"Job did not finish within {timeout / 60} minutes."
        raise RuntimeError(msg)
    time.sleep(idle_time)
str(info_job.status)
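If you find yourself repeating this waiting loop, it can be wrapped in a small helper. The sketch below is one possible way to do that using only the standard library. The function name wait_for_status and the fake status source are hypothetical and are not part of the epx package.

```python
import time


def wait_for_status(get_status, target="DONE", timeout=300.0, idle_time=3.0):
    """Poll get_status() until it returns target or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        status = str(get_status())
        if status == target:
            return status
        if time.monotonic() > deadline:
            raise RuntimeError(
                f"Job did not finish within {timeout / 60:g} minutes "
                f"(last status: {status})."
            )
        time.sleep(idle_time)


# Stand-in for a job whose status changes over time (not the epx API).
fake_statuses = iter(["RUNNING", "RUNNING", "DONE"])
result = wait_for_status(lambda: next(fake_statuses), idle_time=0.01)
print(result)  # DONE
```

With the job defined in this lesson, the call would presumably look like wait_for_status(lambda: info_job.status), since the loop above compares str(info_job.status) to 'DONE'.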
The simulation that we just executed recorded the values stored in each of the tables in a separate file. These tables can be accessed via the JobResults object associated with our Job object, which is available through the Job.results attribute.
JobResults objects have a built-in method called table_var that will load a table variable that was recorded as output into a pandas DataFrame object. Let's create three DataFrames to store the output of the three tables defined in the model:
agent_age = info_job.results.table_var("agent_age")
agent_race = info_job.results.table_var("agent_race")
agent_sex = info_job.results.table_var("agent_sex")
The resulting objects are pandas DataFrame objects:
Let's take a look at the agent_race DataFrame:
The key, recorded in the corresponding column, is the agent ID. Each agent in the simulation has a unique ID number. We used this number as the key for our table when we instructed the agents to report the value of their race variable to the agent_race table in the ReportRace state.
The value, recorded in the corresponding column, is a number that identifies the agent's race. The mapping that describes which race is associated with which number can be found in a table on the documentation page for each keyword that corresponds to a racial category (e.g., unknown_race). Each agent's sex (female or male) is also encoded as a number (0 or 1, respectively):
The other property recorded in this simulation, age, is recorded as a positive integer:
All three objects whose contents we have displayed above are pandas DataFrame objects. Thus, all of the available tools in the pandas package can be used to process the data and to create visualizations.
The next cell runs a short Python function to merge the data and replace the numerical values with their equivalent text values. If you'd like to look at the code, you can click on analysis.py in the browser to the left to open the file.
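For readers who want a feel for what such a merge involves, here is a minimal pandas sketch. It is not the code in analysis.py. The "key" and "value" column names, the function name merge_agent_tables, and the sample rows are all assumptions. Only the sex codes (0 for female, 1 for male) come from this lesson, so race is left as a numeric code.

```python
import pandas as pd

# Hypothetical stand-ins for the three tables loaded with table_var().
# The "key"/"value" column names are an assumption.
age_tbl = pd.DataFrame({"key": [1, 2, 3], "value": [34, 61, 8]})
race_tbl = pd.DataFrame({"key": [1, 2, 3], "value": [1, 1, 3]})
sex_tbl = pd.DataFrame({"key": [1, 2, 3], "value": [0, 1, 1]})

# Sex codes as described in this lesson: 0 is female, 1 is male.
SEX_LABELS = {0: "female", 1: "male"}


def merge_agent_tables(age, race, sex):
    """Combine three key/value tables into one row per agent."""
    merged = (
        age.rename(columns={"key": "id", "value": "agent_age"})
        .merge(race.rename(columns={"key": "id", "value": "agent_race"}), on="id")
        .merge(sex.rename(columns={"key": "id", "value": "agent_sex"}), on="id")
    )
    merged["agent_sex_txt"] = merged["agent_sex"].map(SEX_LABELS)
    merged["agent_count"] = 1  # one row per agent, convenient for pie charts
    return merged


agent_info_sketch = merge_agent_tables(age_tbl, race_tbl, sex_tbl)
print(agent_info_sketch)
```

Merging on the agent ID works because every table uses the same key for the same agent. A text column for race could be added in the same way as agent_sex_txt, using the mapping from the documentation.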
3.5 Visualizing the Simulation Output
You can now create a series of visualizations that help you understand the demographics of the population of Loving County, TX. To do so, you'll make use of the plotly express package, but you could use any visualization package of your choice.
First, here is a pie chart that shows how the county's population is split by sex:
fig = px.pie(
agent_info,
values='agent_count',
names='agent_sex_txt',
title='Agent Sex Distribution'
)
fig.show()
Notice that you can hover your mouse over the pie slices to see more information. Next, we can explore the racial diversity of the county in a pie chart of the agent races:
fig = px.pie(
agent_info,
values='agent_count',
names='agent_race_txt',
labels={'agent_count': 'Number of agents', 'agent_race_txt': 'Agent race'},
title='Agent Race Distribution'
)
fig.show()
Lastly, we can examine the age distribution of residents of the county in several ways.
First, as a histogram:
fig = px.histogram(
agent_info,
x="agent_age",
nbins=20,
labels={'agent_age':'Agent age'},
title='Agent Age Distribution',
)
fig.show()
Second, as a violin plot:
fig = px.violin(
agent_info,
x="agent_age",
points='all',
labels={'agent_age':'Agent age'},
box=True,
title='Agent Age Distribution'
)
fig.show()
And, finally, as a box-and-whisker plot broken down by race, which displays the quartiles of each distribution (via the box), alongside the spread of the remaining data (via the whiskers, which reach the minimum and maximum values when there are no outliers):
fig = px.box(agent_info, x="agent_race_txt", y="agent_age",
labels={'agent_age':'Agent Age',
'agent_race_txt':'Agent Race'})
fig.show()
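To make the box-and-whisker summary concrete, the sketch below computes the quartiles and whisker positions for a small, made-up list of ages using only the standard library. It assumes a common convention in which whiskers extend to the most extreme values within 1.5 times the interquartile range and anything beyond is treated as an outlier; with no outliers, the whiskers coincide with the minimum and maximum. The quartile method used here may differ slightly from the one plotly applies.

```python
import statistics

ages = [2, 30, 35, 40, 45, 50, 55, 60, 95]  # made-up ages

# Quartiles mark the edges and the center line of the box.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values that lie inside the fences.
inside = [a for a in ages if lower_fence <= a <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(q1, median, q3)             # 35.0 45.0 55.0
print(whisker_low, whisker_high)  # 30 60
print(outliers)                   # [2, 95]
```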
Taken together, these charts tell a story about the 70 residents of Loving County: the population skews old and male, and it is racially diverse.
The power of agent variables
The agent properties included in the Epistemix synthetic population allow you to work with specific subpopulations of the agents within a simulation. You can assign different behaviors to different groups by routing agents to different states within conditions based on their properties, e.g., age. You can also produce output that includes categorical information, as you did in this lesson, which, when paired with data from the simulation, allows you to compare outcomes among different groups.
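As a small illustration of comparing outcomes among groups, the sketch below averages a hypothetical binary outcome by sex. The records and the outcome column are invented for the example and do not come from the simulation in this lesson.

```python
from collections import defaultdict
from statistics import mean

# Made-up records: (sex, age, outcome) per agent; outcome is hypothetical.
records = [
    ("female", 34, 1),
    ("male", 61, 0),
    ("male", 8, 1),
    ("female", 70, 1),
]

# Collect the outcomes for each group.
outcomes_by_sex = defaultdict(list)
for sex, _age, outcome in records:
    outcomes_by_sex[sex].append(outcome)

# Average outcome per group.
rate_by_sex = {sex: mean(vals) for sex, vals in outcomes_by_sex.items()}
print(rate_by_sex)
```

The same grouping could be done on any categorical property reported by the agents, such as race or an age bracket.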
3.6 Lesson Recap
In this lesson, you used a condition with three simple states to explore the demographics of the synthetic population of agents in Loving County, TX.
- You encountered the variables block for the first time and used it to define three tables.
- You used the output property of table variables to record the keys and values they store during the simulation in a file.
- You explored agent properties (a.k.a. factors), which distinguish the individuals within the simulation.
- You applied the special Excluded state as a terminal state for agents within the REPORT_DEMOGRAPHICS condition.
- You used the JobResults object's built-in method table_var to create a pandas DataFrame object corresponding to each property reported by the agents and then used the plotly express package to visualize the data.
In the next lesson, you will explore a model that defines multiple conditions and see how agents are affected during the course of the simulation.
3.7 Additional Exercises
- Create an additional table in the variables block and set the output interval to a positive number, but do not write any agent information to it. What does this table look like when loaded by the JobResults.table_var() method?
- Try changing the REPORT_DEMOGRAPHICS condition to only use a single state.
Exercise Solutions
- Empty tables
The call to JobResults.table_var() will return an empty pandas DataFrame object, i.e., a DataFrame that contains no data.
- Single state