Quickstart Guide Part 10: Data Input and Output
Congratulations on making it to the final lesson of this Quickstart Guide!
Along the way, you've learned how to run simulations and access outputs using the Epistemix Platform Python tools. You have explored the concepts of conditions and states, which are used to describe the behaviors that agents will carry out during simulations, and you spent time exploring the properties of the synthetic population of agents (and associated places) that lies at the heart of the Epistemix Platform simulation engine.
In Parts 6, 7, 8, and 9, you examined more complex models that demonstrated how agents can interact with each other and with their environment. That set of models also presented many of the action rules available within the FRED modeling language for dictating agent behavior, including control structures that enable richer interaction with simulation variables and more finely tuned branching behavior within states.
In the final part of this introductory sequence, you will run a model that reads in external data to define an age-specific probability with which female agents will become pregnant. You will encounter the set of functions that are included in the FRED modeling language to read in data from external files. You will also see a summary of the various methods used throughout this guide to export data from your simulations.
By the end of this lesson, you will know how to:
1. Create shared list, table, and list_table variables from data stored in external files.
2. Use the various output methods available in the FRED modeling language to write out files containing data from your simulation.
3. Transfer data between the Epistemix Platform IDE and your own computer.
Let's load the packages we need to run this model and interact with the outputs. We will be making use of the pandas package to create data output files:
import pandas as pd
import time
# Epistemix package for managing simulations and simulation results in Python
from epx import Job, ModelConfig, SynthPop
10.1 Functions to Read in Data
Open the read_table_example.fred file in the file browser to the left to follow along.
The FRED language includes several functions for reading data in from a file and storing it as a variable. In the example model here, we are reading the data in the preg_probs.csv file and creating a shared table variable named preg_prob_table. The two columns in the preg_probs.csv file hold agent ages and the estimated rate of pregnancy at each age between 15 and 44. This data was drawn from a report released by the National Center for Health Statistics that estimated pregnancy rates for women in the United States. We can tentatively use these rates as an estimate of the probabilities that female agents will become pregnant at their current ages. (In a more comprehensive model, however, it may be prudent to take into account more fine-grained data on how pregnancy rates vary between subpopulations.)
The steps to create a table from a file are:
1. Declare a shared table variable in the variables block to hold the data.
2. Use the read_table function to open the external data file and specify which two columns from the file should be stored in the table variable.
You can see both of these steps in the read_table_example.fred model. The variable is created using this line in the variables block:
shared table preg_prob_table
The read_table function is then called to populate the preg_prob_table variable in the startup block.
The first parameter passed to the read_table function is the name of the target table variable in your model. The second parameter is the name of the external file. (Note that if this file is not in the same directory where you are running the model, you will need to specify the path to the file.)
The third and fourth parameters specify which columns in the external file should be used as the keys and values, respectively, for the table. Each column in the external file is referenced using its integer position (starting with 0) in the list of columns. In this case, the preg_probs.csv file only has two columns, so these values are 0 and 1. We specify them in that order, so that the ages are the keys in our table.
The read_table function will ignore any lines that start with a non-numeric character, so it will skip any text-header rows present in your file. Note that the FRED modeling language only allows for numeric variables at this time. Any key entries that are not numeric will cause the simulation to fail and return an "Undefined value for key" error. Any value entries that are text will be replaced by the value 0.
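To make the column and header handling concrete, here is a rough Python analogue of the behavior described above. It is a sketch, not FRED's implementation: the function name `read_table_sketch`, the inline sample rows, and the rule that a line counts as numeric when it starts with a digit, sign, or decimal point are all assumptions made for illustration.

```python
import csv
import io


def read_table_sketch(lines, key_col, value_col):
    """Hypothetical Python analogue of the read_table behavior described above."""
    table = {}
    for row in csv.reader(lines):
        if not row:
            continue
        first = row[0].strip()
        # Assumption: a line is treated as a header and skipped unless it
        # starts with a digit, a sign, or a decimal point.
        if not first or first[0] not in "0123456789+-.":
            continue
        try:
            key = float(row[key_col])
        except ValueError:
            # Mirrors the documented failure for non-numeric keys.
            raise ValueError(f"Undefined value for key: {row[key_col]!r}")
        try:
            value = float(row[value_col])
        except ValueError:
            # Text values are replaced by 0, as described above.
            value = 0.0
        table[key] = value
    return table


# Made-up sample data; these are NOT the values in preg_probs.csv.
sample = io.StringIO("age,prob\n15,0.01\n16,0.02\n17,n/a\n")
table = read_table_sketch(sample, key_col=0, value_col=1)
print(table)
```

Passing 0 and 1 as the column positions makes the first column the keys and the second the values, matching the order used for preg_probs.csv.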
10.2 Run the Model and Examine the Output
Execute the next two cells below to run the model and retrieve the print output:
# create a ModelConfig object
preg_config = ModelConfig(
    synth_pop=SynthPop("US_2010.v5", ["Jefferson_County_PA"]),
    start_date="2023-07-01",
    end_date="2023-07-01",
)
# create a Job object using the ModelConfig
preg_job = Job(
    "read_table_example.fred",
    config=[preg_config],
    key="preg_job",
    fred_version="11.0.1",
    results_dir="/home/epx/qsg-results"
)
# call the `Job.execute()` method
preg_job.execute()
# the following loop idles while we wait for the simulation job to finish
start = time.time()
timeout = 300 # timeout in seconds
idle_time = 3 # time to wait (in seconds) before checking status again
while str(preg_job.status) != 'DONE':
    if time.time() > start + timeout:
        msg = f"Job did not finish within {timeout / 60} minutes."
        raise RuntimeError(msg)
    time.sleep(idle_time)
str(preg_job.status)
You should see a text representation of the data in the preg_prob_table displayed above this cell. This text output was produced by the for loop structure in the startup block.
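The waiting loop in the cell above follows a general poll-with-timeout pattern. A minimal, standard-library sketch of that pattern as a reusable helper is shown here. `wait_for_status` and the stand-in `FakeJob` class are hypothetical names for illustration and are not part of the epx package.

```python
import time


def wait_for_status(get_status, target="DONE", timeout=300, idle_time=3):
    """Poll get_status() until it returns target or timeout seconds elapse."""
    start = time.time()
    while str(get_status()) != target:
        if time.time() > start + timeout:
            raise RuntimeError(f"Status was not {target!r} within {timeout} seconds.")
        time.sleep(idle_time)
    return target


class FakeJob:
    """Hypothetical stand-in that reports DONE on the third status check."""

    def __init__(self):
        self.checks = 0

    @property
    def status(self):
        self.checks += 1
        return "DONE" if self.checks >= 3 else "RUNNING"


job = FakeJob()
final_status = wait_for_status(lambda: job.status, timeout=5, idle_time=0.01)
print(final_status)
```

With a real job, the call would presumably look like `wait_for_status(lambda: preg_job.status)`, which relies only on the `str(preg_job.status)` comparison already used in the cell above.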
Recall from the previous lesson that text that is written by the print() function in a simulation can be accessed with the JobResults.print_output() method, which returns a DataFrame with a single printed statement per row, and that the print_output column of that DataFrame can be formatted as above to display the print output itself.
Tables consist of keys and values. The print function in the FRED language cannot print a key-value pair directly. Instead, we use a for loop to iterate through the table's keys and use each key to access the corresponding value for each row:
for (age_key, get_keys(preg_prob_table)) do {
    print("Age: ", age_key, " Prob: ", preg_prob_table[age_key])
}
The list of keys comes from get_keys(). This function returns a list containing all of the keys from the table. (Note that there is a corresponding get_values() function that can be used to obtain a list of the table's values.) We can then use the key to access the value using the syntax table[key] (or, equivalently, lookup(table, key)).
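For readers more familiar with Python, a FRED table behaves much like a dictionary in this loop. The sketch below is only an analogy, with made-up values rather than the contents of preg_probs.csv: `list(table.keys())` plays the role of get_keys(), `list(table.values())` the role of get_values(), and `table[key]` the role of the lookup.

```python
# Made-up stand-in for preg_prob_table; values are illustrative only.
preg_prob_table = {15: 0.01, 16: 0.02, 17: 0.03}

keys = list(preg_prob_table.keys())      # analogous to get_keys(table)
values = list(preg_prob_table.values())  # analogous to get_values(table)

lines = []
for age_key in keys:
    # preg_prob_table[age_key] mirrors the table[key] lookup syntax
    lines.append(f"Age: {age_key} Prob: {preg_prob_table[age_key]}")

print("\n".join(lines))
```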
By the way, you can open any text file inside the Epistemix Platform IDE. Click on the preg_probs.csv file to the left to open it, and then compare its contents to the text output of the simulation above. As you can see, the data matches row for row.
10.3 Additional File Input Functions
The FRED modeling language also specifies functions for populating lists and list_tables from files. These are called read and read_list_table, respectively. You can read more about using these two functions in the FRED Modeling Language Reference.
The read_list_table function can be very useful. You can see examples of it being used in the Ground Logistics model (Ground-Shipping-Logistics) included in the Epistemix Platform Community Library.
10.4 A Recap on Data Output
You have seen several methods for interacting with the output of FRED simulations in this set of introductory tutorials. Here, we'll summarize those methods and then show you how to create output files that you can store within the Epistemix Platform or download to your own computer to work with offline.
output_interval keyword for shared variables
A built-in keyword called output_interval can be used for shared variables to indicate that the values contained in the corresponding variable should be recorded in a CSV file periodically throughout and at the end of the simulation. The value of the output interval specifies the frequency (in days) with which the value of the variable should be recorded. The output_interval keyword for the preg_prob_table variable is turned on in this model, so that you can compare it to the original file and the text output from the simulation:
shared table preg_prob_table # create table variable that will be populated from a file
preg_prob_table.output_interval = 2 # turn on csv output for table
Setting the output interval to 2, in a simulation that only runs for a single day, means that we will only record the data in the table at the end of the simulation.
To retrieve the resulting CSV file and load it into a pandas DataFrame object, we use the table_var method of the JobResults object associated with our simulation Job. (There are analogous JobResults methods to retrieve the output for the other variable types.)
Run the cell below to see this in action:
df = (preg_job.results.table_var("preg_prob_table")
      .drop(columns=["run_id", "sim_day"])
      .rename(columns={"key":"age", "value":"preg_prob"})
      .set_index("age"))
df.preg_prob = df.preg_prob.map(lambda x: '%.7f' % x)
df
We can save this table as a CSV file using the built-in to_csv method for pandas DataFrame objects:
After running the cell above, you should see a new CSV file appear in the browser to the left. Click on it to open and examine the file in a new tab. You'll see the same data as the original file.
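As a hedged illustration of that saving step, the sketch below writes a small stand-in DataFrame to disk and reads it back. The file name `preg_prob_table_copy.csv` and the values are hypothetical. In the notebook you would call `to_csv` on the `df` built above instead.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the DataFrame built above; values are made up.
df_sketch = pd.DataFrame(
    {"age": [15, 16, 17], "preg_prob": ["0.0100000", "0.0200000", "0.0300000"]}
).set_index("age")

# Hypothetical output file name, written to a temporary directory here.
out_path = os.path.join(tempfile.mkdtemp(), "preg_prob_table_copy.csv")
df_sketch.to_csv(out_path)  # the index ("age") is written as the first column

# Read the file back to confirm the round trip.
round_trip = pd.read_csv(out_path, index_col="age")
print(round_trip)
```

Because the index is named `age`, `to_csv` writes it as the first column, so the saved file keeps the same two-column layout as the table.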
Output of JobResults methods
You can save the output of any method of a JobResults object that returns a pandas DataFrame in this way. For example, to save a CSV file of the agent counts in a given state each day, you can first call the JobResults.state() method to produce a DataFrame. (Head back to Parts 1 and 2 of this guide for a refresher on how these methods work!) Then you can save the data in a file using DataFrame.to_csv():
preg_job.results.state(
    condition="ASSIGN_PREGNANCY_PROB",
    state='Start',
    count_type='new'
).to_csv('start_state_counts.csv')
preg_job.results.state(
    condition="ASSIGN_PREGNANCY_PROB",
    state='AssignProbs',
    count_type='new'
).to_csv('assignprobs_state_counts.csv')
After running the above cell, you should see two new files to the left named start_state_counts.csv and assignprobs_state_counts.csv. These contain the number of new agents in each state.
Using the print and print_csv commands
The FRED modeling language also includes two functions that allow agents to print information directly to a file as part of the action rules they carry out in a given state.
In the last lesson (and above), you saw how the print function can be used to record unstructured data in a file that can be retrieved by the JobResults.print_output() method.
You have also seen the print_csv function used in several previous lessons to record structured data output. For example, in Part 7, we used the function to have the agents report the location where they were infected with influenza.
The data in these CSV output files can also be retrieved and loaded into a pandas DataFrame using the JobResults.csv_output() method. You can then save the file to your working directory using the to_csv DataFrame method, as above.
When using print_csv, recall that a print_csv statement must refer to a file that was previously opened by an open_csv statement, so that the file is available for the agent to write to. This open_csv call is generally made in a startup block, so that the meta agent opens the file prior to the first hour of the simulation.
10.5 Uploading to and Downloading from the Epistemix Platform IDE
It is easy to transfer data between the Epistemix Platform and your local machine.
Data Upload (Your Machine → Epistemix Platform)
Getting a file into the Platform is as simple as dragging it into the file browser area to the left. For one or more individual files, this action will result in a dashed grey border appearing around the file browser. Simply drop the file and upload will begin.
Note that you cannot drag a folder into the platform in this way. You will need to create a folder within the platform using the small folder icon with a plus sign above the file browser. Then, you can drag and drop the individual files you wish to store inside that new folder.
Data Download (Epistemix Platform → Your Machine)
To download any individual file in the browser to the left, right-click on the file name (Ctrl-click on a Mac) and select "Download" from the menu that pops up. This will open your computer's download dialogue, so you can specify where to store the file on your computer.
Again, note that you cannot download a folder in this way. You will need to right-click on the individual files inside the folder and download them one by one.
If you have many files to download, this may be burdensome. You can work around this by using the terminal built into the Epistemix IDE to create a single archive file (e.g., a tarball), and then right-click to download that.
To open a terminal, click on the large button with the plus sign above the file browser, and select "Terminal" from the bottom row of icons. You can then use your favorite method for creating a single archive file for the folder of interest.
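If you prefer Python to shell commands, one possible approach is the standard-library `tarfile` module, which can be run from a notebook cell or from the terminal. This is a sketch under assumed names: the folder `my_outputs` and the archive `my_outputs.tar.gz` are hypothetical, and a temporary directory is used so the example is self-contained.

```python
import os
import tarfile
import tempfile

# Build a small hypothetical folder of output files to archive.
work_dir = tempfile.mkdtemp()
folder = os.path.join(work_dir, "my_outputs")
os.makedirs(folder)
for name in ("a.csv", "b.csv"):
    with open(os.path.join(folder, name), "w") as f:
        f.write("key,value\n1,2\n")

# Create a gzip-compressed tarball containing the whole folder.
archive_path = os.path.join(work_dir, "my_outputs.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    tar.add(folder, arcname="my_outputs")

# List the archive contents to confirm what was packed.
with tarfile.open(archive_path, "r:gz") as tar:
    members = sorted(tar.getnames())
print(members)
```

Once the archive exists in your working directory, it appears in the file browser like any other file and can be downloaded with a single right-click.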
10.6 Lesson Recap
In this lesson, you explored a short model that loaded data from an external file into a table variable in a FRED simulation. Female agents between the ages of 15 and 44 then queried that table to update their agent variable my_preg_prob, which stored the probability with which they would become pregnant.
We also discussed several methods available within the FRED language and within the Epistemix Platform for creating data output files. After exploring this model, you should be able to:
- Use the read, read_table, and read_list_table functions to populate list, table, and list_table variables (respectively) with data from an external file.
- Make use of the output_interval keyword to record the values stored in shared variables in a file periodically during and at the end of a simulation.
- Use methods from the JobResults object to create DataFrames containing various kinds of simulation output, including state tables, that can be saved as CSV files.
- Retrieve CSV files that agents used to record information during the simulation with the JobResults.csv_output() method and save them in a file using the built-in to_csv() method for pandas DataFrames.
- Drag and drop files into the Epistemix IDE, and download them from the file browser.
You've made it to the end of this introduction to the Epistemix Platform and the FRED Modeling language! Hopefully you've enjoyed learning about the many features of the FRED language and seeing some of the powerful things that can be accomplished even in relatively simple simulation models.
Next Steps
Here are some next steps for continuing your learning journey:
- Try modifying any of the models in this guide to run in a different place. Or, add a state or condition to an existing model to introduce new agent behaviors.
- Take a look around our Community Library to explore models built by the Epistemix team and by other FRED modelers. The knowledge you have gained by working through this guide should make it possible for you to read those more complex models and to start thinking about how you might develop your own models for whatever interesting problems you want to tackle.
Special Topics
There are also additional tutorials that supplement this Quickstart Guide. They cover the following special topics that may be relevant to your modeling projects:
- external-data: This lesson offers more examples of how external data can be incorporated into your simulation, in the form of (1) augmenting the synthetic population with additional agent attributes and (2) building custom places in FRED from real location data.
These lessons can be found in the special-topics directory.