Quickstart Guide Part 1: A Minimal FRED Model¶
Over the next 10 lessons, you will take a tour of the key features and capabilities of the FRED modeling language and the Epistemix platform. On this first stop, you will run the most basic FRED model possible. Running this model doesn't reveal many of the interesting capabilities of the modeling language...because it doesn't do anything! But it will introduce you to the setup of a simulation and demonstrate how to use some of Epistemix's Python utilities (the epx package) to run a simulation and to interact with the resulting output.
By the end of this notebook, you should be able to use the tools in the epx package to:
1. Configure a simulation Job by specifying the (synthetic) population and the simulation dates.
2. Submit a simulation Job to the Epistemix cloud Simulation Run Service (SRS).
3. Examine the output of a simulation Job.
4. Create a pandas DataFrame that maps the days during which a simulation runs to calendar dates.
If you haven't used Jupyter notebooks before, don't worry - it is really simple to step through the code in this lesson.
In any block that contains code, simply click inside the block so that a blue bar appears to the left of the text. Once this blue bar is present, you can press Shift + Return to run the code in the cell.
Let's try this out! Run the following cell, which will load some Python packages that are necessary to simulate FRED models and interact with the results.
# import some required Python packages
import pandas as pd
import time
# Epistemix package for managing simulations and simulation results in Python
from epx import Job, ModelConfig, SynthPop
If all goes well, you should see the number to the left of the cell increment by one and the blue bar highlighting the cell move down to this cell. You've also done something necessary to continue with this tutorial: you loaded epx, a package purpose-built to let you interact with the FRED modeling language from a Python environment.
1.1 A Minimal Simulation Model¶
Start by clicking on minimal.fred
in the file browser to the left. This will open the FRED model in a new tab within this lab. Click on the tab to take a look at the model code. Perhaps surprisingly, you will see that this minimal model is comprised of just a single component:
Comment block¶
Comments allow you to document your FRED models. They can exist at the top level of a model, in a comment
block, as you see here:
comment {
This model demonstrates the most basic FRED simulation possible.
It just:
1. sets up the population of agents and
2. runs for the specified period of time.
}
Inline comments are also allowed and are initiated using the #
symbol. You will see an example of this form of comment in the next lesson.
Configuring a Simulation¶
At minimum, any simulation run must know: 1. the (synthetic) population of agents who will participate in the simulation and 2. the range of dates that the simulation will cover.
These details are stored (and eventually submitted to the simulation engine) using a ModelConfig
object:
minimal_config = ModelConfig(
    synth_pop=SynthPop("US_2010.v5", ["Loving_County_TX"]),
    start_date="2022-05-10",
    end_date="2022-05-17",
)
Since they are required for any simulation, the synth_pop
, start_date
, and end_date
fields are required when constructing a ModelConfig
object. However, additional simulation configuration parameters can also be specified.
You can examine the other configuration parameters using the built-in ?
functionality of Jupyter notebooks:
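In a notebook cell, typing ModelConfig? (the name followed by a question mark) displays the signature and docstring. Outside a notebook that shortcut is unavailable, but the standard library offers similar introspection. The sketch below uses a stand-in class, DemoConfig, which is hypothetical and not part of epx; its optional_setting parameter is made up for illustration. It shows how inspect.signature separates required from optional constructor parameters. The same helper should work when pointed at ModelConfig, though the parameters it reports there will differ.

```python
import inspect


class DemoConfig:
    """Stand-in for a configuration class (hypothetical; not the epx ModelConfig)."""

    def __init__(self, synth_pop, start_date, end_date, optional_setting=None):
        self.synth_pop = synth_pop
        self.start_date = start_date
        self.end_date = end_date
        self.optional_setting = optional_setting  # made-up optional parameter


def split_parameters(cls):
    """Return (required, optional) constructor parameter names for a class."""
    required, optional = [], []
    for name, param in inspect.signature(cls).parameters.items():
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            optional.append(name)
    return required, optional


required, optional = split_parameters(DemoConfig)
print("required:", required)
print("optional:", optional)
```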
1.2 Running the Simulation¶
Let's run this minimal simulation. As mentioned earlier, it isn't the most exciting FRED model - because you haven't set up any rules or actions for the agents in the population to follow or take! However, it is still a valid FRED model, and - since the required simulation configuration parameters have been set - running a simulation will produce output that we can examine using the epx package.
For simplicity, we will run the model in Loving County, TX. This is the least-populated county in the contiguous United States. I like to use this location whenever I am developing models and need to run simulations over and over, because the small population (just 70 people) allows simulations to run very quickly. This makes iterative development much easier!
To run the model from a Python script or Jupyter notebook, there are two steps.
First, we create a Job
object. To do so, we must specify:
1. the name of the program that contains our model code.
2. the list of model configurations to simulate.
3. a name to identify the job and distinguish it from other jobs.
In this case, we also use optional parameters to specify the version of the FRED modeling language we want to use and the directory where the results should be saved, which will create a new directory in our home directory called qsg-results
(if it does not already exist):
minimal_job = Job(
    "minimal.fred",
    config=[minimal_config],
    key="minimal_job",
    fred_version="11.0.1",
    results_dir="/home/epx/qsg-results",
)
We recommend explicitly specifying the version number for each model that you develop, so that you can decide when is the right time to update your models to be compatible with the latest version of the FRED modeling language.
Next, we submit the job to the cloud Simulation Run Service using the Job.execute()
method:
minimal_job.execute()
# the following loop idles while we wait for the simulation job to finish
start = time.time()
timeout = 300 # timeout in seconds
idle_time = 3 # time to wait (in seconds) before checking status again
while str(minimal_job.status) != 'DONE':
    if time.time() > start + timeout:
        msg = f"Job did not finish within {timeout / 60} minutes."
        raise RuntimeError(msg)
    time.sleep(idle_time)
str(minimal_job.status)
Congratulations - you just ran your first FRED simulation!
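The wait loop above can be factored into a small reusable helper. The following is a minimal sketch, not part of epx: wait_for_status and FakeJob are hypothetical names, and the "RUNNING" status string is made up for the stand-in (only 'DONE' appears in this lesson). It uses time.monotonic so that the timeout is unaffected by system clock adjustments.

```python
import time


def wait_for_status(get_status, target="DONE", timeout=300.0, idle_time=3.0):
    """Poll get_status() until it returns target; raise if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(
                f"Status was still {status!r} after {timeout} seconds."
            )
        time.sleep(idle_time)


class FakeJob:
    """Hypothetical stand-in that reports DONE on the third status check."""

    def __init__(self):
        self.checks = 0

    @property
    def status(self):
        self.checks += 1
        # "RUNNING" is a made-up placeholder status for this sketch
        return "DONE" if self.checks >= 3 else "RUNNING"


fake_job = FakeJob()
final_status = wait_for_status(lambda: str(fake_job.status), idle_time=0.01)
print(final_status, "after", fake_job.checks, "checks")
```

With a real job, the call would presumably look like wait_for_status(lambda: str(minimal_job.status)), mirroring the comparison used in the loop above. Note that a job that fails would never report 'DONE', so either version waits for the full timeout in that case.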
What Just Happened?¶
Calling Job.execute()
submitted the job we set up using the model in the minimal.fred
file and the configuration parameters stored in the minimal_config
variable to the Epistemix cloud Simulation Run Service.
A job, like the one we submitted, is comprised of one or more simulation executions, also called runs. The output from each job is written to its own directory, where the output from each run is written to its own subdirectory.
1.3 Examining Simulation Outputs¶
Now that our simulation job is complete, we can use the same Job
object that we used to submit the job to interact with the results. A completed Job
has an attribute Job.results
that stores a JobResults
object that is designed to retrieve any data that was output from our simulation job.
# type is a Python command that will return the data type of an object or variable
type(minimal_job.results)
The minimal model did not define any conditions or states for the agents that were loaded in (you'll learn about conditions and states in the next lesson). As a result, some internal methods of the JobResults
object won't do anything yet. For example, executing the following code would return an error, no matter which arguments were passed alongside the condition
and state
keywords:
# Note: Calling this method would return an error.
minimal_job.results.state(
    condition="CONDITION",
    state="State",
    count_type="cumulative",
)
However, by examining the output of the JobResults.dates()
method, you can see that this simulation ran for 7 days:
minimal_job.results.dates()
So, even though your simulation didn't really do anything, it did move the clock forward from the specified start date to the specified end date.
The outputs of dates
and other JobResults
methods are standard pandas Series or DataFrame objects. These can be manipulated and visualized using all of the available pandas tools. (If you haven't used pandas before, here is a great introduction to the package and its features.)
Moreover, any pandas DataFrame can be saved to your workspace using the pandas.DataFrame.to_csv()
method.
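The relationship between simulation days and calendar dates can be sketched with the standard library alone. This is an illustration under two assumptions that the lesson does not confirm: that simulation days are counted from 0, and that the end date itself is not simulated (which is consistent with the 7 days reported for 2022-05-10 through 2022-05-17). The function name simulation_calendar is hypothetical, and the layout of the real dates output may differ.

```python
from datetime import date, timedelta


def simulation_calendar(start_date, end_date, include_end=False):
    """Return (sim_day, calendar_date) pairs for an ISO-formatted date range."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    # Assumption: the end date is excluded unless include_end is set
    n_days = (end - start).days + (1 if include_end else 0)
    return [(day, start + timedelta(days=day)) for day in range(n_days)]


calendar = simulation_calendar("2022-05-10", "2022-05-17")
for sim_day, calendar_date in calendar:
    print(sim_day, calendar_date.isoformat())
```

Passing include_end=True yields 8 entries instead of 7, which is a quick way to check which convention matches the output you actually see.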
Simulation Logs¶
Whenever a simulation executes, a log file is produced that documents important information about the simulation run. These logs can be accessed using the Job.status.logs
attribute, which loads a pandas DataFrame object with one row for each log entry.
We can pull useful information about the simulation from this DataFrame. For instance, we can find an entry in the log using a list comprehension expression, which reveals that Loving County, TX has a population size of just 70 people:
# load the job's log entries into a pandas DataFrame
logs = minimal_job.status.logs
log_entry_to_find = "Estimated total agents"
log_entry = [entry for entry in logs.message if log_entry_to_find in entry][0]
estimated_total_agents = log_entry.split()[-1]
print(log_entry_to_find, '=', estimated_total_agents)
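Indexing the list comprehension with [0] raises an IndexError when no log entry matches. A slightly more defensive variant is sketched below, using a plain list of strings in place of the logs.message column. The helper name find_log_value and the sample messages are hypothetical; the real log wording may differ.

```python
def find_log_value(messages, label):
    """Return the last whitespace-separated token of the first matching message.

    Returns None when no message contains label, instead of raising IndexError.
    """
    match = next((message for message in messages if label in message), None)
    if match is None:
        return None
    return match.split()[-1]


# Hypothetical log messages, made up for this sketch
demo_messages = [
    "Loading synthetic population",
    "Estimated total agents = 70",
    "Simulation complete",
]

agents = find_log_value(demo_messages, "Estimated total agents")
print("Estimated total agents =", int(agents))
print(find_log_value(demo_messages, "No such entry"))
```

Since iterating over a pandas Series yields its values, the same helper should accept logs.message directly.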
Deleting a Job¶
In a setting like these tutorials, where the same code may be run multiple times without changes, we must take care to manage the Jobs
that we submit. If we execute a job with a key
that matches the key
of a job that already exists in the specified results directory, then an error will be raised (to prevent the existing job results from being accidentally overwritten).
Thus, when we are done with the results from a particular job, we will erase that job from our results directory using the Job.delete()
method. This will allow us to re-submit the same job (by running the cells in this notebook again with no change) without raising an error.
For added caution, you can also call the Job.delete()
method with interactive = True
(which is also the default if no keyword arguments are passed). That call will prompt you to confirm that you want to delete a job before the deletion occurs.
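If you would rather keep old results than delete them, another way to avoid a key collision is to give each submission a distinct key. The sketch below is one possible approach and not an epx feature: unique_job_key is a hypothetical helper, and it assumes that underscores and hexadecimal characters are acceptable in a job key, which this lesson does not state.

```python
import uuid


def unique_job_key(base):
    """Append a short random suffix so repeated submissions get distinct keys."""
    return f"{base}_{uuid.uuid4().hex[:8]}"


first_key = unique_job_key("minimal_job")
second_key = unique_job_key("minimal_job")
print(first_key)
print(second_key)
```

The trade-off is that results accumulate in the results directory until you remove them, so deleting finished jobs, as this tutorial does, keeps the workspace tidier.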
1.4 Changing Simulation Locations¶
Running a simulation with a different synthetic population (corresponding to a different location) is as simple as modifying the SynthPop.locations
attribute in our ModelConfig
object.
Run the next cell to change the location from Loving County, Texas to Butte County, Idaho (written as Butte_County_ID
):
minimal_config.synth_pop.locations.remove("Loving_County_TX")
minimal_config.synth_pop.locations.append("Butte_County_ID")
Another way to accomplish this would have been to instantiate a whole new ModelConfig
object with the new location, as follows:
minimal_config = ModelConfig(
    synth_pop=SynthPop("US_2010.v5", ["Butte_County_ID"]),
    start_date="2022-05-10",
    end_date="2022-05-17",
)
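The two approaches differ in one way that matters when a configuration object is referenced from more than one place: editing in place changes the object that every reference sees, while building a new object leaves earlier references untouched. The sketch below demonstrates this with stand-in dataclasses. DemoSynthPop and DemoModelConfig are hypothetical and only mimic the shape of the epx classes; whether an epx Job copies its configurations when it is created is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class DemoSynthPop:
    """Hypothetical stand-in for a synthetic population description."""
    name: str
    locations: list = field(default_factory=list)


@dataclass
class DemoModelConfig:
    """Hypothetical stand-in for a model configuration."""
    synth_pop: DemoSynthPop
    start_date: str
    end_date: str


original = DemoModelConfig(
    DemoSynthPop("US_2010.v5", ["Loving_County_TX"]), "2022-05-10", "2022-05-17"
)
alias = original  # a second name for the same object

# Approach 1: edit in place. Every reference to the object sees the change.
original.synth_pop.locations.remove("Loving_County_TX")
original.synth_pop.locations.append("Butte_County_ID")
print(alias.synth_pop.locations)

# Approach 2: build a new object. Earlier references keep their value.
replacement = DemoModelConfig(
    DemoSynthPop("US_2010.v5", ["Loving_County_TX", "Butte_County_ID"]),
    "2022-05-10",
    "2022-05-17",
)
print(alias.synth_pop.locations)
print(replacement.synth_pop.locations)
```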
Run the following cell to execute our simulation in the new location:
minimal_job_new_location = Job(
    "minimal.fred",
    config=[minimal_config],
    key="minimal_job_new_location",
    fred_version="11.0.1",
    results_dir="/home/epx/qsg-results",
)
minimal_job_new_location.execute()
# the following loop idles while we wait for the simulation job to finish
start = time.time()
timeout = 300 # timeout in seconds
idle_time = 3 # time to wait (in seconds) before checking status again
while str(minimal_job_new_location.status) != 'DONE':
    if time.time() > start + timeout:
        msg = f"Job did not finish within {timeout / 60} minutes."
        raise RuntimeError(msg)
    time.sleep(idle_time)
str(minimal_job_new_location.status)
By checking the log from this job, we can see that our simulation loaded the 2854 agents from the synthetic population of Butte County, Idaho, rather than the 70 agents from Loving County, Texas.
logs = minimal_job_new_location.status.logs
log_entry_to_find = "Estimated total agents"
log_entry = [entry for entry in logs.message if log_entry_to_find in entry][0]
estimated_total_agents = log_entry.split()[-1]
print(log_entry_to_find, '=', estimated_total_agents)
Simulating Multiple Locations¶
The SynthPop.locations
attribute can also be modified to configure simulations to run on multiple locations:
minimal_config.synth_pop.locations.append("Loving_County_TX")
Or, to instantiate a whole new ModelConfig
object with the multiple locations:
minimal_config = ModelConfig(
    synth_pop=SynthPop(
        "US_2010.v5",
        ["Butte_County_ID", "Loving_County_TX"],
    ),
    start_date="2022-05-10",
    end_date="2022-05-17",
)
1.5 Lesson Recap¶
You covered quite a few items in this short lesson!
- You configured a minimal FRED model and executed a simulation of it using the epx package and the Epistemix cloud Simulation Run Service.
- You encountered the minimum information required for a model to be configured for execution: the synthetic population (i.e., the simulation location) and the date range.
- You learned that the simulation of a configured FRED model is called a job, and a job can be comprised of one or more runs.
- You used the methods of a
JobResults
object to access the results of the simulation you ran and to create a pandas DataFrame object containing the dates of the simulation. You also examined the simulation log through the Job.status.logs attribute.
The fun hasn't really started yet. In the next lesson, you will add conditions to your model that tell the agents in your simulation to do something!
Let's keep going! Navigate back to the top level Quickstart Guide folder in the file browser to the left, and then select the Part 2 folder to move on.