sid.simulate
Module Contents¶
Functions¶
get_simulate_func | Get a function that simulates the spread of an infectious disease.
_simulate | Simulate the spread of an infectious disease.
_generate_seeds | Generate seeds for startup and simulation.
_create_output_directory | Determine the output directory for the data.
_sort_contact_models | Sort the contact_models.
_process_assort_bys | Set default values for assort_by variables and extract them into a dict.
_create_group_codes_names | Create a name for each contact models group codes.
_prepare_assortative_matching_indexers | Create indexers for matching individuals within contact models.
_prepare_assortative_matching_cumulative_probabilities | Create first stage probabilities for assortative matching with random contacts.
_add_default_duration_to_models | Add default durations to models.
_process_initial_states | Process the initial states given by the user.
_create_group_codes_and_info | Create group codes and additional information.
_dump_periodic_states | Dump the states of one period.
_prepare_time_series | Prepare the time series for the simulation results.
_process_saved_columns | Process saved columns.
 | Combine multiple lists.
_are_states_prepared | Are states prepared.
_add_additional_information_to_states | Add additional but optional information to states.
Attributes¶
- get_simulate_func(params: pandas.DataFrame, initial_states: pandas.DataFrame, contact_models: Dict[str, Any], duration: Optional[Dict[str, Any]] = None, events: Optional[Dict[str, Any]] = None, contact_policies: Optional[Dict[str, Any]] = None, testing_demand_models: Optional[Dict[str, Any]] = None, testing_allocation_models: Optional[Dict[str, Any]] = None, testing_processing_models: Optional[Dict[str, Any]] = None, seed: Optional[int] = None, path: Union[str, pathlib.Path, None] = None, saved_columns: Optional[Dict[str, Union[bool, str, List[str]]]] = None, initial_conditions: Optional[Dict[str, Any]] = None, susceptibility_factor_model: Optional[Callable] = None, virus_strains: Optional[List[str]] = None, vaccination_models: Optional[Callable] = None, rapid_test_models: Optional[Dict[str, Dict[str, Any]]] = None, rapid_test_reaction_models: Optional[Dict[str, Dict[str, Any]]] = None, seasonality_factor_model: Optional[Callable] = None, derived_state_variables: Optional[Dict[str, str]] = None, period_outputs: Optional[Dict[str, Callable]] = None, return_time_series: bool = True, return_last_states: bool = True)[source]¶
Get a function that simulates the spread of an infectious disease.
The resulting function only depends on parameters. The computational time it takes to process the user input is only incurred once in get_simulate_func() and not when the resulting function is called.
- Parameters
params (pandas.DataFrame) – params is a DataFrame with a three-level index which contains parameters for various aspects of the model, for example, infection probabilities of contact models, multiplier effects of policies, and determinants of the course of the disease. More information can be found in params.
initial_states (pandas.DataFrame) – The initial states are a DataFrame which contains individuals and their characteristics. More information can be found in The states DataFrame.
contact_models (Dict[str, Any]) – A dictionary of dictionaries where each dictionary describes a channel by which contacts can be formed. More information can be found in Contact Models.
duration (Optional[Dict[str, Any]]) – A dictionary which contains keys and values suited to be passed to pandas.date_range(). Only the first three arguments, "start", "end", and "periods", are allowed.
events (Optional[Dict[str, Any]]) – Dictionary of events which cause infections.
contact_policies (Optional[Dict[str, Any]]) – Dict of dicts with contact policies. See Policies.
testing_demand_models (Optional[Dict[str, Any]]) – Dict of dicts with demand models for tests. See Demand models for more information.
testing_allocation_models (Optional[Dict[str, Any]]) – Dict of dicts with allocation models for tests. See Allocation models for more information.
testing_processing_models (Optional[Dict[str, Any]]) – Dict of dicts with processing models for tests. See Processing models for more information.
seed (Optional[int]) – The seed is used as the starting point for two seed sequences where one is used to set up the simulation function and the other seed sequence is used within the simulation and reset at every parameter evaluation. If you pass None as a seed, an internal seed is sampled to set up the simulation function. The seed for the simulation is sampled at the beginning of the simulation function and can be influenced by setting numpy.random.seed right before the call.
path (Union[str, pathlib.Path, None]) – Path to the directory where the simulated data is stored.
saved_columns (Optional[Dict[str, Union[bool, str, List[str]]]]) – Dictionary with categories of state columns. The corresponding values can be True, False or lists with columns that should be saved. Typically, during estimation you only want to save exactly what you need to calculate moments to make the simulation and calculation of moments faster. The categories are “initial_states”, “disease_states”, “testing_states”, “countdowns”, “contacts”, “countdown_draws”, “group_codes” and “other”.
initial_conditions (Optional[Dict[str, Any]]) – The initial conditions allow you to govern the distribution of infections and immunity and the heterogeneity of courses of disease at the start of the simulation. Use None to assume no heterogeneous courses of diseases and 1% infections. Otherwise, initial_conditions is a dictionary containing the following entries:
- assort_by (Optional[Union[str, List[str]]]): The relative number of infections is preserved between the groups formed by the assort_by variables. By default, no group is formed and infections spread across the whole population.
- burn_in_periods (int): The number of periods over which infections are distributed and can progress. The default is one period.
- growth_rate (float): The growth rate specifies the increase of infections from one burn-in period to the next. For example, two indicates doubling case numbers every period. The value must be greater than or equal to one. The default is one, which means no distribution over time.
- initial_immunity (Union[int, float, pandas.Series]): The number of people who are immune in the beginning can be specified as an integer for the number, a float between 0 and 1 for the share, or a pandas.Series with the same index as states. Note that infected individuals are also immune. For a 10% pre-existing immunity with 2% currently infected people, set the key to 0.12. By default, only infected individuals indicated by the initial infections are immune.
- initial_infections (Union[int, float, pandas.Series, pandas.DataFrame]): The initial infections can be given as an integer which is the number of randomly infected individuals, as a float for the share, or as a pandas.Series which indicates whether an individual is infected. If initial infections are a pandas.DataFrame, then the index is the same as states, columns are dates or periods which can be sorted, and values are infected individuals on that date. This step will skip upscaling and distributing infections over days and directly jump to the evolution of states. By default, 1% of individuals are infected.
- known_cases_multiplier (int): The factor can be used to scale up the initial infections while keeping shares between assort_by variables constant. This is helpful if official numbers are underreporting the number of cases.
- virus_shares (Union[dict, pandas.Series]): A mapping between the names of the virus strains and their share among newly infected individuals in each burn-in period.
susceptibility_factor_model (Optional[Callable]) – A function which takes the states and parameters and returns an infection probability multiplier for each individual.
virus_strains (Optional[List[str]]) – A list of names indicating the different virus strains used in the model. Their different contagiousness factors are looked up in the params DataFrame. By default, only one virus strain is used.
vaccination_models (Optional[Dict[str, Dict[str, Any]]]) – A dictionary of models which allow vaccinating individuals. The "model" key holds a function with arguments states, params, and a seed which returns boolean indicators for individuals who received a vaccination.
rapid_test_models (Optional[Dict[str, Dict[str, Any]]]) – A dictionary of dictionaries containing models for rapid tests. Each model for rapid tests can have a "start" and "end" date. It must have a function under "model" which accepts states, params, receives_rapid_test, "contacts" and seed and returns a boolean series indicating individuals who received a rapid test. The difference to other test models is that rapid tests are performed after planned contacts are calculated (i.e. contact models and policies are evaluated) but before they actually take place. This allows people to use more rapid tests on days with many planned contacts and to react to the test outcome in rapid_test_reaction_models.
rapid_test_reaction_models (Optional[Dict[str, Dict[str, Any]]]) – A dictionary holding rapid test reaction models which allow changing calculated contacts based on the results of rapid tests. Each model can have a "start" and "end" date. It must have a function under "model" which accepts states, params, "contacts" and seed and returns a modified copy of contacts.
seasonality_factor_model (Optional[Callable]) – A model which takes in params and dates signaling the whole duration of the simulation and returns a DataFrame with a factor for each day and contact model which scales the corresponding infection probability. If seasonality patterns are the same for all contact models, the model can return a Series instead of a DataFrame.
derived_state_variables (Optional[Dict[str, str]]) – A dictionary that maps names of state variables to pandas evaluation strings that generate derived state variables, i.e. state variables that can be calculated from the existing state variables.
period_outputs (Optional[Dict[str, Callable]]) – A dictionary of functions that are called with the states DataFrame at the end of each period. Their results are stored in a dictionary of lists inside the results dictionary of the simulate function.
return_time_series (Optional[bool]) – Whether the full time series is stored on disk and returned as a dask.DataFrame in the results dictionary of the simulate function.
return_last_states (Optional[bool]) – Whether the full states DataFrame of the last period is returned in the results dictionary of the simulate function.
- Returns
Simulates dataset based on parameters.
- Return type
Callable
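As a rough illustration of the initial_conditions entries described above, the following sketch builds such a dictionary and shows what a growth rate of two means for case numbers across burn-in periods. Only the dictionary keys come from this page; the concrete values, the "region" column and the helper relative_case_numbers are assumptions made for illustration and are not part of sid.

```python
# Hypothetical values; only the keys are taken from the documentation above.
initial_conditions = {
    "assort_by": ["region"],     # hypothetical column of the states
    "burn_in_periods": 3,
    "growth_rate": 2.0,          # case numbers double every period
    "initial_infections": 0.02,  # 2% of individuals are infected
    "initial_immunity": 0.12,    # 10% pre-existing immunity plus 2% infected
    "known_cases_multiplier": 1,
}


def relative_case_numbers(burn_in_periods, growth_rate):
    """Return the case numbers of each burn-in period relative to the first."""
    return [growth_rate**period for period in range(burn_in_periods)]


relative = relative_case_numbers(
    initial_conditions["burn_in_periods"], initial_conditions["growth_rate"]
)
```

With three burn-in periods and a growth rate of two, the last period carries four times as many cases as the first one.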
- _simulate(params, initial_states, assort_bys, contact_models, group_codes_info, duration, events, contact_policies, testing_demand_models, testing_allocation_models, testing_processing_models, seed, path, columns_to_keep, indexers, susceptibility_factor_model, virus_strains, vaccination_models, rapid_test_models, rapid_test_reaction_models, seasonality_factor_model, derived_state_variables, period_outputs, return_time_series, return_last_states)[source]¶
Simulate the spread of an infectious disease.
- Parameters
params (pandas.DataFrame) – DataFrame with parameters that influence the number of contacts, contagiousness and dangerousness of the disease, … .
initial_states (pandas.DataFrame) – See The states DataFrame. Cannot contain the column “date” because it is used internally.
contact_models (dict) – Dictionary of dictionaries where each dictionary describes a channel by which contacts can be formed. See Contact Models.
duration (dict) – Duration is a dictionary containing kwargs for pandas.date_range().
events (dict) – Dictionary of events which cause infections.
contact_policies (dict) – Dict of dicts with policies. See Policies.
testing_demand_models (dict) – Dict of dicts with demand models for tests. See Demand models for more information.
testing_allocation_models (dict) – Dict of dicts with allocation models for tests. See Allocation models for more information.
testing_processing_models (dict) – Dict of dicts with processing models for tests. See Processing models for more information.
seed (int, optional) – The seed is used as the starting point for two seed sequences where one is used to set up the simulation function and the other seed sequence is used within the simulation and reset at every parameter evaluation. If you pass None as a seed, an internal seed is sampled to set up the simulation function. The seed for the simulation is sampled at the beginning of the simulation function and can be influenced by setting numpy.random.seed right before the call.
path (pathlib.Path) – Path to the directory where the simulated data is stored.
columns_to_keep (list) – Columns of states that will be saved in each period.
susceptibility_factor_model (Callable) – A function which takes the states and parameters and returns an infection probability multiplier for each individual.
virus_strains (Dict[str, Any]) – A dictionary with the keys "names", "contagiousness_factor" and "immunity_resistance_factor" holding the different contagiousness factors and immunity resistance factors of multiple viruses.
vaccination_models (Optional[Dict[str, Dict[str, Any]]]) – A dictionary of models which allow vaccinating individuals. The "model" key holds a function with arguments states, params, and a seed which returns boolean indicators for individuals who received a vaccination.
rapid_test_models (Optional[Dict[str, Dict[str, Any]]]) – A dictionary of dictionaries containing models for rapid tests. Each model for rapid tests can have a "start" and "end" date. It must have a function under "model" which accepts states, params, receives_rapid_test and seed and returns a boolean series indicating individuals who received a rapid test. The difference to other test models is that rapid tests are performed before contacts are calculated so that people can use rapid tests before they meet other people and so that normal tests can re-test individuals with positive rapid tests.
rapid_test_reaction_models (Optional[Dict[str, Dict[str, Any]]]) – A dictionary holding rapid test reaction models which allow changing calculated contacts based on the results of rapid tests.
seasonality_factor_model (Optional[Callable]) – A model which takes in params and dates signaling the whole duration of the simulation and returns a factor for each day which scales all infection probabilities.
derived_state_variables (Dict[str, str]) – A dictionary that maps names of state variables to pandas evaluation strings that generate derived state variables, i.e. state variables that can be calculated from the existing state variables.
period_outputs (Optional[Dict[str, Callable]]) – A dictionary of functions that are called with the states DataFrame at the end of each period. Their results are stored in a dictionary of lists inside the results dictionary of the simulate function.
return_time_series (Optional[bool]) – Whether the full time series is stored on disk and returned as a dask.DataFrame in the results dictionary of the simulate function. If False, only the additional outputs are available.
return_last_states (Optional[bool]) – Whether the full states DataFrame of the last period is returned in the results dictionary of the simulate function.
- Returns
The simulation result which includes some or all of the following keys, depending on the values of period_outputs, return_time_series and return_last_states.
- time_series (dask.dataframe): The DataFrame contains the states of each period (see The states DataFrame).
- last_states (dask.dataframe): The states of the last simulated period to resume the simulation.
- period_outputs (dict): Dictionary of lists. The keys are the keys of the period_outputs dictionary passed to get_simulate_func. The values are lists with one entry per simulated period.
- Return type
result (Dict[str, Any])
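The duration argument documented above holds keyword arguments for pandas.date_range(). A minimal sketch of how such a dictionary expands into simulation dates could look as follows; the start date and the number of periods are arbitrary example values.

```python
import pandas as pd

# Example values; "start", "end" and "periods" are the keys named above.
duration = {"start": "2020-03-01", "periods": 3}

# The dictionary is unpacked into keyword arguments of pandas.date_range().
dates = pd.date_range(**duration)
first_date = str(dates[0].date())
last_date = str(dates[-1].date())
```

Each entry of dates then corresponds to one simulated period.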
- _generate_seeds(seed: Optional[int])[source]¶
Generate seeds for startup and simulation.
We use the user-provided seed or a random seed to generate two other seeds. The first seed will be turned into a seed sequence and used to control randomness during the preparation of the simulate function. The second seed is for the randomness in the simulation, but stays an integer so that the seed sequence can be rebuilt every iteration.
If the seed is None, only the start-up seed is sampled and the seed for simulation is set to None. This seed will be sampled in _simulate() and can be influenced by setting np.random.seed(seed) right before the call.
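The two-seed scheme described above can be sketched with numpy.random.SeedSequence. This is only an illustration under assumptions: the function name generate_seeds_sketch, the sampling range for the start-up seed and the use of generate_state are choices made here, not necessarily how sid derives its seeds.

```python
import numpy as np


def generate_seeds_sketch(seed):
    """Return a start-up seed sequence and an integer simulation seed."""
    if seed is None:
        # Only the start-up seed is sampled; the simulation seed stays None
        # and would be sampled later, inside the simulation itself.
        startup_int = int(np.random.randint(0, 1_000_000))
        return np.random.SeedSequence(startup_int), None

    # Derive two integers deterministically from the user-provided seed.
    parent = np.random.SeedSequence(seed)
    startup_int, simulation_int = (int(x) for x in parent.generate_state(2))
    return np.random.SeedSequence(startup_int), simulation_int


startup_a, simulation_a = generate_seeds_sketch(42)
startup_b, simulation_b = generate_seeds_sketch(42)
_, simulation_none = generate_seeds_sketch(None)
```

Keeping the simulation seed as a plain integer makes it cheap to rebuild an identical seed sequence at the start of every call.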
- _create_output_directory(path: Union[str, pathlib.Path, None]) pathlib.Path [source]¶
Determine the output directory for the data.
The user can provide a path or a default path is chosen. If the user’s path leads to a non-empty directory, it is removed and newly created.
- Parameters
path (Union[str, Path, None]) – Path to the output directory.
- Returns
Path to the created output directory.
- Return type
output_directory (pathlib.Path)
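The behavior described above (use the given path or a default, and recreate a non-empty directory) can be sketched with the standard library. The function name and the default location used below are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def create_output_directory_sketch(path=None):
    """Return an empty output directory, recreating it if it has content."""
    # The default location is a hypothetical choice for this sketch.
    output_directory = Path.cwd() / ".sid" if path is None else Path(path)

    # A non-empty directory is removed and created again.
    if output_directory.exists() and any(output_directory.iterdir()):
        shutil.rmtree(output_directory)
    output_directory.mkdir(parents=True, exist_ok=True)
    return output_directory


# Usage with a temporary directory which contains a stale file.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results"
    target.mkdir()
    (target / "old.txt").write_text("stale")
    out = create_output_directory_sketch(target)
    remaining = list(out.iterdir())
```

Because existing content is deleted, pointing path at a directory with data you want to keep is unsafe.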
- _sort_contact_models(contact_models: Dict[str, Any]) Dict[str, Any] [source]¶
Sort the contact_models.
First come the non-recurrent, then the recurrent contact models. Within each group, the models are sorted alphabetically.
- Parameters
contact_models (Dict[str, Any]) – See Contact Models
- Returns
Sorted copy of contact_models.
- Return type
Dict[str, Any]
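The ordering described above can be sketched as a sort on a two-part key. The sketch assumes that each contact model marks recurrence with an "is_recurrent" entry which defaults to False; that key and the example model names are assumptions for illustration.

```python
def sort_contact_models_sketch(contact_models):
    """Return a copy sorted by recurrence first and by name second."""
    names = sorted(
        contact_models,
        # False sorts before True, so non-recurrent models come first.
        key=lambda name: (contact_models[name].get("is_recurrent", False), name),
    )
    return {name: contact_models[name] for name in names}


contact_models = {
    "school": {"is_recurrent": True},
    "work": {"is_recurrent": False},
    "household": {"is_recurrent": True},
    "distant": {},
}
sorted_models = sort_contact_models_sketch(contact_models)
order = list(sorted_models)
```

A stable, deterministic order matters here because random draws are consumed in the order in which the models are evaluated.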
- _process_assort_bys(contact_models: Dict[str, Any]) Dict[str, List[str]] [source]¶
Set default values for assort_by variables and extract them into a dict.
- Parameters
contact_models (Dict[str, Any]) – see Contact Models
- Returns
Keys are names of contact models, values are lists with the assort_by variables of the model.
- Return type
Dict[str, List[str]]
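A minimal sketch of this extraction, assuming that a missing "assort_by" entry defaults to an empty list and that a single string is wrapped into a list, could look like this. The function name and the example models are hypothetical.

```python
def process_assort_bys_sketch(contact_models):
    """Map each contact model name to a list of its assort_by variables."""
    assort_bys = {}
    for name, model in contact_models.items():
        assort_by = model.get("assort_by", [])  # assumed default: no groups
        if isinstance(assort_by, str):
            assort_by = [assort_by]
        assort_bys[name] = list(assort_by)
    return assort_bys


contact_models = {
    "work": {"assort_by": "region"},
    "school": {"assort_by": ["age_group", "region"]},
    "distant": {},
}
assort_bys = process_assort_bys_sketch(contact_models)
```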
- _create_group_codes_names(contact_models: Dict[str, Any], assort_bys: Dict[str, List[str]]) Dict[str, str] [source]¶
Create a name for each contact model's group codes.
The group codes are either found in the initial states or are a factorization of one or multiple variables in the initial states. "is_factorized" can be set in contact models to indicate that the assortative variable is already factorized, which saves memory.
- _prepare_assortative_matching_indexers(states: pandas.DataFrame, contact_models: Dict[str, Dict[str, Any]], group_codes_info: Dict[str, Dict[str, Any]]) Dict[str, numba.typed.List] [source]¶
Create indexers for matching individuals within contact models.
For each contact model, create_group_indexer() returns a Numba list where each position contains a numpy.ndarray with all the indices of individuals belonging to the same group given by the index.
The indexer has one Numba list for recurrent and random models. Each list has one entry per contact model which holds the result of create_group_indexer().
- Parameters
states (pandas.DataFrame) – see The states DataFrame.
contact_models (Dict[str, Dict[str, Any]]) – The contact models.
group_codes_info (Dict[str, Dict[str, Any]]) – A dictionary where keys are names of contact models and values are dictionaries containing the name and the original codes of the assortative variables.
- Returns
The indexer is a dictionary with one entry for recurrent and random contact models. The values are Numba lists containing Numba lists for each contact model. Each list holds indices for each group in the contact model.
- Return type
indexers (Dict[str, numba.typed.List])
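The idea of a group indexer can be illustrated in plain Python. The sketch below uses ordinary lists instead of Numba lists and NumPy arrays, and the function name is hypothetical; it only shows the data layout in which position g holds the indices of all individuals with group code g.

```python
def create_group_indexer_sketch(group_codes, n_groups):
    """Collect the positions of all individuals for each group code."""
    indexer = [[] for _ in range(n_groups)]
    for individual, code in enumerate(group_codes):
        indexer[code].append(individual)
    return indexer


# Five individuals which belong to three groups.
group_codes = [0, 2, 1, 0, 2]
indexer = create_group_indexer_sketch(group_codes, n_groups=3)
```

Looking up indexer[2] then yields every individual of group 2 without scanning the whole population again.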
- _prepare_assortative_matching_cumulative_probabilities(states: pandas.DataFrame, assort_bys: Dict[str, List[str]], params: pandas.DataFrame, contact_models: Dict[str, Dict[str, Any]], group_codes_info: Dict[str, Dict[str, Any]]) numba.typed.List [source]¶
Create first stage probabilities for assortative matching with random contacts.
- Parameters
states (pandas.DataFrame) – See The states DataFrame.
assort_bys (Dict[str, List[str]]) – Keys are names of contact models, values are lists with the assort_by variables of the model.
params (pandas.DataFrame) – See params.
contact_models (dict) – see Contact Models.
group_codes_info (Dict[str, Dict[str, Any]]) – A dictionary where keys are names of contact models and values are dictionaries containing the name and the original codes of the assortative variables.
- Returns
The list contains one entry for each random contact model. Each entry holds a n_groups * n_groups transition matrix where probs[i, j] is the cumulative probability that an individual from group i meets someone from group j.
- Return type
probabilities (numba.typed.List)
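To make the cumulative transition matrix concrete, the following sketch turns row-wise meeting probabilities into cumulative probabilities and uses them to pick a partner group from a uniform draw. The probabilities and both function names are made up for illustration, and plain lists stand in for the arrays used in practice.

```python
from bisect import bisect_right
from itertools import accumulate


def cumulative_probabilities_sketch(meeting_probs):
    """Accumulate each row so that the last entry of every row is one."""
    return [list(accumulate(row)) for row in meeting_probs]


def draw_group(cumulative_row, uniform_draw):
    """Return the first group whose cumulative probability exceeds the draw."""
    return bisect_right(cumulative_row, uniform_draw)


# Hypothetical meeting probabilities for two groups; each row sums to one.
meeting_probs = [[0.75, 0.25], [0.5, 0.5]]
probs = cumulative_probabilities_sketch(meeting_probs)

group_for_low_draw = draw_group(probs[0], 0.3)
group_for_high_draw = draw_group(probs[0], 0.8)
```

Storing cumulative instead of raw probabilities means that a single uniform draw and a search suffice to select the group in the first matching stage.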
- _add_default_duration_to_models(dictionaries: Dict[str, Dict[str, Any]], duration: Dict[str, Any]) Dict[str, Dict[str, Any]] [source]¶
Add default durations to models.
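One plausible reading of this step is that models without their own "start" or "end" inherit the dates of the whole simulation. The sketch below follows that reading; the function name, the assumption that duration carries "start" and "end" keys, and the example model are illustrative only.

```python
def add_default_duration_sketch(models, duration):
    """Fill missing "start" and "end" entries with the simulation dates."""
    filled = {}
    for name, model in models.items():
        filled[name] = {
            **model,
            "start": model.get("start", duration["start"]),
            "end": model.get("end", duration["end"]),
        }
    return filled


duration = {"start": "2020-03-01", "end": "2020-03-31"}
models = {
    "always_on": {"model": None},
    "late": {"model": None, "start": "2020-03-15"},
}
with_defaults = add_default_duration_sketch(models, duration)
```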
- _process_initial_states(states: pandas.DataFrame, assort_bys: Dict[str, List[str]], virus_strains: Dict[str, Any]) pandas.DataFrame [source]¶
Process the initial states given by the user.
- Parameters
states (pandas.DataFrame) – The user-defined initial states.
assort_bys (list, optional) – List of variable names. Contacts are assortative by these variables.
- Returns
Processed states.
- Return type
states (pandas.DataFrame)
- _create_group_codes_and_info(states: pandas.DataFrame, assort_bys: Dict[str, List[str]], contact_models: Dict[str, Dict[str, Any]]) Tuple[pandas.DataFrame, Dict[str, Dict[str, Any]]] [source]¶
Create group codes and additional information.
- Parameters
- Returns
A tuple containing:
states (pandas.DataFrame): The states.
group_codes_info (Dict[str, Dict[str, Any]]): A dictionary where keys are names of contact models and values are dictionaries containing the name and the original codes of the assortative variables.
- Return type
Tuple[pandas.DataFrame, Dict[str, Dict[str, Any]]]
- _dump_periodic_states(states, columns_to_keep, output_directory, date) None [source]¶
Dump the states of one period.
- _prepare_time_series(output_directory, columns_to_keep, last_states)[source]¶
Prepare the time series for the simulation results.
- Parameters
output_directory (pathlib.Path) – Path to output directory.
columns_to_keep (list) – List of variables which should be kept.
last_states (pandas.DataFrame) – The states from the last period.
- Returns
The DataFrame contains (reduced) states of each period.
- Return type
dask.dataframe
- _process_saved_columns(saved_columns: Union[None, Dict[str, Union[bool, str, List[str]]]], initial_state_columns: List[str], group_codes_info: Dict[str, str], contact_models: Dict[str, Dict[str, Any]]) List[str] [source]¶
Process saved columns.
This function combines the user-defined saved_columns with the default and produces a list of column names which should be kept in the periodic states.
The list is also used to check whether additional information should be computed and then stored in the periodic states.
- Parameters
saved_columns (Union[None, Dict[str, Union[bool, str, List[str]]]]) – The columns the user decided to save in the simulation output.
initial_state_columns (List[str]) – The columns available in the initial states passed by the user.
group_codes_info (Dict[str, str]) – A dictionary which contains the name and groups for each group code variable.
contact_models (Dict[str, Dict[str, Any]]) – The contact models.
- Returns
A list of column names which should be kept in the states.
- Return type
keep (List[str])
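The combination of user choices and defaults can be sketched as a dictionary merge followed by resolving each category to column names. The defaults, the available columns per category and the function name below are hypothetical; only the category names and the allowed value types (True, False, a string or a list) come from this page.

```python
def process_saved_columns_sketch(saved_columns, defaults, available):
    """Resolve per-category choices into a flat list of column names."""
    merged = {**defaults, **(saved_columns or {})}
    keep = []
    for category, choice in merged.items():
        if choice is True:
            columns = available.get(category, [])
        elif choice is False:
            columns = []
        elif isinstance(choice, str):
            columns = [choice]
        else:
            columns = list(choice)
        for column in columns:
            if column not in keep:
                keep.append(column)
    return keep


# Hypothetical defaults and available columns.
defaults = {"initial_states": True, "disease_states": True, "countdowns": False}
available = {
    "initial_states": ["age_group", "region"],
    "disease_states": ["infectious", "immune"],
    "countdowns": ["cd_example"],
}
saved_columns = {"disease_states": ["infectious"], "countdowns": "cd_example"}
keep = process_saved_columns_sketch(saved_columns, defaults, available)
```

Saving only the columns needed for the moments keeps the periodic states small, which is the motivation given for saved_columns above.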
- _are_states_prepared(states: pandas.DataFrame) bool [source]¶
Are states prepared.
If the states include information on the period or date, we assume that the states are prepared.
- _add_additional_information_to_states(states: pandas.DataFrame, columns_to_keep: List[str], n_has_additionally_infected: pandas.Series, random_contacts: numpy.ndarray, recurrent_contacts: numpy.ndarray, channel_infected_by_contact: pandas.Series, channel_infected_by_event: pandas.Series, channel_demands_test: pandas.Series, susceptibility_factor: numpy.ndarray)[source]¶
Add additional but optional information to states.
- Parameters
states (pandas.DataFrame) – The states of one period.
columns_to_keep (List[str]) – A list of column names which should be kept.
n_has_additionally_infected (Optional[pandas.Series]) – Additionally infected persons by this individual.
contacts (numpy.ndarray) – Matrix with number of contacts for each contact model.
channel_infected_by_contact (pandas.Series) – A categorical series containing the information which contact model led to the infection.
channel_infected_by_event (pandas.Series) – A categorical series containing the information which event model led to the infection.
susceptibility_factor (numpy.ndarray) – An array containing infection probability multiplier for each individual.
- Returns
The states with additional information.
- Return type
states (pandas.DataFrame)