How to reduce memory consumption

sid can simulate and estimate complex epidemiological models, which accumulate a lot of data. This notebook lists some approaches to reduce memory consumption.

[1]:
import sid

In-memory consumption

By default, sid reduces in-memory consumption by storing each simulated day on disk. The stored data can then be loaded with Dask to perform computations that do not fit into memory.

If you are unable to simulate the model even for a day, you must reduce the complexity of the model.

  • Reduce the number of simulated individuals.

  • Reduce the number of contact models.

On-disk memory consumption

Reducing the disk space occupied by sid is far easier than decreasing in-memory demands.

sid already uses efficient data types internally and will not store a lot of internal information on disk by default.

For more fine-grained control over what is stored on disk, consult the documentation of the saved_columns argument of sid.get_simulate_func.

The date information

By default, the date is stored as a np.datetime64 type, and pandas offers many helpful functions for dealing with datetimes. For bigger models, the 64-bit type can be costly, so sid allows switching to a 16-bit period integer.
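To get a feeling for the savings, the following sketch compares the raw size of a 64-bit datetime column with a 16-bit integer column in NumPy. The number of rows and the example date are assumptions chosen for illustration, and the arrays only mimic the two storage types; they are not produced by sid.

```python
import numpy as np

n_rows = 1_000_000  # hypothetical number of stored rows

# Days between the example date and 2019-01-01, used as the integer value.
period = int(
    (np.datetime64("2020-03-01", "D") - np.datetime64("2019-01-01", "D"))
    / np.timedelta64(1, "D")
)

# One column holding 64-bit datetimes, one holding 16-bit integers.
dates = np.full(n_rows, np.datetime64("2020-03-01"), dtype="datetime64[ns]")
periods = np.full(n_rows, period, dtype=np.int16)

print(f"datetime64: {dates.nbytes / 1e6:.0f} MB")
print(f"int16:      {periods.nbytes / 1e6:.0f} MB")
```

Each 16-bit value takes 2 bytes instead of 8, so the column shrinks to a quarter of its size. A signed 16-bit integer covers -32768 to 32767, which is roughly 89 years in either direction when the values count days.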

By default, the period column is dropped and only the date column is shown. To store the period column along with the date column, use the saved_columns argument of sid.get_simulate_func with

saved_columns = {"time": ["date", "period"]}
# or
saved_columns = {"time": True}

or use only the period column with

saved_columns = {"time": ["period"]}

The period is similar to Unix time and counts the days since 2019-01-01, which is period 0. Negative values refer to days before that date.
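The arithmetic behind the period can be sketched with the standard library alone. The helper names below are hypothetical and this is not sid's implementation; it only assumes the reference date 2019-01-01 stated above.

```python
from datetime import date, timedelta

SID_EPOCH = date(2019, 1, 1)  # reference date, period 0


def period_to_date(period: int) -> date:
    """Return the calendar day that lies `period` days after the reference date."""
    return SID_EPOCH + timedelta(days=period)


def date_to_period(day: date) -> int:
    """Return the number of days between `day` and the reference date."""
    return (day - SID_EPOCH).days


print(period_to_date(0))                  # 2019-01-01
print(period_to_date(-1))                 # 2018-12-31
print(date_to_period(date(2020, 1, 1)))   # 365
```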

Functions to convert timestamps to periods and vice versa can be found in the main namespace.

[2]:
sid.timestamp_to_sid_period("2019-01-01")
[2]:
0
[3]:
sid.sid_period_to_timestamp(0)
[3]:
Timestamp('2019-01-01 00:00:00')