bucky.util package¶
Submodules¶
bucky.util.distributions module¶
Provides any probability distributions used by the model that aren’t in numpy/cupy.
-
bucky.util.distributions.
mPERT_sample
(mu, a=0.0, b=1.0, gamma=4.0, var=None)¶ Provides a vectorized Modified PERT distribution.
- Parameters
mu (float, array_like) – Mean value for the PERT distribution.
a (float, array_like) – Lower bound for the distribution.
b (float, array_like) – Upper bound for the distribution.
gamma (float, array_like) – Shape parameter.
var (float, array_like, None) – Variance of the distribution. If var is not None, gamma is calculated to match the requested variance.
- Returns
out – Samples drawn from the specified mPERT distribution. Shape is the broadcast shape of the input parameters.
- Return type
float, array_like
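A minimal sketch of how such a sampler could be built on numpy's beta distribution, assuming mu is the mean as the parameter list states. With shape parameters chosen so that alpha + beta = gamma + 2, the scaled beta variable has mean mu and variance (mu - a)(b - mu) / (gamma + 3), which gives the gamma used when var is supplied. The name mpert_sample_sketch and the rng argument are hypothetical, and this is not taken from the Bucky source.

```python
import numpy as np


def mpert_sample_sketch(mu, a=0.0, b=1.0, gamma=4.0, var=None, rng=None):
    """Draw modified-PERT samples on [a, b] with mean mu (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Broadcast all inputs to a common shape so the output follows it too.
    mu, a, b, gamma = np.broadcast_arrays(
        *(np.asarray(v, dtype=float) for v in (mu, a, b, gamma))
    )
    if var is not None:
        # Var[X] = (mu - a)(b - mu) / (gamma + 3), solved for gamma.
        gamma = (mu - a) * (b - mu) / np.asarray(var, dtype=float) - 3.0
    # Beta shape parameters with alpha + beta = gamma + 2 and mean mu.
    alpha = (gamma + 2.0) * (mu - a) / (b - a)
    beta = (gamma + 2.0) * (b - mu) / (b - a)
    # Rescale the unit-interval beta draw onto [a, b].
    return a + (b - a) * rng.beta(alpha, beta)
```

Passing an array for mu returns one sample per element; passing var instead of relying on the default gamma trades a fixed shape for a fixed spread.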
-
bucky.util.distributions.
truncnorm
(xp, loc=0.0, scale=1.0, size=1, a_min=None, a_max=None)¶ Provides a vectorized truncnorm implementation that is compatible with cupy.
The output is drawn with the numpy/cupy random.normal() and truncated via rejection sampling. The interface is intended to mirror the scipy implementation of truncnorm.
- Parameters
xp (module) –
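The rejection step described above can be sketched as follows. This is an illustration only: it assumes xp is the numpy (or cupy) module and that a_min and a_max are absolute bounds on the samples, which the truncated parameter list does not confirm. The function name is hypothetical and the sketch has only been reasoned through for numpy.

```python
import numpy as np


def truncnorm_sketch(xp, loc=0.0, scale=1.0, size=1, a_min=None, a_max=None):
    """Sample N(loc, scale) restricted to [a_min, a_max] by rejection (sketch)."""
    lo = -xp.inf if a_min is None else a_min
    hi = xp.inf if a_max is None else a_max
    out = xp.random.normal(loc, scale, size)
    bad = (out < lo) | (out > hi)
    while bool(bad.any()):
        # Redraw a full array and keep the new values only where the old ones failed.
        out = xp.where(bad, xp.random.normal(loc, scale, size), out)
        bad = (out < lo) | (out > hi)
    return out
```

Redrawing the whole array on each pass keeps every step vectorized, at the cost of discarding draws for positions that were already accepted.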
bucky.util.get_historical_data module¶
-
bucky.util.get_historical_data.
add_daily_history
(history_data, window_size=None)¶ Applies a window to cumulative historical data to get daily data.
- Parameters
history_data (Pandas DataFrame) – Cumulative case and death data
window_size (int or None) – Size of window in days
- Returns
history_data – Historical data with added columns for daily case and death data
- Return type
Pandas DataFrame
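One way to derive daily values from cumulative ones is a per-region difference over the window, averaged to a per-day figure. The sketch below assumes that interpretation of window_size, an index with levels named adm2 and date, and the column names shown; all of these are hypothetical and not confirmed by the documentation.

```python
import pandas as pd


def add_daily_history_sketch(history_data, window_size=None):
    """Add daily columns derived from cumulative ones (illustrative sketch)."""
    w = 1 if window_size is None else window_size
    out = history_data.sort_index().copy()
    for cum_col, daily_col in [
        ("cumulative_cases", "daily_cases"),
        ("cumulative_deaths", "daily_deaths"),
    ]:
        # Change over the last w days within each region, averaged per day.
        out[daily_col] = out.groupby(level="adm2")[cum_col].diff(w) / w
    return out
```

The first w rows of every region have no earlier value to subtract, so their daily entries are NaN.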
-
bucky.util.get_historical_data.
get_historical_data
(columns, level, lookup_df, window_size, hist_file)¶ Gets historical data for the columns requested.
- Parameters
columns (list of str) – Column names for historical data
level (str) – Geographic level to get historical data for, e.g. adm1
lookup_df (Pandas DataFrame) – Dataframe with names and values for admin0, admin1, and admin2 levels
window_size (int) – Size of window in days
hist_file (string or None) – Historical data file to use if not using defaults.
- Returns
history_data – Historical data indexed by date and geographic level containing only requested columns
- Return type
Pandas DataFrame
bucky.util.graph2histcsv module¶
bucky.util.read_config module¶
bucky.util.readable_col_names module¶
bucky.util.scoring module¶
-
bucky.util.scoring.
IS
(x, lower, upper, alp)¶
- Parameters
TODO
- Returns
TODO
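The function is undocumented, but its signature matches the standard interval score used to evaluate prediction intervals: the interval width plus a penalty of 2/alp times the distance by which the observation falls outside the interval. The sketch below implements that standard score for scalar inputs. Whether IS computes exactly this is an assumption, and the name interval_score_sketch is hypothetical.

```python
def interval_score_sketch(x, lower, upper, alp):
    """Interval score of the central (1 - alp) interval [lower, upper] at x."""
    width = upper - lower
    # Penalties apply only when the observation lies outside the interval.
    below = (2.0 / alp) * max(lower - x, 0.0)
    above = (2.0 / alp) * max(x - upper, 0.0)
    return width + below + above
```

Lower scores are better: a narrow interval that still contains the observation scores lowest.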
-
bucky.util.scoring.
WIS
(x, q, x_q, norm=False, log=False, smooth=False)¶
- Parameters
TODO
- Returns
TODO
-
bucky.util.scoring.
logistic
(x, x0=0.0, k=1.0, L=1.0)¶
- Parameters
TODO
- Returns
TODO
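The parameter names x0, k and L match the usual parameterization of the logistic function, L / (1 + exp(-k (x - x0))), with midpoint x0, steepness k and maximum value L. A scalar sketch under that assumption follows; the documentation does not confirm that logistic computes this, and the name logistic_sketch is hypothetical.

```python
import math


def logistic_sketch(x, x0=0.0, k=1.0, L=1.0):
    """Standard logistic curve with midpoint x0, steepness k and maximum L."""
    return L / (1.0 + math.exp(-k * (x - x0)))
```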
-
bucky.util.scoring.
smooth_IS
(x, lower, upper, alp)¶
- Parameters
TODO
- Returns
TODO
bucky.util.update_data_repos module¶
Data Updating Utility (bucky.util.update_data_repos)¶
A utility for fetching updated mobility and case data from public repositories.
This module pulls from public git repositories and preprocesses the data if necessary. For case data, unallocated or unassigned cases are distributed as necessary.
-
bucky.util.update_data_repos.
distribute_data_by_population
(total_df, dist_vect, data_to_dist, replace)¶ Distributes data by population across a state or territory.
- Parameters
total_df (Pandas DataFrame) – DataFrame containing confirmed and death data indexed by date and FIPS code
dist_vect (Pandas DataFrame) – Population data for each county as proportion of total state population, indexed by FIPS code
data_to_dist (Pandas DataFrame) – Data to distribute, indexed by date
replace (boolean) – If true, distributed values overwrite current historical data in DataFrame. If false, distributed values are added to current data
- Returns
total_df – Modified input dataframe with distributed data
- Return type
Pandas DataFrame
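A sketch of the proportional split and of the two behaviours of replace. It assumes total_df is indexed by (FIPS, date), that dist_vect carries its shares in a column named pop_fraction, and that data_to_dist has the same value columns as total_df. The column name, the index order and the function name are hypothetical.

```python
import pandas as pd


def distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace):
    """Split per-date totals across counties by population share (sketch)."""
    # One block of rows per county: that county's share of every date's totals.
    portions = pd.concat(
        {fips: data_to_dist * share
         for fips, share in dist_vect["pop_fraction"].items()},
        names=["FIPS", "date"],
    )
    out = total_df.astype(float)
    aligned = portions.reindex(index=out.index, columns=out.columns)
    if replace:
        # Distributed values win wherever they exist; other cells are kept.
        return aligned.combine_first(out)
    # Otherwise add the distributed values on top of the current data.
    return out.add(aligned.fillna(0.0))
```

Because the shares sum to one, the distributed rows for any date add up to that date's entry in data_to_dist.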
-
bucky.util.update_data_repos.
distribute_mdoc
(df, csse_deaths_file)¶ Distributes Michigan Department of Corrections data across Michigan counties by population.
- Parameters
df (Pandas DataFrame) – Current historical DataFrame indexed by FIPS and date, which includes MDOC and FCI data
csse_deaths_file (string) – File location of CSSE deaths file (contains population data)
- Returns
df – Modified historical dataframe with Michigan prison data distributed and added to Michigan data
- Return type
Pandas DataFrame
-
bucky.util.update_data_repos.
distribute_nyc_data
(df)¶ Distributes NYC case data across the five NYC counties.
- Parameters
df (Pandas DataFrame) – DataFrame containing historical data indexed by FIPS and date
TODO – Add a deprecation warning, because CSSE has since fixed this.
- Returns
df – Modified DataFrame containing corrected NYC historical data indexed by FIPS and date
- Return type
Pandas DataFrame
-
bucky.util.update_data_repos.
distribute_territory_data
(df, add_american_samoa)¶ Distributes territory-wide case and death data for territories.
Uses county population to distribute cases for US Virgin Islands, Guam, and CNMI. Optionally adds a single case to the most populous American Samoan county.
- Parameters
df (Pandas DataFrame) – Current historical DataFrame indexed by FIPS and date, which includes territory-wide case and death data
add_american_samoa (boolean) – If true, adds 1 case to American Samoa
- Returns
df – Modified historical dataframe with territory-wide data distributed to counties
- Return type
Pandas DataFrame
-
bucky.util.update_data_repos.
distribute_unallocated_csse
(confirmed_file, deaths_file, hist_df)¶ Distributes unallocated historical case and deaths data from CSSE.
JHU CSSE data contains state-level unallocated data, indicated with “Unassigned” or “Out of” for each state. This function distributes these unallocated cases based on the proportion of cases in each county relative to the state.
- Parameters
confirmed_file (string) – filename of CSSE confirmed data
deaths_file (string) – filename of CSSE death data
hist_df (Pandas DataFrame) – current historical DataFrame containing confirmed and death data indexed by date and FIPS code
- Returns
hist_df – modified historical DataFrame with cases and deaths distributed
- Return type
Pandas DataFrame
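For a single state and date, distributing by case share amounts to the arithmetic below. The sketch uses plain dictionaries rather than the DataFrames the function works on; the function name and the handling of a state with zero county cases are assumptions.

```python
def distribute_unallocated_sketch(county_cases, unallocated):
    """Split a state's unallocated count across counties by case share (sketch)."""
    state_total = sum(county_cases.values())
    if state_total == 0:
        # No county has cases, so there is no basis for a proportional split.
        return dict(county_cases)
    return {
        fips: cases + unallocated * cases / state_total
        for fips, cases in county_cases.items()
    }
```

A county holding three quarters of the state's allocated cases receives three quarters of the unallocated ones.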
-
bucky.util.update_data_repos.
get_county_population_data
(csse_deaths_file, county_fips)¶ Uses the JHU CSSE deaths file to get county population data as a fraction of the total population across a list of counties.
- Parameters
csse_deaths_file (string) – filename of CSSE deaths file
county_fips (array-like) – list of FIPS to return population data for
- Returns
population_df – DataFrame with population fraction data indexed by FIPS
- Return type
Pandas DataFrame
-
bucky.util.update_data_repos.
get_timeseries_data
(col_name, filename, fips_key='FIPS', is_csse=True)¶ Transforms a historical data file to a dataframe with FIPS, date, and case or death data.
- Parameters
col_name (string) – Column name to extract from data.
filename (string) – Location of the file to read.
fips_key (string, optional) – Key used in file for indicating county-level field.
is_csse (boolean) – Indicates whether the file is CSSE data. If True, certain areas without FIPS are included.
- Returns
df – Dataframe with the historical data indexed by FIPS, date
- Return type
Pandas DataFrame
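The reshaping this describes, from one column per date to one row per (FIPS, date), can be sketched with DataFrame.melt. The sketch takes an already loaded DataFrame instead of a filename and assumes every column other than fips_key is a date in month/day/two-digit-year form; it does not handle the rows without FIPS that is_csse refers to. The function name is hypothetical.

```python
import pandas as pd


def wide_to_timeseries_sketch(wide_df, col_name, fips_key="FIPS"):
    """Reshape one-column-per-date data into rows indexed by (FIPS, date)."""
    date_cols = [c for c in wide_df.columns if c != fips_key]
    long_df = wide_df.melt(
        id_vars=fips_key, value_vars=date_cols,
        var_name="date", value_name=col_name,
    )
    long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")
    long_df = long_df.rename(columns={fips_key: "FIPS"})
    return long_df.set_index(["FIPS", "date"]).sort_index()
```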
-
bucky.util.update_data_repos.
git_pull
(abs_path)¶ Updates a git repository given its path.
- Parameters
abs_path (string) – Absolute path of the repository to update
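One way to update a repository from Python is to call the git command line with its -C option, which runs the command as if git had been started in the given directory. This is a sketch using subprocess, not necessarily how git_pull is implemented; both function names are hypothetical.

```python
import subprocess


def build_git_pull_cmd(abs_path):
    """Return the argument list for `git pull` run inside abs_path."""
    return ["git", "-C", str(abs_path), "pull"]


def git_pull_sketch(abs_path):
    """Run `git pull` in the repository at abs_path and return git's stdout."""
    result = subprocess.run(
        build_git_pull_cmd(abs_path),
        check=True,           # raise CalledProcessError if git exits non-zero
        capture_output=True,
        text=True,
    )
    return result.stdout
```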
-
bucky.util.update_data_repos.
process_csse_data
()¶ Performs pre-processing on CSSE data.
CSSE data is separated into two different files: confirmed cases and deaths. These two files are combined into one dataframe, indexed by FIPS and date with two columns, Confirmed and Deaths. This function distributes CSSE data that is either unallocated or territory-wide instead of county-wide. Michigan data from the state Department of Corrections and Federal Correctional Institution is distributed to Michigan counties. New York City data, which is currently all placed in one county (New York County), is distributed to the other NYC counties. Territory data for Guam, CNMI, and US Virgin Islands is also distributed. This data is written to a CSV.
-
bucky.util.update_data_repos.
process_usafacts
(case_file, deaths_file)¶ Performs preprocessing on USA Facts data.
USAFacts contains unallocated cases and deaths for each state. These are allocated across the counties of each state based on the case distribution in that state.
- Parameters
case_file (string) – Location of USAFacts case file
deaths_file (string) – Location of USAFacts death file
- Returns
combined_df – USAFacts data containing cases and deaths indexed by FIPS and date.
- Return type
Pandas DataFrame
-
bucky.util.update_data_repos.
update_covid_tracking_data
()¶ Downloads and processes data from the COVID Tracking project to match the format of other preprocessed data.
The COVID Tracking project contains data at the state level. Each state is given a random FIPS selected from all FIPS in that state. This is done to make aggregation easier for plotting later. Processed data is written to a CSV.
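Picking one county FIPS at random for each state can be sketched as below. The mapping from state to county FIPS codes is a hypothetical input, as are the function name and the seed argument, which is added only to make the choice reproducible.

```python
import random


def assign_random_fips_sketch(state_to_fips, seed=None):
    """Pick one county FIPS per state to carry that state's data (sketch)."""
    rng = random.Random(seed)
    # Sort first so the result depends only on the seed, not on input order.
    return {
        state: rng.choice(sorted(fips_list))
        for state, fips_list in state_to_fips.items()
    }
```

Because the chosen FIPS belongs to the state, aggregating county-level rows up to the state level places the data in the right state.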
-
bucky.util.update_data_repos.
update_repos
()¶ Uses git to update public data repos.
-
bucky.util.update_data_repos.
update_usafacts_data
()¶ Retrieves updated historical data from USA Facts, preprocesses it, and writes to CSV.
bucky.util.util module¶
-
class
bucky.util.util.
TqdmLoggingHandler
(level=0)¶ Bases:
logging.Handler
-
emit
(record)¶ Do whatever it takes to actually log the specified logging record.
This version is intended to be implemented by subclasses and so raises a NotImplementedError.
-
-
bucky.util.util.
bin_age_csv
(filename, out_filename)¶
-
bucky.util.util.
cache_files
(*argv)¶
-
bucky.util.util.
date_to_t_int
(dates, start_date)¶
-
class
bucky.util.util.
dotdict
¶ Bases:
dict
dot.notation access to dictionary attributes.
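A common recipe for this kind of class reuses the dict methods as attribute hooks. It is shown here as a sketch of the idea rather than as the class's actual source; the name DotDictSketch is hypothetical.

```python
class DotDictSketch(dict):
    """dot.notation access to dictionary attributes."""

    __getattr__ = dict.get          # missing keys return None instead of raising
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
```

With this recipe, d.key and d["key"] read the same entry, and reading an absent attribute yields None.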
-
bucky.util.util.
estimate_IFR
(age)¶
-
bucky.util.util.
map_np_array
(a, d)¶
-
bucky.util.util.
remove_chars
(seq)¶
-
bucky.util.util.
unpack_cache
(cache_file)¶