bucky.util package


bucky.util.read_config module

bucky.util.readable_col_names module

bucky.util.update_data_repos module

bucky.util.update_data_repos.distribute_data_by_population(total_df, dist_vect, data_to_dist, replace)

Distributes data by population across a state or territory.

  • total_df (Pandas DataFrame) – DataFrame containing confirmed and death data indexed by date and FIPS code

  • dist_vect (Pandas DataFrame) – Population data for each county as proportion of total state population, indexed by FIPS code

  • data_to_dist (Pandas DataFrame) – Data to distribute, indexed by date

  • replace (boolean) – If true, distributed values overwrite current historical data in DataFrame. If false, distributed values are added to current data


total_df – Modified input dataframe with distributed data

Return type

Pandas DataFrame
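
The distribution described above can be sketched roughly as follows. This is a minimal illustration rather than the package's implementation: the function name, the "pop_frac" column, and the toy data are all assumptions made for the example.

```python
import pandas as pd


def distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace):
    """Illustrative only: split each dated total across counties by population share."""
    out = total_df.copy()
    cols = list(data_to_dist.columns)
    # "pop_frac" is a hypothetical column name holding each county's population share
    for fips, share in dist_vect["pop_frac"].items():
        for date, row in data_to_dist.iterrows():
            portion = row[cols].to_numpy() * share
            if replace:
                # Overwrite the existing values for this (date, FIPS) row
                out.loc[(date, fips), cols] = portion
            else:
                # Add the county's portion to the existing values
                out.loc[(date, fips), cols] = out.loc[(date, fips), cols].to_numpy() + portion
    return out


dates = ["2020-04-01", "2020-04-02"]
index = pd.MultiIndex.from_product([dates, [1, 2]], names=["date", "FIPS"])
total_df = pd.DataFrame({"Confirmed": 10.0, "Deaths": 1.0}, index=index)
dist_vect = pd.DataFrame({"pop_frac": [0.25, 0.75]}, index=pd.Index([1, 2], name="FIPS"))
data_to_dist = pd.DataFrame(
    {"Confirmed": [4.0, 8.0], "Deaths": [0.0, 4.0]},
    index=pd.Index(dates, name="date"),
)

added = distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace=False)
replaced = distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace=True)
```

With replace=False each county's share is added to what is already in the frame; with replace=True it overwrites it.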

bucky.util.update_data_repos.distribute_mdoc(df, csse_deaths_file)

Distributes Michigan Department of Corrections data across Michigan counties by population.

  • df (Pandas DataFrame) – Current historical DataFrame indexed by FIPS and date, which includes MDOC and FCI data

  • csse_deaths_file (string) – File location of CSSE deaths file (contains population data)


df – Modified historical dataframe with Michigan prison data distributed and added to Michigan data

Return type

Pandas DataFrame


Distributes NYC case data across the five NYC counties.

  • df (Pandas DataFrame) – DataFrame containing historical data indexed by FIPS and date

  • TODO: add a deprecation warning, because CSSE has since fixed this.


df – Modified DataFrame containing corrected NYC historical data indexed by FIPS and date

Return type

Pandas DataFrame

bucky.util.update_data_repos.distribute_territory_data(df, add_american_samoa)

Distributes territory-wide case and death data for territories.

Uses county population to distribute cases for US Virgin Islands, Guam, and CNMI. Optionally adds a single case to the most populous American Samoan county.

  • df (Pandas DataFrame) – Current historical DataFrame indexed by FIPS and date, which includes territory-wide case and death data

  • add_american_samoa (boolean) – If true, adds 1 case to American Samoa


df – Modified historical dataframe with territory-wide data distributed to counties

Return type

Pandas DataFrame

bucky.util.update_data_repos.distribute_unallocated_csse(confirmed_file, deaths_file, hist_df)

Distributes unallocated historical case and deaths data from CSSE.

JHU CSSE data contains state-level unallocated data, indicated with “Unassigned” or “Out of” for each state. This function distributes these unallocated cases based on the proportion of cases in each county relative to the state.

  • confirmed_file (string) – filename of CSSE confirmed data

  • deaths_file (string) – filename of CSSE death data

  • hist_df (Pandas DataFrame) – current historical DataFrame containing confirmed and death data indexed by date and FIPS code


hist_df – modified historical DataFrame with cases and deaths distributed

Return type

Pandas DataFrame
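
As a rough illustration of the proportional allocation described above, the sketch below spreads one state's unallocated count over its counties in proportion to their existing case counts. The function name and the toy numbers are hypothetical, and the handling of a state with zero allocated cases is an assumption, not something the documentation specifies.

```python
import pandas as pd


def distribute_unallocated_sketch(county_cases, unallocated):
    """Illustrative only: allocate a state's unassigned cases by county case share."""
    total = county_cases.sum()
    if total == 0:
        # Assumption: with no allocated cases there is no basis for a split,
        # so the counts are returned unchanged.
        return county_cases.astype(float)
    return county_cases + unallocated * county_cases / total


county_cases = pd.Series({1001: 30, 1003: 10}, name="Confirmed")
result = distribute_unallocated_sketch(county_cases, unallocated=8)
```

The state total is preserved: the counties' new counts sum to the original county total plus the unallocated count.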

bucky.util.update_data_repos.get_county_population_data(csse_deaths_file, county_fips)

Uses the JHU CSSE deaths file to get county-level population data as a fraction of the total population across the requested list of counties.

  • csse_deaths_file (string) – filename of CSSE deaths file

  • county_fips (array-like) – list of FIPS to return population data for


population_df – DataFrame with population fraction data indexed by FIPS

Return type

Pandas DataFrame
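
A minimal sketch of the population-fraction calculation is shown below. It works on an in-memory DataFrame instead of reading the CSSE file, and it assumes the data has "FIPS" and "Population" columns; the function name and the sample values are hypothetical.

```python
import pandas as pd


def population_fraction_sketch(csse_deaths_df, county_fips):
    """Illustrative only: population of each requested county as a share of their total."""
    pop = csse_deaths_df.set_index("FIPS").loc[list(county_fips), ["Population"]]
    return pop / pop["Population"].sum()


csse_deaths_df = pd.DataFrame(
    {"FIPS": [1, 2, 3], "Population": [100, 300, 600], "4/1/20": [0, 1, 2]}
)
population_df = population_fraction_sketch(csse_deaths_df, [1, 2])
```

Only the requested counties enter the denominator, so the returned fractions sum to 1.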

bucky.util.update_data_repos.get_timeseries_data(col_name, filename, fips_key='FIPS', is_csse=True)

Takes a historical data file and reduces it to a DataFrame with FIPS, date, and case or death data.

  • col_name (string) – Column name to extract from data.

  • filename (string) – Location of the file to read.

  • fips_key (string, optional) – Key used in file for indicating county-level field.

  • is_csse (boolean) – Indicates whether the file is CSSE data. If True, certain areas without FIPS are included.


df – Dataframe with the historical data indexed by FIPS, date

Return type

Pandas DataFrame
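
The reshaping described above, from one column per date to a frame indexed by FIPS and date, might look roughly like the sketch below. It assumes a CSSE-style layout with date columns named like "1/22/20"; the function name, the id_cols argument, and the sample CSV are hypothetical.

```python
import io

import pandas as pd


def timeseries_sketch(col_name, file_obj, fips_key="FIPS", id_cols=("Admin2",)):
    """Illustrative only: reshape wide per-date columns into a (FIPS, date) index."""
    wide = pd.read_csv(file_obj)
    # Treat every column that is not an identifier as a date column
    date_cols = [c for c in wide.columns if c != fips_key and c not in id_cols]
    long = wide.melt(
        id_vars=[fips_key], value_vars=date_cols, var_name="date", value_name=col_name
    )
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    return long.set_index([fips_key, "date"]).sort_index()


csv_text = "FIPS,Admin2,1/22/20,1/23/20\n1001,Autauga,0,1\n1003,Baldwin,2,3\n"
df = timeseries_sketch("Confirmed", io.StringIO(csv_text))
```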


Updates a git repository given its path.


abs_path (string) – Absolute path of the repository to update


Performs pre-processing on CSSE data.

CSSE data is split across two files: confirmed cases and deaths. The two files are combined into one DataFrame, indexed by FIPS and date, with two columns, Confirmed and Deaths. This function distributes CSSE data that is unallocated or reported territory-wide rather than by county. Michigan data from the state Department of Corrections and the Federal Correctional Institution is distributed to Michigan counties. New York City data, which is currently all placed in one county (New York County), is distributed to the other NYC counties. Territory data for Guam, CNMI, and the US Virgin Islands is also distributed. The result is written to a CSV.

bucky.util.update_data_repos.process_usafacts(case_file, deaths_file)

Performs preprocessing on USAFacts data.

USAFacts contains unallocated cases and deaths for each state. These are allocated across the counties of each state based on the case distribution in that state.

  • case_file (string) – Location of USAFacts case file

  • deaths_file (string) – Location of USAFacts death file


combined_df – USAFacts data containing cases and deaths indexed by FIPS and date.

Return type

Pandas DataFrame


Downloads and processes data from The Atlantic’s COVID Tracking Project to match the format of other preprocessed data sources.

The COVID Tracking Project contains data at the state level. Each state is given a random FIPS selected from all FIPS in that state. This is done to make aggregation easier for plotting later. Processed data is written to a CSV.
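
The random FIPS assignment described above can be illustrated with the standard library alone. The state-to-FIPS mapping and the fixed seed below are assumptions made for the example; how the package actually builds the mapping or draws the sample is not stated here.

```python
import random

# Hypothetical mapping from state abbreviation to the county FIPS codes in that state
state_fips = {
    "MD": [24001, 24003, 24005],
    "VA": [51001, 51003],
}

# A seeded generator keeps the illustration reproducible
rng = random.Random(0)

# Give each state one FIPS code drawn at random from its own counties
assigned = {state: rng.choice(fips_list) for state, fips_list in state_fips.items()}
```

Because each state's row carries a county FIPS from that state, later aggregation by state picks the row up in the right place.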


Uses git to update public data repos.


Retrieves updated historical data from USAFacts, preprocesses it, and writes it to a CSV.

bucky.util.util module

class bucky.util.util.TqdmLoggingHandler(level=0)

Bases: logging.Handler


Do whatever it takes to actually log the specified logging record.

This version is intended to be implemented by subclasses and so raises a NotImplementedError.
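
For context, a logging.Handler subclass normally overrides emit to format the record and send it somewhere. The sketch below is a generic, standard-library-only illustration: the class name and the writer argument are hypothetical, and it is only an assumption that TqdmLoggingHandler follows the same pattern with a progress-bar-safe writer.

```python
import logging


class WriterLoggingHandler(logging.Handler):
    """Hypothetical handler: formats each record and passes it to a writer callable."""

    def __init__(self, writer=print, level=logging.NOTSET):
        super().__init__(level)
        self.writer = writer

    def emit(self, record):
        try:
            self.writer(self.format(record))
        except Exception:
            # Standard logging convention: report the failure without raising
            self.handleError(record)


captured = []
logger = logging.getLogger("bucky_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(WriterLoggingHandler(writer=captured.append))
logger.info("loaded %d rows", 3)
```

Passing a different writer changes where messages go without touching the logging calls themselves.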

bucky.util.util.bin_age_csv(filename, out_filename)
bucky.util.util.date_to_t_int(dates, start_date)
class bucky.util.util.dotdict

Bases: dict

dot.notation access to dictionary attributes
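
A common recipe for this kind of class is shown below as a sketch. Whether bucky's dotdict is written exactly this way is an assumption; the class name here is changed to make that clear.

```python
class DotDictSketch(dict):
    """Illustrative only: dict whose keys can also be read and written as attributes."""

    __getattr__ = dict.get          # missing keys return None instead of raising
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


params = DotDictSketch({"a": 1})
params.b = 2                        # same as params["b"] = 2
value_a = params.a                  # same as params["a"]
missing = params.not_there          # None, because dict.get is used for lookup
del params.a                        # same as del params["a"]
```

One consequence of using dict.get for attribute lookup is that a mistyped key silently yields None rather than an AttributeError.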

bucky.util.util.map_np_array(a, d)

Module contents