bucky.util.update_data_repos
A utility for fetching updated mobility and case data from public repositories.
This module pulls from public git repositories and preprocesses the data if necessary. For case data, unallocated or unassigned cases are distributed as necessary.
distribute_data_by_population(total_df, dist_vect, data_to_dist, replace)
distribute_data_by_population
Distributes data by population across a state or territory.
distribute_mdoc(df, csse_deaths_file)
distribute_mdoc
Distributes Michigan Department of Corrections data across Michigan counties by population.
distribute_nyc_data(df)
distribute_nyc_data
Distributes NYC case data across the five NYC counties.
distribute_territory_data(df, add_american_samoa)
distribute_territory_data
Distributes territory-wide case and death data for territories.
distribute_unallocated_csse(confirmed_file, deaths_file, hist_df)
distribute_unallocated_csse
Distributes unallocated historical case and deaths data from CSSE.
distribute_utah_data(df, csse_deaths_file)
distribute_utah_data
Distributes Utah case data for local health departments spanning multiple counties.
get_county_population_data(csse_deaths_file, county_fips)
get_county_population_data
Uses the JHU CSSE deaths file to get county population data as a fraction of the total population across a list of counties.
get_timeseries_data(col_name, filename, fips_key='FIPS', is_csse=True)
get_timeseries_data
Transforms a historical data file to a dataframe with FIPS, date, and case or death data.
git_pull(abs_path)
git_pull
Updates a git repository given its path.
main()
main
Uses git to update public data repos.
process_csse_data()
process_csse_data
Performs preprocessing on CSSE data.
process_usafacts(case_file, deaths_file)
process_usafacts
Performs preprocessing on USA Facts data.
update_covid_tracking_data()
update_covid_tracking_data
Downloads and processes data from the COVID Tracking Project to match the format of other preprocessed data.
update_hhs_hosp_data()
update_hhs_hosp_data
Retrieves updated historical data from healthdata.gov and writes to CSV.
update_usafacts_data()
update_usafacts_data
Retrieves updated historical data from USA Facts, preprocesses it, and writes to CSV.
bucky.util.update_data_repos.
ADD_AMERICAN_SAMOA
MI_PRISON_UIDS
TERRITORY_DATA
UT_LHD_UIDS
total_df (pandas.DataFrame) – DataFrame containing confirmed and death data indexed by date and FIPS code
pandas.DataFrame
dist_vect (pandas.DataFrame) – Population data for each county as proportion of total state population, indexed by FIPS code
data_to_dist (pandas.DataFrame) – Data to distribute, indexed by date
replace (bool) – If true, distributed values overwrite current historical data in DataFrame. If false, distributed values are added to current data
bool
total_df – Modified input dataframe with distributed data
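The parameters above suggest a population-weighted split. The following is a minimal sketch of that idea, not the module's actual implementation. The function name distribute_by_population_sketch, the pop_fraction column name, and the (date, FIPS) index layout are assumptions made for illustration.

```python
import pandas as pd


def distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace):
    """Spread each date's total across counties in proportion to population."""
    # Work on a float copy so fractional shares can be stored.
    out = total_df.astype(float)
    for date, row in data_to_dist.iterrows():
        # "pop_fraction" is a hypothetical column name for the population share.
        for fips, frac in dist_vect["pop_fraction"].items():
            for col in data_to_dist.columns:
                share = row[col] * frac
                if replace:
                    # Overwrite the existing value with the distributed share.
                    out.loc[(date, fips), col] = share
                else:
                    # Add the distributed share to the existing value.
                    out.loc[(date, fips), col] += share
    return out


index = pd.MultiIndex.from_product([["2020-04-01"], [1, 2]], names=["date", "FIPS"])
total_df = pd.DataFrame({"Confirmed": [10, 20]}, index=index)
dist_vect = pd.DataFrame({"pop_fraction": [0.25, 0.75]}, index=[1, 2])
data_to_dist = pd.DataFrame({"Confirmed": [8]}, index=["2020-04-01"])

added = distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace=False)
replaced = distribute_by_population_sketch(total_df, dist_vect, data_to_dist, replace=True)
```

With replace=False the 8 cases are added on top of the existing county counts. With replace=True the county counts are overwritten by the distributed shares.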
df (pandas.DataFrame) – Current historical DataFrame indexed by FIPS and date, which includes MDOC and FCI data
csse_deaths_file (str) – File location of CSSE deaths file (contains population data)
str
df – Modified historical dataframe with Michigan prison data distributed and added to Michigan data
TODO: Add a deprecation warning because CSSE has fixed this.
df (pandas.DataFrame) – DataFrame containing historical data indexed by FIPS and date
df – Modified DataFrame containing corrected NYC historical data indexed by FIPS and date
Uses county population to distribute cases for US Virgin Islands, Guam, and CNMI. Optionally adds a single case to the most populous American Samoan county.
df (pandas.DataFrame) – Current historical DataFrame indexed by FIPS and date, which includes territory-wide case and death data
add_american_samoa (bool) – If true, adds 1 case to American Samoa
df – Modified historical dataframe with territory-wide data distributed to counties
JHU CSSE data contains state-level unallocated data, indicated with “Unassigned” or “Out of” for each state. This function distributes these unallocated cases based on the proportion of cases in each county relative to the state.
confirmed_file (str) – filename of CSSE confirmed data
deaths_file (str) – filename of CSSE death data
hist_df (pandas.DataFrame) – current historical DataFrame containing confirmed and death data indexed by date and FIPS code
hist_df – modified historical DataFrame with cases and deaths distributed
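The case-proportional rule described for distribute_unallocated_csse can be illustrated for one state on one date. This is a sketch under stated assumptions rather than the module's code: the helper name distribute_unallocated_sketch is hypothetical, and the behavior when a state has no county-level cases is a guess.

```python
import pandas as pd


def distribute_unallocated_sketch(county_cases, unallocated):
    """Give each county a share of the unallocated count proportional to its own cases."""
    total = county_cases.sum()
    if total == 0:
        # Assumption: with no county-level cases there is no proportion to use,
        # so the unallocated count is left undistributed.
        return county_cases.astype(float)
    return county_cases + unallocated * county_cases / total


# Two counties with 30 and 10 confirmed cases, plus 8 unallocated state-level cases.
county_cases = pd.Series([30, 10], index=[49001, 49003], name="Confirmed")
result = distribute_unallocated_sketch(county_cases, unallocated=8)
```

Here the first county holds three quarters of the state's allocated cases, so it receives 6 of the 8 unallocated cases and the second county receives 2.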
Utah has 13 local health districts, six of which span multiple counties. This function distributes those cases and deaths by population across their constituent counties.
csse_deaths_file (str) – File location of CSSE deaths file
df – Modified DataFrame containing corrected Utah historical data indexed by FIPS and date
csse_deaths_file (str) – filename of CSSE deaths file
county_fips (numpy.ndarray) – list of FIPS to return population data for
numpy.ndarray
population_df – DataFrame with population fraction data indexed by FIPS
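The population fraction returned by get_county_population_data could be computed roughly as follows. This sketch takes an already loaded DataFrame instead of a filename so that it stays self-contained. It assumes the deaths file exposes FIPS and Population columns, and the function name and the pop_fraction column name are hypothetical.

```python
import pandas as pd


def population_fraction_sketch(csse_deaths_df, county_fips):
    """Return each county's share of the combined population of county_fips."""
    # Select the population of the requested counties only.
    pop = csse_deaths_df.set_index("FIPS")["Population"].loc[list(county_fips)]
    # Normalize so the fractions sum to 1 across the requested counties.
    return (pop / pop.sum()).to_frame("pop_fraction")


deaths_df = pd.DataFrame({"FIPS": [1, 2, 3], "Population": [100, 300, 600]})
population_df = population_fraction_sketch(deaths_df, [1, 2])
```

The fractions are relative to the listed counties only, so the third county's population does not affect the result.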
col_name (str) – Column name to extract from data.
filename (str) – Location of the file to read.
fips_key (str, optional) – Key used in file for indicating county-level field.
is_csse (bool, optional) – Indicates whether the file is CSSE data. If True, certain areas without FIPS are included.
df – Dataframe with the historical data indexed by FIPS, date
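A wide-to-long reshape of this kind can be sketched with pandas melt. The example assumes a CSSE-style layout with one column per date in month/day/two-digit-year form and no other metadata columns. The function name timeseries_to_long_sketch is hypothetical, and it reads from a string rather than a file to stay self-contained.

```python
import io

import pandas as pd


def timeseries_to_long_sketch(csv_text, col_name, fips_key="FIPS"):
    """Reshape a wide table (one column per date) into rows indexed by FIPS and date."""
    wide = pd.read_csv(io.StringIO(csv_text))
    # Every column other than the FIPS key is treated as a date column.
    long_df = wide.melt(id_vars=[fips_key], var_name="date", value_name=col_name)
    long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")
    return long_df.set_index([fips_key, "date"]).sort_index()


sample = "FIPS,1/22/20,1/23/20\n1001,0,1\n1003,2,3\n"
df = timeseries_to_long_sketch(sample, "Confirmed")
```

The result has one row per county and date, with a single column named by col_name.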
abs_path (str) – Absolute path of the repository to update
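The documentation does not say how the pull is performed. One plausible approach, shown here only as an assumption, is to call the git command-line client through subprocess. The function name git_pull_sketch and its boolean return value are hypothetical.

```python
import subprocess


def git_pull_sketch(abs_path):
    """Run `git pull` inside abs_path and report whether it succeeded."""
    try:
        # `git -C <path>` runs the command as if started in that directory.
        result = subprocess.run(
            ["git", "-C", abs_path, "pull"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The git executable is not installed or not on PATH.
        return False
    return result.returncode == 0
```

Returning a boolean instead of raising lets a caller keep updating the remaining repositories when one pull fails.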
CSSE data is separated into two different files: confirmed cases and deaths. These two files are combined into one dataframe, indexed by FIPS and date, with two columns, Confirmed and Deaths. This function distributes CSSE data that is either unallocated or territory-wide instead of county-wide. Michigan data from the state Department of Corrections and the Federal Correctional Institution is distributed to Michigan counties. New York City data, which is currently all placed in one county (New York County), is distributed to the other NYC counties. Territory data for Guam, CNMI, and the US Virgin Islands is also distributed. The resulting data is written to a CSV.
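The step that combines the confirmed and deaths data into one dataframe might look like the sketch below. It assumes both inputs are already Series indexed by FIPS and date. The function name combine_csse_sketch is hypothetical, and filling missing entries with 0 is an assumption.

```python
import pandas as pd


def combine_csse_sketch(confirmed, deaths):
    """Join two Series indexed by (FIPS, date) into one two-column dataframe."""
    # concat with axis=1 aligns on the index and keeps rows present in either input.
    combined = pd.concat(
        [confirmed.rename("Confirmed"), deaths.rename("Deaths")], axis=1
    )
    # Assumption: a missing entry means no reported count, so it is filled with 0.
    return combined.fillna(0)


names = ["FIPS", "date"]
confirmed = pd.Series(
    [5, 7],
    index=pd.MultiIndex.from_tuples(
        [(1001, "2020-04-01"), (1003, "2020-04-01")], names=names
    ),
)
deaths = pd.Series(
    [1],
    index=pd.MultiIndex.from_tuples([(1001, "2020-04-01")], names=names),
)
combined_df = combine_csse_sketch(confirmed, deaths)
```

In this example the second county has no deaths entry, so its Deaths value is filled with 0 after the join.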
USAFacts contains unallocated cases and deaths for each state. These are allocated across the counties in each state based on the case distribution in that state.
case_file (str) – Location of USAFacts case file
deaths_file (str) – Location of USAFacts death file
combined_df – USAFacts data containing cases and deaths indexed by FIPS and date.
The COVID Tracking Project contains data at the state level. Each state is given a random FIPS selected from all FIPS in that state. This is done to make aggregation easier for plotting later. Processed data is written to a CSV.
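The random FIPS assignment can be sketched with the standard library. The function name assign_state_fips_sketch, the mapping format, and the optional seed are assumptions for illustration. The module's own selection method is not documented here.

```python
import random


def assign_state_fips_sketch(state_to_fips, seed=None):
    """Pick one county FIPS at random for each state."""
    rng = random.Random(seed)
    # Sorting first makes the choice reproducible for a given seed.
    return {
        state: rng.choice(sorted(fips_codes))
        for state, fips_codes in state_to_fips.items()
    }


state_to_fips = {"UT": [49001, 49003, 49005], "VT": [50001]}
assigned = assign_state_fips_sketch(state_to_fips, seed=0)
```

Because each state's data lands on a FIPS that belongs to that state, summing county rows by state still reproduces the state totals.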