bucky.util.data_sync

Utilities to pull/update PAI data sources.

Module Contents

Functions

_exec_shell_cmd(cmd, cwd=None)

Execute a shell command, optionally from a specified directory.

_git_clone(url, local_name, abs_path, bare=False, depth=1, tag=None)

Clone a git repository to a given path.

_git_pull(abs_path, rebase=True)

Pull a git repo at a given path.

_hash_file_obj(obj)

Hash a file-like object.

_locate_included_data()

Locate the base_config package that shipped with bucky (it's likely in site-packages).

_process_one_datasource(source_cfg, raw_data_dir)

Perform all the processing needed for a single data source.

_unzip_file_obj_to_dir(f, output_dir=None)

Unzip a file to a directory.

_write_filelike(src, dest, buffer_size=16384)

Write a file-like object to disk.

process_datasources(data_sources, data_dir, ssl_no_verify=False, n_jobs=None)

Process all the data sources found in the config using multiprocessing.

exception bucky.util.data_sync.BuckySyncException

Bases: bucky.exceptions.BuckyException

Exception for sync operations.

bucky.util.data_sync._exec_shell_cmd(cmd, cwd=None)

Execute a shell command, optionally from a specified directory.
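As a rough illustration of running a command from a chosen directory, the sketch below wraps `subprocess.run`. The name `exec_shell_cmd`, the choice to pass the command as an argument list (no shell involved), and the decision to return captured stdout are assumptions made for this example; the page does not show how bucky's helper is implemented.

```python
import subprocess
import sys


def exec_shell_cmd(cmd, cwd=None):
    """Run `cmd` (a list of arguments) and return its stdout as text.

    Hypothetical stand-in for illustration; not bucky's implementation.
    """
    result = subprocess.run(
        cmd,
        cwd=cwd,              # None means "use the current working directory"
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Use the running interpreter so the example needs no external tool.
    print(exec_shell_cmd([sys.executable, "-c", "print('hello')"]))
```

Passing a list rather than a single string avoids shell quoting problems, at the cost of not supporting pipes or globbing.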

bucky.util.data_sync._git_clone(url, local_name, abs_path, bare=False, depth=1, tag=None)

Clone a git repository to a given path.

Parameters:

abs_path (str) – Absolute path of the repository location

bucky.util.data_sync._git_pull(abs_path, rebase=True)

Pull a git repo at a given path.
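The two git helpers above can be pictured as thin wrappers that assemble a `git` command line from their keyword arguments. The sketch below only builds the argument lists and does not run git. The function names and the mapping of `bare`, `depth`, `tag`, and `rebase` onto `--bare`, `--depth`, `--branch`, and `--rebase` are assumptions for illustration; the page does not document how bucky translates these parameters.

```python
def git_clone_cmd(url, local_name, bare=False, depth=1, tag=None):
    """Build the argument list for `git clone` (hypothetical helper)."""
    cmd = ["git", "clone"]
    if bare:
        cmd.append("--bare")
    if depth is not None:
        cmd += ["--depth", str(depth)]  # shallow clone with truncated history
    if tag is not None:
        cmd += ["--branch", tag]        # --branch accepts tag names as well
    cmd += [url, local_name]
    return cmd


def git_pull_cmd(rebase=True):
    """Build the argument list for `git pull` (hypothetical helper)."""
    cmd = ["git", "pull"]
    if rebase:
        cmd.append("--rebase")
    return cmd


if __name__ == "__main__":
    print(git_clone_cmd("https://example.com/repo.git", "repo", tag="v1.0"))
    print(git_pull_cmd())
```

The resulting lists could be handed to `subprocess.run` with `cwd` set to the directory in which the command should execute.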

bucky.util.data_sync._hash_file_obj(obj)

Hash a file-like object.
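Hashing a file-like object is usually done in chunks so that large downloads never have to fit in memory. A minimal sketch follows, assuming SHA-256 and a binary stream; the page does not say which digest bucky uses, and the name `hash_file_obj` and the `chunk_size` parameter are illustrative.

```python
import hashlib
import io


def hash_file_obj(obj, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary file-like object.

    The algorithm and chunk size are assumptions made for this sketch.
    """
    digest = hashlib.sha256()
    # iter() with a sentinel keeps calling obj.read until it returns b"" (end of stream).
    for chunk in iter(lambda: obj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(hash_file_obj(io.BytesIO(b"example bytes")))
```

Comparing the digest of a fresh download against a stored one is a common way to decide whether a data source has changed.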

bucky.util.data_sync._locate_included_data()

Locate the base_config package that shipped with bucky (it’s likely in site-packages).

bucky.util.data_sync._process_one_datasource(source_cfg, raw_data_dir)

Perform all the processing needed for a single data source.

bucky.util.data_sync._unzip_file_obj_to_dir(f, output_dir=None)

Unzip a file to a directory.
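To make the idea concrete, here is a small sketch of extracting a zip archive that is held in a file-like object rather than on disk. The name `unzip_file_obj_to_dir`, the fallback to a temporary directory when `output_dir` is `None`, and the returned `Path` are assumptions for this example, not documented behavior of the bucky helper.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def unzip_file_obj_to_dir(f, output_dir=None):
    """Extract a zip archive held in a binary, seekable file-like object.

    If output_dir is None, a fresh temporary directory is created
    (an assumption for this sketch). Returns the directory as a Path.
    """
    if output_dir is None:
        output_dir = tempfile.mkdtemp()
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(f) as archive:
        archive.extractall(output_dir)
    return output_dir


if __name__ == "__main__":
    # Build a tiny archive in memory, then extract it.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello")
    buffer.seek(0)
    target = unzip_file_obj_to_dir(buffer)
    print((target / "hello.txt").read_text())
```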

bucky.util.data_sync._write_filelike(src, dest, buffer_size=16384)

Write a file-like object to disk.
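The `buffer_size` parameter suggests a chunked copy loop. The sketch below shows one way such a loop might look, assuming `src` is a binary stream and `dest` is a filesystem path; both are guesses based on the signature, and `write_filelike` here is an illustrative stand-in.

```python
import io
import os
import tempfile


def write_filelike(src, dest, buffer_size=16384):
    """Copy a binary file-like object to the path `dest` in fixed-size chunks."""
    with open(dest, "wb") as out:
        while True:
            chunk = src.read(buffer_size)
            if not chunk:  # empty bytes means the source is exhausted
                break
            out.write(chunk)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "payload.bin")
    write_filelike(io.BytesIO(b"some downloaded bytes"), path)
    print(os.path.getsize(path))
```

The standard library's `shutil.copyfileobj` performs the same kind of loop when both ends are already open file objects.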

bucky.util.data_sync.process_datasources(data_sources, data_dir, ssl_no_verify=False, n_jobs=None)

Process all the data sources found in the config using multiprocessing.
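The fan-out pattern described above, one worker call per data source, can be sketched with a pool's `map` method. To keep the example runnable in any environment it uses `multiprocessing.pool.ThreadPool`, which exposes the same interface as `multiprocessing.Pool` but runs workers in threads; this is a deliberate substitution, not the mechanism bucky uses. The worker `process_one` and the `"name"` key in each config are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def process_one(source_cfg):
    """Placeholder worker: real code would fetch and unpack the data source."""
    return source_cfg["name"].upper()


def process_all(data_sources, n_jobs=None):
    """Apply process_one to every source config concurrently.

    With n_jobs=None the pool size is derived from the machine's CPU count.
    """
    with ThreadPool(n_jobs) as pool:
        # map() returns results in the same order as the inputs.
        return pool.map(process_one, data_sources)


if __name__ == "__main__":
    print(process_all([{"name": "alpha"}, {"name": "beta"}], n_jobs=2))
```

Swapping `ThreadPool` for `multiprocessing.Pool` gives separate processes with the same calls, provided the worker is a picklable top-level function.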