Resource downloaders¶

podium.storage.resources.downloader module¶

The downloader module offers classes for downloading files from a given URI.

It consists of the base class BaseDownloader, which every downloader implements. Downloaders that use the HTTP protocol form a special group: their base class is HttpDownloader, and its simple implementation is SimpleHttpDownloader.

class podium.storage.resources.downloader.BaseDownloader[source]¶

Bases: abc.ABC

Base interface for downloader classes.

abstract classmethod download(uri, path, overwrite=False, **kwargs)[source]¶

Downloads a file from the given URI to the given path. If overwrite is true and the given path already exists, the existing file is overwritten with the new file.

Parameters
  • uri (str) – URI of file that needs to be downloaded

  • path (str) – destination path where to save downloaded file

  • overwrite (bool) – if true and the given path exists, the downloaded file overwrites the existing file

Returns

rewrite_status – True if the download was successful, or False if the file already exists and the given overwrite value was False.

Return type

bool

Raises
  • ValueError – if the given uri or path is None

  • RuntimeError – if there was an error while obtaining resource from uri
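
The contract above (a class-level download method, a boolean return value and two error cases) can be illustrated with a small stand-in. This is a minimal sketch, not podium's implementation: DownloaderSketch and LocalCopyDownloader are hypothetical names, and the toy subclass simply treats the URI as a path on the local file system.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class DownloaderSketch(ABC):
    """Hypothetical stand-in that mirrors the documented download() contract."""

    @classmethod
    @abstractmethod
    def download(cls, uri, path, overwrite=False, **kwargs):
        """Return True on success, False if path exists and overwrite is False."""


class LocalCopyDownloader(DownloaderSketch):
    """Toy implementation: treats the URI as a path on the local file system."""

    @classmethod
    def download(cls, uri, path, overwrite=False, **kwargs):
        if uri is None or path is None:
            raise ValueError("uri and path must not be None")
        if Path(path).exists() and not overwrite:
            return False  # the existing file is left untouched
        try:
            shutil.copyfile(uri, path)
        except OSError as err:
            raise RuntimeError(f"could not obtain resource from {uri}") from err
        return True
```

Because download is a classmethod, callers use it without creating an instance, for example LocalCopyDownloader.download(src, dst, overwrite=True).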

class podium.storage.resources.downloader.HttpDownloader[source]¶

Bases: podium.storage.resources.downloader.BaseDownloader

Interface for downloaders that use the HTTP protocol for data transfer.

class podium.storage.resources.downloader.SCPDownloader[source]¶

Bases: podium.storage.resources.downloader.BaseDownloader

Class for downloading a file from a server using SFTP on top of the SSH protocol.

USER_NAME_KEY¶

key for defining keyword argument for username

Type

str

PASSWORD_KEY¶

key for defining the keyword argument for the password; if the private key file uses a passphrase, the user should define it here

Type

str, optional

HOST_ADDR_KEY¶

key for defining keyword argument for remote host address

Type

str

PRIVATE_KEY_FILE_KEY¶

key for defining the keyword argument for the private key location; if the user uses the default Linux private key location, this argument can be set to None

Type

str, optional

classmethod download(uri, path, overwrite=False, **kwargs)[source]¶

Downloads a file from the remote machine and saves it to the local path. If overwrite is true and the given path already exists, the existing file is overwritten with the new file.

Parameters
  • uri (str) – URI of the file on remote machine

  • path (str) – path of the file on local machine

  • overwrite (bool) – if true and the given path exists, the downloaded file overwrites the existing file

  • kwargs (dict(str, str)) – keyword arguments, described in the class attributes, that are used for connecting to the remote machine

Returns

rewrite_status – True if the download was successful, or False if the file already exists and the given overwrite value was False.

Return type

bool

Raises
  • ValueError – If the given uri or path is None, or if the host is not defined.

  • RuntimeError – If there was an error while obtaining resource from uri.
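
The connection settings are passed as keyword arguments whose names come from the class attributes listed above. The sketch below shows one way a caller might assemble them. It is illustrative only: SCPKeysSketch and build_scp_kwargs are hypothetical, and the string values of the keys are assumptions, since this page does not state the real ones.

```python
class SCPKeysSketch:
    """Stand-in for the key attributes; the string values here are assumptions."""

    USER_NAME_KEY = "user"
    PASSWORD_KEY = "password"
    HOST_ADDR_KEY = "host"
    PRIVATE_KEY_FILE_KEY = "private_key_file"


def build_scp_kwargs(host, user, password=None, private_key_file=None):
    """Hypothetical helper: collect connection settings into download() kwargs."""
    if host is None:
        # Mirrors the documented ValueError when the host is not defined.
        raise ValueError("remote host address must be defined")
    kwargs = {
        SCPKeysSketch.HOST_ADDR_KEY: host,
        SCPKeysSketch.USER_NAME_KEY: user,
    }
    # Optional entries are only added when the caller supplies them.
    if password is not None:
        kwargs[SCPKeysSketch.PASSWORD_KEY] = password
    if private_key_file is not None:
        kwargs[SCPKeysSketch.PRIVATE_KEY_FILE_KEY] = private_key_file
    return kwargs
```

With the real class, the dictionary would be keyed by SCPDownloader.HOST_ADDR_KEY and the other attributes, and then passed as SCPDownloader.download(uri, path, **kwargs).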

class podium.storage.resources.downloader.SimpleHttpDownloader[source]¶

Bases: podium.storage.resources.downloader.HttpDownloader

Downloader that uses HTTP protocol for downloading.

It doesn’t offer content confirmation (as needed, for example, by Google Drive) or any kind of authentication.

classmethod download(uri, path, overwrite=False, **kwargs)[source]¶

Downloads a file from the given URI to the given path. If overwrite is true and the given path already exists, the existing file is overwritten with the new file.

Parameters
  • uri (str) – URI of file that needs to be downloaded

  • path (str) – destination path where to save downloaded file

  • overwrite (bool) – if true and the given path exists, the downloaded file overwrites the existing file

Returns

rewrite_status – True if the download was successful, or False if the file already exists and the given overwrite value was False.

Return type

bool

Raises
  • ValueError – if the given uri or path is None

  • RuntimeError – if there was an error while obtaining resource from uri
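
A download with this behaviour, and without confirmation or authentication, can be approximated with the standard library alone. The function below is a sketch under that assumption. simple_download is a hypothetical name, and the code does not claim to match how SimpleHttpDownloader is written internally.

```python
import os
import shutil
import urllib.error
import urllib.request


def simple_download(uri, path, overwrite=False):
    """Sketch of the documented contract using only the standard library."""
    if uri is None or path is None:
        raise ValueError("uri and path must not be None")
    if os.path.exists(path) and not overwrite:
        return False  # keep the existing file
    try:
        # Stream the response body to disk instead of loading it into memory.
        with urllib.request.urlopen(uri) as response, open(path, "wb") as out:
            shutil.copyfileobj(response, out)
    except (urllib.error.URLError, OSError) as err:
        raise RuntimeError(f"error while obtaining resource from {uri}") from err
    return True
```

The existence check runs before any network access, so a repeated call with overwrite=False returns False without contacting the server.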

podium.storage.resources.large_resource module¶

Module contains classes for defining large resources.

Classes that depend on large resources that need to be downloaded should use this module.

class podium.storage.resources.large_resource.LargeResource(**kwargs)[source]¶

Bases: object

Large resource whose files need to be downloaded from a URL. The class also supports archive decompression.

BASE_RESOURCE_DIR¶

base large files directory path

Type

str

RESOURCE_NAME¶

key for defining resource directory name parameter

Type

str

URL¶

key for defining resource url parameter

Type

str

ARCHIVE¶

key for defining archiving method parameter

Type

str

SUPPORTED_ARCHIVE¶

list of supported archive file types

Type

list(str)

Creates a large resource file. If the file is not in resource_location, it is downloaded from the URL and, if needed, decompressed. The resource location is defined as BASE_RESOURCE_DIR+RESOURCE_NAME.

Parameters

kwargs (dict(str, str)) – keyword arguments that define RESOURCE_NAME, URL and, optionally, the archiving method ARCHIVE
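
The lookup-then-download flow described above can be sketched as follows. This is a simplified illustration, not the class's actual code: ensure_resource and its fetch callback are hypothetical, only the "zip" archive type is handled, and the handling of the temporary archive file is an assumption.

```python
import os
import zipfile


def ensure_resource(base_dir, resource_name, fetch, archive=None):
    """Hypothetical helper mirroring the documented lookup-then-download flow.

    `fetch(destination)` is an assumed callback that writes the downloaded file.
    """
    resource_location = os.path.join(base_dir, resource_name)
    if os.path.exists(resource_location):
        return resource_location  # already present, nothing to download
    os.makedirs(base_dir, exist_ok=True)
    if archive is None:
        fetch(resource_location)
    elif archive == "zip":
        download_path = resource_location + ".zip"
        fetch(download_path)
        with zipfile.ZipFile(download_path) as zf:
            zf.extractall(resource_location)
        os.remove(download_path)  # keep only the decompressed content
    else:
        raise ValueError(f"unsupported archive type: {archive}")
    return resource_location
```

A second call with the same base directory and resource name returns immediately, which is the point of keeping large files in a fixed location.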

class podium.storage.resources.large_resource.SCPLargeResource(**kwargs)[source]¶

Bases: podium.storage.resources.large_resource.LargeResource

Large resource whose files need to be downloaded from a URI using the SCP protocol. For all other functionality the class relies on LargeResource.

SCP_HOST_KEY¶

key for keyword argument that defines remote host address

Type

str

SCP_USER_KEY¶

key for keyword argument that defines remote host username

Type

str

SCP_PASS_KEY¶

key for keyword argument that defines remote host password or passphrase used in private key

Type

str, optional

SCP_PRIVATE_KEY¶

key for keyword argument that defines the location of the private key; on Linux it is optional if the key is in the default location

Type

str, optional

Creates a large resource file. If the file is not in resource_location, it is downloaded from the URL and, if needed, decompressed. The resource location is defined as BASE_RESOURCE_DIR+RESOURCE_NAME.

Parameters

kwargs (dict(str, str)) – keyword arguments that define RESOURCE_NAME, URL and, optionally, the archiving method ARCHIVE

podium.storage.resources.large_resource.init_scp_large_resource_from_kwargs(resource, uri, archive, scp_host, user_dict)[source]¶

Initializes an SCP resource from resource information and user credentials.

Parameters
  • resource (str) – resource name, same as LargeResource.RESOURCE_NAME

  • uri (str) – resource uri, same as LargeResource.URL

  • archive (str) – archive type, see LargeResource.ARCHIVE

  • scp_host (str) – remote host address, see SCPLargeResource.SCP_HOST_KEY

  • user_dict (dict(str, str)) – user dictionary that may contain scp_user (the username), scp_private_key (the path to the private key) and scp_pass_key (the user password)

podium.storage.resources.util module¶

Module contains storage utility methods.

podium.storage.resources.util.copyfileobj_with_tqdm(finput, foutput, total_size, buffer_size=16384)[source]¶

Copies the file-like input finput to the file-like output foutput. total_size is used to display the progress bar, and buffer_size determines the size of the buffer used for copying. The implementation is based on shutil.copyfileobj.

Parameters
  • finput (file-like object) – input object from which to copy the data

  • foutput (file-like object) – output object to which the data is copied

  • total_size (int) – total input file size used for computing progress and displaying progress bar

  • buffer_size (int) – constant used for determining maximal buffer size
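
The chunked copy loop that shutil.copyfileobj uses, extended with progress reporting, looks roughly like this. The sketch avoids the tqdm dependency: copy_with_progress is a hypothetical name, and the report callback is an assumption standing in for the progress bar.

```python
def copy_with_progress(finput, foutput, total_size, buffer_size=16384, report=print):
    """Chunked copy in the style of shutil.copyfileobj, reporting progress.

    `report` is a hypothetical callback standing in for a tqdm progress bar;
    it receives the number of bytes copied so far and the expected total.
    """
    copied = 0
    while True:
        chunk = finput.read(buffer_size)
        if not chunk:
            break  # end of input reached
        foutput.write(chunk)
        copied += len(chunk)
        report(copied, total_size)
    return copied
```

At most buffer_size bytes are held in memory at a time, so the same loop works for inputs far larger than available RAM. With tqdm, the bar would typically be advanced by len(chunk) on each iteration instead.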

podium.storage.resources.util.extract_tar_file(archive_file, destination_dir, encoding='uft-8')[source]¶

Extracts a tar archive to the destination, including archives created with gzip, bz2 or lzma compression.

Parameters
  • archive_file (str) – path to the archive file that needs to be extracted

  • destination_dir (str) – path where file needs to be decompressed

Raises

ValueError – If given archive file doesn’t exist.
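
Python's tarfile module can detect the compression type on its own, which is one way to support gzip, bz2 and lzma archives with a single code path. The following is a minimal sketch under that assumption. extract_tar is a hypothetical name, the encoding parameter from the signature above is left out, and the code is not taken from the library.

```python
import os
import tarfile


def extract_tar(archive_file, destination_dir):
    """Sketch: extract a plain or compressed tar archive with the stdlib."""
    if not os.path.exists(archive_file):
        raise ValueError(f"archive file does not exist: {archive_file}")
    os.makedirs(destination_dir, exist_ok=True)
    # Mode "r:*" lets tarfile detect gzip, bz2 or lzma compression itself.
    with tarfile.open(archive_file, mode="r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a filter that rejects unsafe members.
            tar.extractall(path=destination_dir, filter="data")
        else:
            tar.extractall(path=destination_dir)
```

The hasattr check keeps the sketch usable on older interpreters that do not support extraction filters.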

podium.storage.resources.util.extract_zip_file(archive_file, destination_dir)[source]¶

Extracts a zip archive to the destination.

Parameters
  • archive_file (str) – path to the archive file that needs to be extracted

  • destination_dir (str) – path where file needs to be decompressed

Raises

ValueError – If given archive file doesn’t exist.

Module contents¶