4.1.1.3.1.1. datanator.data_source.array_express_tools package

4.1.1.3.1.1.1. Submodules

4.1.1.3.1.1.2. datanator.data_source.array_express_tools.ensembl_tools module

class datanator.data_source.array_express_tools.ensembl_tools.StrainInfo(organism_strain, download_url, full_strain_specificity, domain)[source]

Bases: object

Represents information about an Ensembl reference genome

organism_strain[source]

the Ensembl strain in the reference genome

Type

str

download_url[source]

the URL for that strain’s reference genome

Type

str

full_strain_specificity[source]

whether or not the strain matches the full specificity provided in the ArrayExpress sample

Type

bool
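As a rough illustration of how the attributes above fit together, the sketch below builds a stand-in object with the same four constructor arguments as the documented signature. The class name `StrainInfoSketch`, the strain label, the placeholder URL, and the `domain` value are all assumptions made for this example; it is not the library's implementation.

```python
class StrainInfoSketch:
    """Illustrative stand-in that mirrors the documented StrainInfo signature."""

    def __init__(self, organism_strain, download_url, full_strain_specificity, domain):
        self.organism_strain = organism_strain                  # str
        self.download_url = download_url                        # str
        self.full_strain_specificity = full_strain_specificity  # bool
        self.domain = domain                                    # undocumented; assumed to be a str


# All values below are hypothetical placeholders, not real Ensembl data.
info = StrainInfoSketch(
    organism_strain="escherichia_coli_k12",
    download_url="ftp://example.org/placeholder/genome.fa.gz",
    full_strain_specificity=True,
    domain="bacteria",
)

if not info.full_strain_specificity:
    # A caller might want to flag samples mapped to a less specific genome.
    print("reference genome is less specific than the sample annotation")
```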

datanator.data_source.array_express_tools.ensembl_tools.find_nth(haystack, needle, n)[source]
datanator.data_source.array_express_tools.ensembl_tools.format_org_name(name)[source]

Format the name of an organism to normalize all species names

Args:

name (str): the name of a species (e.g. escherichia coli str. k12)

Returns:

str: the normalized version of the strain name (e.g. escherichia coli k12)
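The documented example (escherichia coli str. k12 becoming escherichia coli k12) suggests that rank markers are dropped from the name. A minimal sketch of that idea follows; the function name `normalize_org_name_sketch` and the set of markers it removes are assumptions, and the real `format_org_name` may apply different or additional rules.

```python
def normalize_org_name_sketch(name):
    """Lowercase an organism name and drop rank-marker tokens.

    The marker set is an assumption for illustration only.
    """
    markers = {"str.", "substr."}
    tokens = name.lower().split()
    return " ".join(token for token in tokens if token not in markers)


print(normalize_org_name_sketch("escherichia coli str. k12"))
# prints: escherichia coli k12
```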

datanator.data_source.array_express_tools.ensembl_tools.get_ftp_url(url)[source]
datanator.data_source.array_express_tools.ensembl_tools.get_json_ends(tree)[source]
datanator.data_source.array_express_tools.ensembl_tools.get_ref_seq_url(org_symbol)[source]
datanator.data_source.array_express_tools.ensembl_tools.get_strain_info(sample)[source]

Get information about the reference genome that should be used for a given sample

Args:

sample (array_express.Sample): an RNA-Seq sample

Returns:

StrainInfo: Ensembl information about the reference genome

datanator.data_source.array_express_tools.ensembl_tools.get_taxonomic_lineage(base_species)[source]

Get the lineage of a species

Parameters

base_species (str) – a species (e.g. escherichia coli)

Returns

list of str: a list of strings corresponding to the layers of its taxonomy
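To make the return shape concrete, the sketch below walks a small, hand-written child-to-parent table and returns one string per taxonomic layer. The table `TOY_PARENT`, the function name `lineage_sketch`, the root-first ordering, and the inclusion of the species itself are all assumptions; the documentation does not say how `get_taxonomic_lineage` looks up the taxonomy or how it orders the result.

```python
# Abbreviated toy taxonomy (child -> parent), for illustration only.
TOY_PARENT = {
    "escherichia coli": "escherichia",
    "escherichia": "enterobacteriaceae",
    "enterobacteriaceae": "enterobacterales",
    "enterobacterales": "gammaproteobacteria",
    "gammaproteobacteria": "proteobacteria",
    "proteobacteria": "bacteria",
}


def lineage_sketch(base_species, parent=TOY_PARENT):
    """Return the layers from the root down to base_species (assumed ordering)."""
    node = base_species.lower()
    lineage = [node]
    # Climb towards the root until no parent is recorded for the current node.
    while node in parent:
        node = parent[node]
        lineage.append(node)
    lineage.reverse()  # root first, species last
    return lineage


print(lineage_sketch("escherichia coli"))
```

A species missing from the table yields a one-element list containing only its own name, which keeps the return type a list of str in every case.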

4.1.1.3.1.1.3. Module contents