Count Matrices
Prebuilt Count Data
Load yeast test files (sourced from GEO records GSE135430 and GSE125162).
- bio_test_artifacts.prebuilt.counts_yeast_tpm(gzip=False)
Make a count TSV file from GEO record GSE135430. The file has 12 non-header rows and 6685 gene columns. All values are floats in Transcripts Per Million (TPM), so each row should sum to approximately 1e6.
- Parameters
gzip (bool) – Provide a gzipped (.gz) file instead of a .tsv file. Defaults to False.
- Returns
An absolute path to the count TSV file and a pandas DataFrame (12 x 6685)
- Return type
str, pd.DataFrame
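Because TPM rows are expected to sum to roughly 1e6, a test can use that as a sanity check. The following is a minimal sketch on plain Python lists; the helper name `rows_sum_to_million`, the tolerance, and the toy values are assumptions made for illustration and are not part of bio_test_artifacts.

```python
import math


def rows_sum_to_million(rows, rel_tol=1e-3):
    """Return True if every row of numeric values sums to about 1e6.

    rel_tol is an assumed tolerance; pick one that suits your data.
    """
    return all(math.isclose(sum(row), 1e6, rel_tol=rel_tol) for row in rows)


# Hypothetical toy data: two samples with three genes each.
tpm_like = [
    [250000.0, 250000.0, 500000.0],
    [100000.0, 300000.0, 600000.0],
]

# Keeping only some gene columns breaks the per-row total.
subset = [row[:2] for row in tpm_like]

print(rows_sum_to_million(tpm_like))  # True
print(rows_sum_to_million(subset))    # False
```

With the returned pandas DataFrame, the same idea can be expressed against `df.sum(axis=1)`.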
- bio_test_artifacts.prebuilt.counts_yeast_tpm_chr01(gzip=False)
Make a count TSV file from GEO record GSE135430. The file has 12 non-header rows and 121 gene columns. All values are floats in Transcripts Per Million (TPM), but because the file covers only a subset of the genome, rows do not sum to 1e6.
- Parameters
gzip (bool) – Provide a gzipped (.gz) file instead of a .tsv file. Defaults to False.
- Returns
An absolute path to the count TSV file and a pandas DataFrame (12 x 121)
- Return type
str, pd.DataFrame
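Since the `gzip` flag changes whether the returned path points at a plain or a gzipped file, code that reads the file directly may want to handle both. The sketch below uses only the standard library. It assumes the file has a header row of gene names and a first column of row labels, and that gzipped files end in `.gz`; the helper `read_count_tsv` and the demo file contents are hypothetical.

```python
import csv
import gzip
import os
import tempfile


def read_count_tsv(path):
    """Read a plain or gzipped count TSV into (gene_names, {row_label: values}).

    Assumes a header row of gene names and row labels in the first column.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        genes = header[1:]
        rows = {rec[0]: [float(v) for v in rec[1:]] for rec in reader}
    return genes, rows


# Demo on a tiny hypothetical file written both plain and gzipped.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "demo.tsv")
    zipped = plain + ".gz"
    text = "\tYAL001C\tYAL002W\nsample_1\t1.5\t2.5\nsample_2\t0.0\t4.0\n"
    with open(plain, "w") as fh:
        fh.write(text)
    with gzip.open(zipped, "wt") as fh:
        fh.write(text)
    demo_plain = read_count_tsv(plain)
    demo_zipped = read_count_tsv(zipped)

print(demo_plain == demo_zipped)
```

When pandas is available, `pandas.read_csv(path, sep="\t", index_col=0)` infers gzip compression from a `.gz` extension, so a separate branch is usually unnecessary there (`index_col=0` again assumes row labels in the first column).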
- bio_test_artifacts.prebuilt.counts_yeast_single_cell_chr01(gzip=False, filetype='tsv')
Make a count file (TSV by default) from GEO record GSE125162. The data has 38225 non-header rows and 121 gene columns. All values are integer unique molecular identifier (UMI) counts.
- Parameters
gzip (bool) – Provide a gzipped (.gz) file instead of a .tsv file. Defaults to False.
filetype (str) – Which type of count file to provide. Options are TSV, MTX, HDF5, and H5AD. Defaults to 'tsv'.
- Returns
An absolute path to the count file and a pandas DataFrame (38225 x 121). For the MTX option, a tuple of .mtx, .genes.tsv, and .features.tsv paths is returned instead of a single path.
- Return type
str (or a tuple of str for MTX), pd.DataFrame
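Because the first returned element is a single path for most file types but a tuple of paths for MTX, calling code may find it convenient to normalize it. A minimal sketch follows; the helper `as_path_tuple` and the example file names are hypothetical and not part of the package.

```python
def as_path_tuple(paths):
    """Return file path(s) as a tuple, whether one path or several were given."""
    if isinstance(paths, (str, bytes)):
        return (paths,)
    return tuple(paths)


# Hypothetical return values, for illustration only.
single = "/tmp/counts.tsv"
triple = ("/tmp/counts.mtx", "/tmp/counts.genes.tsv", "/tmp/counts.features.tsv")

for value in (single, triple):
    for path in as_path_tuple(value):
        print(path)
```

Downstream cleanup or existence checks can then iterate over the tuple without branching on the file type.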
Synthesized Count Data
Synthesize a count matrix TSV file
- bio_test_artifacts.generate.counts_tsv_generate(num_obs=25, num_genes=5000, random_seed=42, integer=True, target_file=None, n=100, p=0.1, transpose=False)
Generate a random count or TPM matrix and write it to a TSV file.
- Parameters
num_obs (int, optional) – Number of observations / samples. Defaults to 25
num_genes (int, optional) – Number of genes. Defaults to 5000
random_seed (int, optional) – Seeding for RNG. Defaults to 42
target_file (str, optional) – Path of the file to write. If not set, a temp file is created in $TMP.
integer (bool, optional) – Generate an integer count matrix if True. Converts to float TPM if False. Defaults to True
n (int, optional) – Negative binomial N parameter. Defaults to 100
p (float, optional) – Negative binomial p parameter. Defaults to 0.1
transpose (bool, optional) – Transposes to Genes by Samples if True. Defaults to False
- Returns
An absolute path to a TSV file and a pandas DataFrame containing the counts
- Return type
str, pd.DataFrame
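To make the roles of `random_seed`, `n`, `p`, `integer`, and `transpose` concrete, here is a pure-Python sketch of seeded negative binomial sampling. It is not the package's implementation. It assumes the convention used by NumPy's negative binomial sampler (the number of failures before the `n`-th success, with success probability `p`), and it uses simple row scaling to 1e6 as a stand-in for the TPM conversion. The function names are hypothetical, and the default sizes are kept small so that pure Python stays fast.

```python
import math
import random


def draw_negative_binomial(rng, n, p):
    """Failures before the n-th success; requires 0 < p < 1 (assumed convention)."""
    log_q = math.log(1.0 - p)
    failures = 0
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        failures += int(math.log(u) / log_q)  # one geometric draw
    return failures


def sketch_counts(num_obs=5, num_genes=20, random_seed=42, integer=True,
                  n=100, p=0.1, transpose=False):
    """Build a samples-by-genes matrix as a list of lists."""
    rng = random.Random(random_seed)
    matrix = [
        [draw_negative_binomial(rng, n, p) for _ in range(num_genes)]
        for _ in range(num_obs)
    ]
    if not integer:
        # Assumed stand-in for TPM: scale each row so that it sums to 1e6.
        scaled = []
        for row in matrix:
            total = sum(row) or 1  # guard against an all-zero row
            scaled.append([value * 1e6 / total for value in row])
        matrix = scaled
    if transpose:
        # Genes by samples instead of samples by genes.
        matrix = [list(col) for col in zip(*matrix)]
    return matrix


counts = sketch_counts()
print(len(counts), len(counts[0]))
print(sketch_counts() == counts)  # same seed, same matrix
```

Under this convention the expected value of each count is n(1 - p) / p, which is 900 for the defaults n=100 and p=0.1.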