Count Matrices

Prebuilt Count Data

Load yeast test files (sourced from GSE135430 and GSE125162).

bio_test_artifacts.prebuilt.counts_yeast_tpm(gzip=False)

Make a count TSV file from GEO record GSE135430. This will have 12 non-header rows and 6685 gene columns. All values are floats. Values are in Transcripts Per Million (TPM), so each row should sum to approximately 1e6.

Parameters

gzip (bool) – Provide a gzipped (.gz) file instead of a .tsv file. Defaults to False.

Returns

An absolute path to the count TSV file and a pandas DataFrame (12 x 6685)

Return type

str, pd.DataFrame
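
The TPM property described above can serve as a quick sanity check in a test. The following is a minimal sketch, not part of bio_test_artifacts: the helper rows_sum_to_million and its 1% tolerance are hypothetical (the description only says "approximately"), and check_yeast_tpm assumes the package is installed and returns the (path, DataFrame) pair documented here.

```python
import math


def rows_sum_to_million(row_sums, rel_tol=0.01):
    """Return True if every row sum is within rel_tol of 1e6.

    rel_tol is an assumed tolerance chosen for illustration only.
    """
    return all(math.isclose(total, 1e6, rel_tol=rel_tol) for total in row_sums)


def check_yeast_tpm():
    """Load the prebuilt TPM matrix and check its documented shape and row sums."""
    # Imported lazily so the helper above works without the package installed.
    from bio_test_artifacts import prebuilt

    path, df = prebuilt.counts_yeast_tpm()
    assert df.shape == (12, 6685)
    # Iterating the per-row sums of the DataFrame yields plain numbers.
    assert rows_sum_to_million(df.sum(axis=1))
    return path
```

The helper accepts any iterable of numbers, so it can be reused on other TPM matrices that cover a whole genome.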

bio_test_artifacts.prebuilt.counts_yeast_tpm_chr01(gzip=False)

Make a count TSV file from GEO record GSE135430. This will have 12 non-header rows and 121 gene columns. All values are floats. It is in Transcripts Per Million (TPM), but it is a subset of the entire genome and rows do not sum to 1e6.

Parameters

gzip (bool) – Provide a gzipped (.gz) file instead of a .tsv file. Defaults to False.

Returns

An absolute path to the count TSV file and a pandas DataFrame (12 x 121)

Return type

str, pd.DataFrame
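
Because the gzip option changes the file on disk, code that reads the returned path has to handle both cases. The sketch below uses only the standard library; open_count_file and count_data_rows are hypothetical helpers, and the sketch assumes the gzipped file name ends in .gz, as the parameter description suggests.

```python
import csv
import gzip


def open_count_file(path):
    """Open a count file as text, transparently handling a .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode="rt", newline="")
    return open(path, mode="r", newline="")


def count_data_rows(path):
    """Count the non-header rows of a tab-separated count file."""
    with open_count_file(path) as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)
```

With the package installed, passing the path returned by counts_yeast_tpm_chr01(gzip=True) to count_data_rows should give the 12 non-header rows described above. pandas.read_csv also infers gzip compression from a .gz suffix, so it is an alternative when pandas is already a dependency.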

bio_test_artifacts.prebuilt.counts_yeast_single_cell_chr01(gzip=False, filetype='tsv')

Make a count file (TSV by default) from GEO record GSE125162. This will have 38225 non-header rows and 121 gene columns. All values are integers. Values are unique molecular identifier (UMI) counts.

Parameters
  • gzip (bool) – Provide a gzipped (.gz) file instead of a .tsv file. Defaults to False.

  • filetype (str) – Which type of count file to provide. Options are TSV, MTX, HDF5, and H5AD. Defaults to 'tsv'.

Returns

An absolute path to the count file and a pandas DataFrame (38225 x 121). In the case of the MTX option, a tuple of .mtx, .genes.tsv, and .features.tsv paths will be returned instead of a single path.

Return type

str, pd.DataFrame
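
Since the first return value is a single path for most file types but a tuple of three paths for MTX, callers may want to normalize it before use. This is a minimal sketch under stated assumptions: as_path_tuple and single_cell_files are hypothetical helpers, the MTX paths are assumed to arrive as one tuple in the first return position, and the lowercase spelling 'mtx' is assumed to be accepted because the default is the lowercase 'tsv'.

```python
def as_path_tuple(paths):
    """Normalize the returned path value to a tuple of strings."""
    if isinstance(paths, str):
        return (paths,)
    return tuple(paths)


def single_cell_files(filetype="tsv"):
    """Return (tuple_of_paths, DataFrame) regardless of the chosen filetype."""
    # Imported lazily so the helper above works without the package installed.
    from bio_test_artifacts import prebuilt

    paths, df = prebuilt.counts_yeast_single_cell_chr01(filetype=filetype)
    return as_path_tuple(paths), df
```

Downstream code can then loop over the tuple without branching on the file type.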

Synthesized Count Data

Synthesize a count matrix TSV file

bio_test_artifacts.generate.counts_tsv_generate(num_obs=25, num_genes=5000, random_seed=42, integer=True, target_file=None, n=100, p=0.1, transpose=False)

Generate a random TSV count or TPM matrix

Parameters
  • num_obs (int, optional) – Number of observations / samples. Defaults to 25

  • num_genes (int, optional) – Number of genes. Defaults to 5000

  • random_seed (int, optional) – Seeding for RNG. Defaults to 42

  • target_file (str, optional) – Path of the file to write. A temp file is created in $TMP if not set.

  • integer (bool, optional) – Generate an integer count matrix if True. Converts to float TPM if False. Defaults to True

  • n (int, optional) – Negative binomial N parameter. Defaults to 100

  • p (float, optional) – Negative binomial p parameter. Defaults to 0.1

  • transpose (bool, optional) – Transposes to Genes by Samples if True. Defaults to False

Returns

An absolute path to a TSV file and a pandas DataFrame containing counts

Return type

str, pd.DataFrame
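
The n, p, and integer parameters above can be reasoned about with a little arithmetic. The sketch below is not the library's implementation. It assumes the negative binomial follows the common "failures before n successes" parameterization (as in numpy.random.negative_binomial), and that the float conversion is a plain per-row rescale to 1e6 without gene-length normalization. negative_binomial_mean, scale_rows_to_million, and make_small_matrices are hypothetical names.

```python
def negative_binomial_mean(n=100, p=0.1):
    """Expected count for NB(n, p) counted as failures before n successes."""
    return n * (1 - p) / p


def scale_rows_to_million(rows):
    """Rescale each row to sum to 1e6; all-zero rows are left as zeros."""
    scaled = []
    for row in rows:
        total = sum(row)
        factor = 1e6 / total if total else 0.0
        scaled.append([value * factor for value in row])
    return scaled


def make_small_matrices(seed=42):
    """Generate a small integer matrix and a small float matrix with one seed."""
    # Imported lazily so the helpers above work without the package installed.
    from bio_test_artifacts import generate

    counts = generate.counts_tsv_generate(num_obs=5, num_genes=50, random_seed=seed)
    tpm = generate.counts_tsv_generate(
        num_obs=5, num_genes=50, random_seed=seed, integer=False
    )
    return counts, tpm  # each is a (path, DataFrame) pair
```

Under these assumptions the default n=100 and p=0.1 give an expected count of about 900 per gene, which helps when choosing parameters for a sparser or denser test matrix.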