FASTA

Prebuilt FASTA Data

Load yeast test files (sourced from YeastGenome)

bio_test_artifacts.prebuilt.fasta_yeast_chr01()

Make a FASTA file containing yeast chromosome 1 (chr1).

Returns

An absolute path to the FASTA file and a list [(header, sequence)]

Return type

str, list(tuple(str, str))
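The second return value holds the same records that are in the file. The stdlib-only parser below is a minimal sketch for checking that the two agree. `read_fasta` is a hypothetical helper and is not part of bio_test_artifacts. It assumes headers are stored without the leading '>' and that wrapped sequence lines are joined; the library's exact header format is an assumption.

```python
import os
import tempfile


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples.

    Hypothetical helper: assumes headers are returned without the leading '>'
    and that wrapped sequence lines are concatenated into one string.
    """
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Usage with a small stand-in file. With the real function you would compare
# read_fasta(path) against the list returned alongside the path.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">example\nACGTACGTAC\nGGCCTTAA\n")
records = read_fasta(tmp.name)
os.remove(tmp.name)
print(records)  # [('example', 'ACGTACGTACGGCCTTAA')]
```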

bio_test_artifacts.prebuilt.fasta_protein_yeast_chr01()

Make a FASTA file containing the first two protein sequences located on yeast chromosome 1 (chr1).

Returns

A path to the FASTA file and a list [(header0, sequence0), (header1, sequence1)]

Return type

str, list(tuple(str, str))
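This function returns protein records rather than nucleotide records. A quick sanity check is to confirm that each sequence uses amino-acid letters. The sketch below does this with a hypothetical helper, `looks_like_protein`. It accepts the 20 standard one-letter codes and, as an assumption about the source files, a single trailing '*' stop marker. The sample records are illustrative stand-ins, not the yeast data.

```python
# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# A, C, G, T and N are also valid amino-acid codes, so a pure-nucleotide
# string would otherwise pass the check.
NUCLEOTIDES = set("ACGTN")


def looks_like_protein(sequence):
    """Heuristically decide whether sequence is a protein sequence.

    Assumption: one trailing '*' (stop marker) is tolerated, since some
    translated FASTA files include one.
    """
    body = sequence[:-1] if sequence.endswith("*") else sequence
    letters = set(body.upper())
    return bool(letters) and letters <= AMINO_ACIDS and not letters <= NUCLEOTIDES


# Illustrative stand-in for [(header0, sequence0), (header1, sequence1)].
records = [("protein_a", "MKLVINGKTLKGEITVE*"), ("protein_b", "MSTNPKPQRKTKRNT")]
assert len(records) == 2
print(all(looks_like_protein(seq) for _, seq in records))  # True
```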

Synthesized FASTA Data

Synthesize a FASTA file

bio_test_artifacts.generate.fasta_generate(num_records=20, record_length=100, random_seed=42, target_file=None, alphabet='ATGC', probabilities=None)

Generate a random FASTA file

Parameters
  • num_records (int, optional) – Number of separate records to include in the file. Defaults to 20 records

  • record_length (int, optional) – Length of each record in the file, in bases. Defaults to 100 bases per record

  • random_seed (int, optional) – Seed for the random number generator. Defaults to 42

  • target_file (str, optional) – Path of the file to write. If not set, a temporary file is created in $TMP

  • alphabet (str, optional) – String of the characters from which each record's sequence is drawn. Defaults to ATGC

  • probabilities (tuple(float), optional) – An iterable of probabilities, one for each character of alphabet and in the same order. Defaults to balanced (every character equally likely)
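The following stdlib sketch shows how these parameters could interact. It is not the library's implementation. The function name `make_records`, the `seq{i}` header scheme, and the use of `random.Random.choices` for weighted sampling are all assumptions. In the sketch, `probabilities` is matched position by position to the characters of `alphabet`, and `None` makes every character equally likely.

```python
import random


def make_records(num_records=20, record_length=100, random_seed=42,
                 alphabet="ATGC", probabilities=None):
    """Return a list of (header, sequence) tuples drawn at random.

    Hypothetical sketch: headers are 'seq0', 'seq1', ...; the real library
    may name records and sample characters differently.
    """
    if probabilities is not None and len(probabilities) != len(alphabet):
        raise ValueError("probabilities must have one entry per alphabet character")
    # A private RNG seeded once, so the same seed reproduces the same records.
    rng = random.Random(random_seed)
    records = []
    for i in range(num_records):
        # weights=None samples every character uniformly ("balanced").
        seq = "".join(rng.choices(alphabet, weights=probabilities, k=record_length))
        records.append((f"seq{i}", seq))
    return records


# Three AT-rich records of 10 bases each.
records = make_records(num_records=3, record_length=10,
                       probabilities=(0.4, 0.4, 0.1, 0.1))
print(len(records), [len(seq) for _, seq in records])  # 3 [10, 10, 10]
```

Passing the weights straight to `choices` keeps the sampling weighted without normalising them first. The trade-off is that the weights are only checked for length, not checked to sum to 1.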

Returns

An absolute path to the FASTA file and a list of (header, seq) tuples

Return type

str, list(tuple(str, str))
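The following sketch illustrates the return contract: an absolute path plus the records that were written. `write_fasta` is a hypothetical helper and is not part of bio_test_artifacts. It writes to `target_file` when one is given. Otherwise it falls back to `tempfile.mkstemp`, which picks its directory from the `TMPDIR`, `TEMP`, or `TMP` environment variables. Whether the library wraps long sequence lines is not stated here, so the sketch writes each sequence on a single line.

```python
import os
import tempfile


def write_fasta(records, target_file=None):
    """Write (header, sequence) tuples as FASTA and return (abs_path, records).

    Hypothetical helper mirroring the documented return type
    str, list(tuple(str, str)).
    """
    if target_file is None:
        fd, target_file = tempfile.mkstemp(suffix=".fasta")
        os.close(fd)  # close the low-level handle; the file is reopened by name below
    with open(target_file, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")
    return os.path.abspath(target_file), records


path, recs = write_fasta([("seq0", "ATGC"), ("seq1", "GGCA")])
with open(path) as handle:
    print(handle.read())
os.remove(path)
```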