FASTQ

Prebuilt FASTQ Data

Load yeast test files (sourced from SRR1569870)

bio_test_artifacts.prebuilt.fastq_single_end(gzip=True)

Make a single-end FASTQ file from NCBI SRR1569870. The file contains 10580 sequences, each 101 bp long. Quality scores use Illumina PHRED+33 encoding.

Parameters: gzip (bool) – Provide a gzipped FASTQ (.fastq.gz). Defaults to True.
Returns: The absolute path to the FASTQ file
Return type: str

bio_test_artifacts.prebuilt.fastq_paired_end(gzip=True)

Make two paired-end FASTQ files from NCBI SRR1569870. Each file contains 10580 sequences, each 101 bp long. Quality scores use Illumina PHRED+33 encoding.

Parameters: gzip (bool) – Provide a gzipped FASTQ (.fastq.gz). Defaults to True.
Returns: The absolute paths to two FASTQ files
Return type: str, str
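
To sanity-check the files these functions return, one option is to parse the FASTQ and confirm the record count and read length stated above. The sketch below is not part of bio_test_artifacts: `read_fastq` and `decode_phred33` are hypothetical helpers, and the parser assumes four lines per record (no wrapped sequence lines). It opens the file with gzip when the path ends in .gz.

```python
import gzip


def decode_phred33(quality_string):
    """Convert a PHRED+33 quality string into a list of integer scores."""
    return [ord(char) - 33 for char in quality_string]


def read_fastq(path):
    """Yield (header, sequence, qualities) for each four-line FASTQ record.

    Hypothetical helper for illustration; not part of bio_test_artifacts.
    """
    # Pick the gzip reader for .gz paths, the plain reader otherwise
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            quality = handle.readline().rstrip("\n")
            # Drop the leading '@' from the header line
            yield header[1:], sequence, decode_phred33(quality)
```

With the library installed, looping over the two paths returned by `bio_test_artifacts.prebuilt.fastq_paired_end()` and calling `list(read_fastq(path))` on each should, per the description above, give 10580 records per file with 101-base sequences.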

Synthesized FASTQ Data

Synthesize a FASTQ file

bio_test_artifacts.generate.fastq_generate(num_sequences=100, seq_length=75, gzip_output=True, random_seed=42, target_file=None, alphabet='ATGC', probabilities=None)

Generate a random FASTQ file. PHRED scores will be Illumina (PHRED+33).

Parameters

num_sequences (int, optional) – Number of separate sequence records to include in the file. Defaults to 100 records
seq_length (int, optional) – Length in bases of each individual record in the file. Defaults to 75 bases per record
gzip_output (bool, optional) – Whether to write the output as a gzipped FASTQ. Defaults to True
random_seed (int, optional) – Seed for the RNG. Defaults to 42
target_file (str, optional) – Path of the output file. If not set, a temp file is created in $TMP.
alphabet (str, optional) – A string containing the alphabet for FASTQ records. Defaults to ATGC
probabilities (tuple(float), optional) – An iterable with one probability for each character in the alphabet string. Defaults to balanced (equal) probabilities

Returns

An absolute pathname to a FASTQ file and a list of (header, seq, quality) tuples. Quality is a list of integer PHRED scores, not a character string.

Return type

str, list(tuple())
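
Because quality comes back as integer scores rather than a character string, comparing the returned records against the text of the file needs a conversion step. The following is a minimal sketch assuming the PHRED+33 encoding stated above. `encode_phred33` and `format_record` are hypothetical names, not part of bio_test_artifacts. Whether the returned header strings include the leading '@' is not stated here, so the sketch adds it only when it is missing.

```python
def encode_phred33(scores):
    """Convert integer PHRED scores to a PHRED+33 quality string."""
    return "".join(chr(score + 33) for score in scores)


def format_record(header, sequence, scores):
    """Render one (header, seq, quality) tuple as four FASTQ lines."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: the header may or may not already carry the '@' prefix
    if not header.startswith("@"):
        header = "@" + header
    return "\n".join([header, sequence, "+", encode_phred33(scores)]) + "\n"


# Made-up record in the documented (header, seq, quality) shape
example = ("read_0", "ATGC", [40, 30, 20, 2])
print(format_record(*example), end="")
```

For the made-up record, the quality line printed is `I?5#`, since 40, 30, 20 and 2 map to ASCII codes 73, 63, 53 and 35.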