FASTQ
Prebuilt FASTQ Data
Load yeast test files (sourced from SRR1569870)
- bio_test_artifacts.prebuilt.fastq_single_end(gzip=True)
Make a single-end FASTQ file from NCBI SRR1569870. The file has 10580 sequences, each 101 bp long. The PHRED scores use the Illumina encoding (PHRED +33).
- Parameters
gzip (bool) – Provide a gzipped FASTQ (.fastq.gz). Defaults to True.
- Returns
The absolute path to the FASTQ file
- Return type
str
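A quick way to sanity-check a FASTQ file like this one is to count its records, collect the read lengths, and decode the PHRED +33 quality strings. The sketch below uses only the standard library and writes a tiny stand-in file so it runs without the package installed; `open_fastq`, `read_fastq`, and `demo_path` are illustrative names, not part of bio_test_artifacts.

```python
import gzip
import os
import tempfile


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fastq(path):
    """Yield (header, sequence, quality_scores) for each 4-line record."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            seq = handle.readline().rstrip("\n")
            handle.readline()  # the '+' separator line
            qual = handle.readline().rstrip("\n")
            # PHRED +33: the score is the ASCII code of the character minus 33
            yield header[1:], seq, [ord(c) - 33 for c in qual]


# Tiny stand-in file (hypothetical data) so the sketch is self-contained
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fastq.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n!!5I\n")

records = list(read_fastq(demo_path))
num_records = len(records)
read_lengths = {len(seq) for _, seq, _ in records}
print(num_records, read_lengths)
```

With the package installed, the path returned by fastq_single_end could be passed to `read_fastq` in place of `demo_path`; going by the description above, that should give 10580 records with a single read length of 101.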
- bio_test_artifacts.prebuilt.fastq_paired_end(gzip=True)
Make two paired-end FASTQ files from NCBI SRR1569870. Each file has 10580 sequences, each 101 bp long. The PHRED scores use the Illumina encoding (PHRED +33).
- Parameters
gzip (bool) – Provide a gzipped FASTQ (.fastq.gz). Defaults to True.
- Returns
The absolute paths to two FASTQ files
- Return type
str, str
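Paired-end files are only useful if the mates line up, so a test may want to confirm that both files list the same reads in the same order. The following is a minimal sketch of that check using the standard library; `read_ids` and `mates_are_paired` are hypothetical helpers, and the demo files are made-up stand-ins for the two returned paths.

```python
import gzip
import os
import tempfile


def read_ids(path):
    """Return the read ID (first token of each header line) for every record."""
    opener = gzip.open if path.endswith(".gz") else open
    ids = []
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header lines are every 4th line
                read_id = line[1:].split()[0]  # drop '@' and any description
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]  # drop an optional mate suffix
                ids.append(read_id)
    return ids


def mates_are_paired(path_1, path_2):
    """True if both files list the same read IDs in the same order."""
    return read_ids(path_1) == read_ids(path_2)


# Stand-in files (hypothetical data) so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
mate_1 = os.path.join(tmp_dir, "demo_1.fastq.gz")
mate_2 = os.path.join(tmp_dir, "demo_2.fastq.gz")
with gzip.open(mate_1, "wt") as out:
    out.write("@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n")
with gzip.open(mate_2, "wt") as out:
    out.write("@r1/2\nCCGT\n+\nIIII\n@r2/2\nGTGA\n+\nIIII\n")

paired = mates_are_paired(mate_1, mate_2)
print(paired)
```

With the package installed, the two paths returned by fastq_paired_end could be passed to `mates_are_paired` directly.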
Synthesized FASTQ Data
Synthesize a FASTQ file
- bio_test_artifacts.generate.fastq_generate(num_sequences=100, seq_length=75, gzip_output=True, random_seed=42, target_file=None, alphabet='ATGC', probabilities=None)
Generate a random FASTQ file. The PHRED scores use the Illumina encoding (PHRED +33).
- Parameters
num_sequences (int, optional) – Number of separate sequence records to include in the file. Defaults to 100 records.
seq_length (int, optional) – Length in bases of each individual record in the file. Defaults to 75 bases per record.
gzip_output (bool, optional) – Whether to write the output as a gzipped FASTQ. Defaults to True.
random_seed (int, optional) – Seed for the random number generator. Defaults to 42.
target_file (str, optional) – Path of the output file. If not set, a temp file is created in $TMP.
alphabet (str, optional) – A string containing the characters used to build each FASTQ record. Defaults to ATGC.
probabilities (tuple(float), optional) – An iterable with one probability for each character in the alphabet string. Defaults to equal probability for every character.
- Returns
An absolute pathname to a FASTQ file and a list of (header, seq, quality) tuples. Quality is a list of integer PHRED scores, not a character string.
- Return type
str, list(tuple())
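To make the relationship between the parameters and the returned tuples concrete, here is a rough sketch of seeded record synthesis and of turning integer PHRED scores back into a PHRED +33 quality string. It is not the library's implementation: `synthesize_records` and `to_fastq_text` are hypothetical names, and the header format and the 0 to 40 score range are assumptions chosen for illustration.

```python
import random


def synthesize_records(num_sequences=100, seq_length=75, random_seed=42,
                       alphabet="ATGC", probabilities=None):
    """Build (header, seq, quality) tuples; quality is a list of int scores."""
    rng = random.Random(random_seed)  # a fixed seed gives reproducible output
    records = []
    for i in range(num_sequences):
        # weights=None makes every character in the alphabet equally likely
        seq = "".join(rng.choices(alphabet, weights=probabilities, k=seq_length))
        # Assumed score range of 0..40; the real generator may differ
        quality = [rng.randint(0, 40) for _ in range(seq_length)]
        records.append(("seq_{}".format(i), seq, quality))
    return records


def to_fastq_text(records):
    """Render records as FASTQ text with PHRED +33 encoded quality strings."""
    lines = []
    for header, seq, quality in records:
        qual_string = "".join(chr(q + 33) for q in quality)
        lines.extend(["@" + header, seq, "+", qual_string])
    return "\n".join(lines) + "\n"


records = synthesize_records(num_sequences=3, seq_length=10,
                             probabilities=(0.7, 0.1, 0.1, 0.1))
fastq_text = to_fastq_text(records)
print(fastq_text)
```

Because the quality values are kept as integers, a test can compare them numerically and only encode them with `chr(q + 33)` when it needs the on-disk representation.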