Quality scores

Given an assertion, \(A\), the quality score, \(Q(A)\), is a function of the probability that the true base call is different from the assertion, i.e. \(\mathbb{P}(\neg A)\). The most common relationship between \(Q(A)\) and \(\mathbb{P}(\neg A)\) is:

\[ Q(A) = -10 \log_{10} \mathbb{P}(\neg A) \]

where \(\mathbb{P}(\neg A)\) is the estimated probability that the assertion \(A\) is wrong. This is called the "Phred scale" and is sometimes denoted as \(Q_{Phred}(A)\). For example, \(Q(A) = 30\) corresponds to an estimated error probability of \(10^{-3}\).
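As a minimal sketch of the Phred relationship, the two helpers below convert between a quality score and an error probability. The function names are illustrative and are not taken from any particular library.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Invert Q = -10 * log10(P): return the error probability P for a Phred score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Apply Q = -10 * log10(P) to an error probability in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Each increase of 10 in the score divides the estimated error probability by 10.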

Sequencing instruments from the early 2000s used a slightly different scale that made use of the log odds:

\[ Q(A) = -10 \log_{10} \frac{ \mathbb{P}(\neg A) }{ 1 - \mathbb{P}(\neg A) } \]

This is called the "Solexa scale" and is sometimes denoted as \(Q_{Solexa}(A)\).
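Solving the log-odds formula for the error probability gives \(\mathbb{P}(\neg A) = 1 / (1 + 10^{Q_{Solexa}/10})\), and substituting that into the Phred formula gives \(Q_{Phred} = 10 \log_{10}(10^{Q_{Solexa}/10} + 1)\). The sketch below implements both steps; the function names are illustrative assumptions.

```python
import math


def solexa_to_error_prob(q_solexa: float) -> float:
    """Invert Q = -10 * log10(P / (1 - P)) to recover the error probability P."""
    return 1.0 / (1.0 + 10 ** (q_solexa / 10))


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa-scale score to the equivalent Phred-scale score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


if __name__ == "__main__":
    for q in (-5, 0, 10, 40):
        print(q, round(solexa_to_error_prob(q), 4), round(solexa_to_phred(q), 4))
```

The two scales differ noticeably only at low quality: a Solexa score of 0 means even odds (an error probability of 0.5, roughly Phred 3), while at high scores the two values are nearly identical.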

In FASTQ files, each quality score is encoded as a single-byte ASCII character, so the quality line has the same length as the sequence line. The Q-score is equal to the ASCII character code minus an offset, which differs across manufacturers and instruments. The scale name is formed from the Q-score transformation name followed by the offset, i.e. the ASCII code that represents a quality score of 0. For example, \(Q_{Phred}\) = ASCII value - 33 is the "Phred33" scale.
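A minimal sketch of this encoding, assuming the offset is already known: decoding subtracts the offset from each character code, and encoding adds it back. The helper names are hypothetical.

```python
def decode_qualities(quality_line: str, offset: int = 33) -> list:
    """Return one integer Q-score per character: ASCII code minus the offset."""
    return [ord(ch) - offset for ch in quality_line]


def encode_qualities(scores, offset: int = 33) -> str:
    """Return the FASTQ quality line for a sequence of integer Q-scores."""
    return "".join(chr(q + offset) for q in scores)


if __name__ == "__main__":
    print(decode_qualities("!5I~"))             # offset 33
    print(decode_qualities("@Th", offset=64))   # offset 64
    print(encode_qualities([0, 20, 40]))
```

The same character decodes to different scores under different offsets, so the offset has to be known before the quality line can be interpreted.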

| Platform | Scale | Offset | ASCII range | Q range |
|---|---|---|---|---|
| Element Biosciences | Phred | 33 | [!, Y] | [0, 56] |
| Solexa | Solexa | 64 | [;, ~] | [-5, 40] |
| Illumina 1.2 and earlier | Solexa | 64 | [;, ~] | [-5, 40] |
| Illumina 1.3, 1.4 | Phred | 64 | [@, ~] | [0, 62] |
| Illumina 1.5, 1.6, 1.7 | Phred | 64 | [C, ~] | [3, 62] |
| Illumina 1.8 and later | Phred | 33 | [!, ~] | [0, 93] |
| MGI / BGI | Phred | 33 | [!, ~] | [0, 93] |
| NCBI / Sanger | Phred | 33 | [!, ~] | [0, 93] |
| Oxford Nanopore | Phred | 33 | [!, ~] | [0, 93] |
| Pacific Biosciences | Phred | 33 | [!, ~] | [0, 93] |
| Singular Genomics | Phred | 33 | [!, ~] | [0, 93] |
| Ultima Genomics | Phred | 33 | [!, ~] | [0, 93] |
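When the platform is unknown, the ASCII ranges in the table can narrow down the encoding. The sketch below is a heuristic, not a definitive detector: it lists the scales whose lowest and highest ASCII characters in the table are consistent with the characters seen in a file. The names `ENCODINGS` and `candidate_encodings` are hypothetical.

```python
# (scale name, lowest ASCII code, highest ASCII code), taken from the table:
# Phred33 starts at '!' (33), Solexa64 at ';' (59), Phred64 at '@' (64).
ENCODINGS = [
    ("Phred33", 33, 126),
    ("Solexa64", 59, 126),
    ("Phred64", 64, 126),
]


def candidate_encodings(quality_lines):
    """Return the scale names whose ASCII range covers every observed character."""
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    return [name for name, low, high in ENCODINGS
            if low <= lowest and highest <= high]


if __name__ == "__main__":
    print(candidate_encodings(["!!5I", "II5#"]))  # contains characters below ';'
    print(candidate_encodings(["hhhB", "ffgh"]))  # every character is '@' or above
```

A character below `;` is consistent only with an offset of 33. If every character is `@` or above, all three scales remain possible, and the platform or software version has to settle the question.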

References

  1. Peter A. J. Cock, et al., "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants", Nucleic Acids Research, 2009
  2. Illumina: Connected Software
  3. Qiagen CLC Genomics Workbench: Quality scores on the Illumina platform
  4. Element Biosciences: bases2fastq
  5. Pacific Biosciences: obc2fastq Reference Guide v6.0
  6. Oxford Nanopore: Data Analysis
  7. Singular Genomics: FASTQ Data Format