Quality scores

Given an assertion $A$ (for example, a base call), the quality score $Q(A)$ is a function of $\mathbb{P}(\neg A)$, the probability that the assertion is wrong, i.e. that the true base differs from the called base. The most common relationship between $Q(A)$ and $\mathbb{P}(\neg A)$ is:

\[ Q(A) = -10 \log_{10} \mathbb{P}(\neg A) \]

where $\mathbb{P}(\neg A)$ is the estimated probability that assertion $A$ is wrong. This is called the "Phred scale" and is sometimes denoted $Q_{Phred}(A)$.
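As a minimal sketch of the Phred relationship and its inverse, assuming error probabilities are plain floats, the two helpers below are illustrative only; the names `phred_from_prob` and `prob_from_phred` are hypothetical and not part of any particular library.

```python
import math


def phred_from_prob(p_error: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(P(not A)), for p_error in (0, 1]."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def prob_from_phred(q: float) -> float:
    """Inverse transform: P(not A) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


# A 1-in-1000 chance that the call is wrong corresponds to Q30.
print(round(phred_from_prob(0.001), 6))  # 30.0
# Q20 corresponds to roughly a 1-in-100 error probability.
print(prob_from_phred(20))
```

Each factor of 10 in the error probability shifts the score by 10, which is why Q20, Q30 and Q40 are common reporting thresholds.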

Early Solexa and Illumina instruments (Illumina pipeline 1.2 and earlier) used a slightly different scale based on the log odds of an error:

\[ Q(A) = -10 \log_{10} \frac{ \mathbb{P}(\neg A) }{ 1 - \mathbb{P}(\neg A) } \]

This is called the "Solexa scale" and is sometimes denoted $Q_{Solexa}(A)$.
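The two scales can be converted into each other by solving the Solexa definition for $\mathbb{P}(\neg A)$, which gives $\mathbb{P}(\neg A) = 1 / (1 + 10^{Q_{Solexa}/10})$ and therefore $Q_{Phred} = 10 \log_{10}(1 + 10^{Q_{Solexa}/10})$. The sketch below implements that algebra; the function names are hypothetical and chosen only for illustration.

```python
import math


def solexa_from_prob(p_error: float) -> float:
    """Solexa-scaled quality: -10 * log10(p / (1 - p)), for p in (0, 1)."""
    if not 0.0 < p_error < 1.0:
        raise ValueError("p_error must be in (0, 1)")
    return -10.0 * math.log10(p_error / (1.0 - p_error))


def phred_from_solexa(q_solexa: float) -> float:
    """Solexa -> Phred: Q_phred = 10 * log10(1 + 10 ** (Q_solexa / 10))."""
    return 10.0 * math.log10(1.0 + 10.0 ** (q_solexa / 10.0))


def solexa_from_phred(q_phred: float) -> float:
    """Phred -> Solexa: Q_solexa = 10 * log10(10 ** (Q_phred / 10) - 1).

    Only defined for Q_phred > 0; Phred Q = 0 means P(not A) = 1, where the
    log odds are infinite.
    """
    if q_phred <= 0:
        raise ValueError("Solexa scale is undefined for Phred Q <= 0")
    return 10.0 * math.log10(10.0 ** (q_phred / 10.0) - 1.0)
```

The scales agree closely at high quality (Solexa 40 is about Phred 40.0004) but diverge at low quality: Solexa -5 corresponds to only about Phred 1.19, which is why negative scores are possible on the Solexa scale but not on the Phred scale.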

In FASTQ files, quality scores are encoded as single-byte ASCII characters, one per base, so that the quality line has the same length as the sequence line. The Q-score equals the character's ASCII code minus an offset, which differs across manufacturers and instruments. Each encoding is named after its Q-score transformation followed by the offset, i.e. the ASCII code that represents a quality score of 0. For example, $Q_{Phred} = \text{ASCII code} - 33$ is the "Phred33" scale.
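A minimal sketch of decoding and encoding a FASTQ quality line, assuming the offset is already known (33 or 64); `decode_qualities` and `encode_qualities` are hypothetical helper names, not a specific library's API.

```python
def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality line: Q = ord(char) - offset."""
    return [ord(c) - offset for c in qual]


def encode_qualities(scores: list[int], offset: int = 33) -> str:
    """Encode integer scores as a FASTQ quality line: char = chr(Q + offset)."""
    chars = []
    for q in scores:
        code = q + offset
        # FASTQ quality characters are restricted to printable ASCII '!'..'~'.
        if not 33 <= code <= 126:
            raise ValueError(f"Q={q} cannot be encoded with offset {offset}")
        chars.append(chr(code))
    return "".join(chars)


print(decode_qualities("II?+"))         # [40, 40, 30, 10]
print(decode_qualities("I", offset=64))  # [9]
```

The same character can mean very different qualities depending on the offset: `I` is Q40 under Phred33 but Q9 under Phred64, so decoding with the wrong offset silently shifts every score by 31.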

| Platform | Scale | Offset | ASCII range | Q range |
|---|---|---|---|---|
| Element Biosciences | Phred | 33 | [`!`, `Y`] | [0, 56] |
| Solexa | Solexa | 64 | [`;`, `~`] | [-5, 62] |
| Illumina 1.2 and earlier | Solexa | 64 | [`;`, `~`] | [-5, 62] |
| Illumina 1.3, 1.4 | Phred | 64 | [`@`, `~`] | [0, 62] |
| Illumina 1.5, 1.6, 1.7 | Phred | 64 | [`C`, `~`] | [3, 62] |
| Illumina 1.8 and later | Phred | 33 | [`!`, `~`] | [0, 93] |
| MGI / BGI | Phred | 33 | [`!`, `~`] | [0, 93] |
| NCBI / Sanger | Phred | 33 | [`!`, `~`] | [0, 93] |
| Oxford Nanopore | Phred | 33 | [`!`, `~`] | [0, 93] |
| Pacific Biosciences | Phred | 33 | [`!`, `~`] | [0, 93] |
| Singular Genomics | Phred | 33 | [`!`, `~`] | [0, 93] |
| Ultima Genomics | Phred | 33 | [`!`, `~`] | [0, 93] |
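Because the offsets overlap, a file's encoding can sometimes be narrowed down from the range of characters it contains. The sketch below is a hypothetical heuristic, not a documented tool: the `ENCODINGS` list and its labels are distilled by hand from the ASCII ranges in the table, and `candidate_encodings` simply keeps every encoding whose range covers all observed characters.

```python
# Hypothetical candidate list distilled from the table:
# (label, offset, lowest allowed ASCII char, highest allowed ASCII char).
ENCODINGS = [
    ("Phred33", 33, "!", "~"),
    ("Phred64 (Illumina 1.3-1.4)", 64, "@", "~"),
    ("Phred64 (Illumina 1.5-1.7)", 64, "C", "~"),
    ("Solexa64", 64, ";", "~"),
]


def candidate_encodings(quality_lines):
    """Return labels of encodings whose ASCII range covers every observed character."""
    codes = [ord(c) for line in quality_lines for c in line]
    if not codes:
        # No evidence at all: nothing can be ruled out.
        return [label for label, *_ in ENCODINGS]
    lo, hi = min(codes), max(codes)
    return [
        label
        for label, _offset, first, last in ENCODINGS
        if ord(first) <= lo and hi <= ord(last)
    ]


print(candidate_encodings(["II?+"]))  # ['Phred33']
print(candidate_encodings([";"]))     # ['Phred33', 'Solexa64']
```

This check can only rule encodings out: a file whose characters all fall between `C` and `~` is consistent with every row, so in that case the offset has to come from metadata about the instrument or pipeline.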
