The engine flagged the reference dataset for this species — review the signals below before relying on these thresholds.
Derived from 147 genomes. For the derivation pipeline and the PASS / WARN / FAIL verdict model, see the methods page for qualibact-v1.0.
Applied to the full AllTheBacteria dataset, these thresholds place 131 genomes at PASS, 15 at WARN, and 20 at FAIL (166 assessed in total). The per-tier genome lists can be downloaded below in .csv.gz format; the FAIL list also records the reason each assembly was rejected.
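This table summarises the distribution of each metric: sample size (n), mean, standard deviation (SD), minimum, quartiles (Q1, median, Q3), and maximum. The Distribution column records whether the metric was classed as normal or non-normal.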
A combined summary table across all species is available on the summary page.
| Metric | Distribution | n | Mean | SD | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|---|---|
| N50 | non-normal | 147 | 128,734 | 32,552 | 25,266 | 111,778 | 137,128 | 150,384 | 217,139 |
| no_of_contigs | non-normal | 147 | 45.39 | 18.15 | 20 | 33 | 41 | 52.5 | 111 |
| longest | non-normal | 147 | 266,275 | 77,260 | 104,255 | 203,281 | 264,039 | 345,668 | 428,608 |
| GC_Content | normal | 147 | 29.62 | 0.13 | 29.29 | 29.54 | 29.63 | 29.69 | 29.98 |
| Completeness_Specific | non-normal | 147 | 100 | 0 | 99.99 | 100 | 100 | 100 | 100 |
| Contamination | non-normal | 147 | 0.04 | 0.03 | 0 | 0.02 | 0.03 | 0.05 | 0.15 |
| Total_Coding_Sequences | non-normal | 147 | 1,556 | 54.37 | 1,474 | 1,516 | 1,545 | 1,592 | 1,764 |
| Genome_Size | non-normal | 147 | 1,512,730 | 45,098 | 1,444,764 | 1,475,975 | 1,499,865 | 1,544,245 | 1,647,524 |
Full statistics including KS test vs RefSeq and Wasserstein distance are in the downloadable summary.csv.
Both the Fail and Warn bands are shown as the published rounded values, which are easier to cite and are consistent across the species page, CSV downloads, and downstream QC tools.
| Metric | Fail below | Warn below | Warn above | Fail above |
|---|---|---|---|---|
| Genome_Size | 1,400,000 | 1,400,000 | 1,700,000 | 1,700,000 |
| GC_Content | 29.3 | 29.3 | 29.9 | 30 |
| Total_Coding_Sequences | 1,400 | 1,400 | 1,700 | 1,800 |
| Completeness_Specific | 99 | 100 | - | - |
| Contamination | - | - | 1 | 1 |
| N50 | 38,000 | 56,000 | - | - |
| no_of_contigs | - | - | 100 | 110 |
| longest | - | - | - | - |
How to read this: a value between the two warn columns is typical for this species and passes QC. A value between a warn column and the corresponding fail column is borderline: worth a manual look but not an outright failure. A value outside the fail columns is unusual enough to fail QC. A dash means no threshold is set in that direction, and where the warn and fail values coincide (e.g. Genome_Size) there is no borderline band.
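The reading rule above can be sketched in a few lines of Python. This is an illustration only, not the qualibact implementation: the names `THRESHOLDS`, `metric_verdict`, and `genome_verdict` are hypothetical, the comparisons are assumed to be strict (a value equal to a threshold is not treated as below or above it), and the rule that the worst per-metric verdict decides the genome verdict is likewise an assumption.

```python
# Published rounded thresholds copied from the table above.
# Tuple order: (fail_below, warn_below, warn_above, fail_above); None stands for "-".
THRESHOLDS = {
    "Genome_Size": (1_400_000, 1_400_000, 1_700_000, 1_700_000),
    "GC_Content": (29.3, 29.3, 29.9, 30),
    "Total_Coding_Sequences": (1_400, 1_400, 1_700, 1_800),
    "Completeness_Specific": (99, 100, None, None),
    "Contamination": (None, None, 1, 1),
    "N50": (38_000, 56_000, None, None),
    "no_of_contigs": (None, None, 100, 110),
    "longest": (None, None, None, None),
}


def metric_verdict(metric, value):
    """Return 'PASS', 'WARN', or 'FAIL' for one metric value."""
    fail_below, warn_below, warn_above, fail_above = THRESHOLDS[metric]
    # Assumption: strict comparisons, so a value equal to a threshold
    # does not count as below or above it.
    if fail_below is not None and value < fail_below:
        return "FAIL"
    if fail_above is not None and value > fail_above:
        return "FAIL"
    if warn_below is not None and value < warn_below:
        return "WARN"
    if warn_above is not None and value > warn_above:
        return "WARN"
    return "PASS"


def genome_verdict(metrics):
    """Combine per-metric verdicts for one genome.

    Assumption: the worst per-metric verdict decides the overall verdict.
    """
    verdicts = {metric_verdict(name, value) for name, value in metrics.items()}
    if "FAIL" in verdicts:
        return "FAIL"
    if "WARN" in verdicts:
        return "WARN"
    return "PASS"
```

Under these assumptions, the median N50 from the distribution table (137,128) passes, while the minimum N50 (25,266) falls below the fail threshold of 38,000.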
The published rounded thresholds (the values in the table above) were applied to the full AllTheBacteria-2024-08 set for this species. Each row carries the per-metric verdict and, where applicable, the reason a genome was demoted to WARN or FAIL. Files are gzipped CSV.
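Because the downloads are gzipped CSV, they can be read with the Python standard library without unpacking them first. The sketch below tallies the values in one column. The helper name `count_column_values` is hypothetical, and the file name and column name in the usage note are placeholders: check the header row of the downloaded file for the real column names.

```python
import csv
import gzip
from collections import Counter


def count_column_values(path, column):
    """Tally the values found in one column of a gzipped CSV file."""
    # Text mode ("rt") lets csv.DictReader consume decoded lines directly.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[column] for row in reader)
```

A call such as `count_column_values("fail.csv.gz", "reason")` would then give a count per rejection reason, assuming the FAIL list has a column with that name.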
This plot shows the relationship between the number of coding sequences (CDS) and genome size — how the number of genes scales with assembly length. The relationship should be roughly linear: as genome size increases, the number of coding sequences should rise proportionally. A secondary trend line or non-linear behaviour can indicate either bona fide sub-populations within the retained genomes (e.g. distinct sub-clades) or residual contamination that survived filtering.
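One simple way to look for genomes that sit off the main trend is to fit a straight line of CDS count against genome size and inspect the residuals. The following is a minimal sketch of that idea, not the method used to draw the plot: `fit_line` and `flag_off_trend` are hypothetical helpers, and the cut-off of three residual standard deviations is an arbitrary illustrative choice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def flag_off_trend(genome_sizes, cds_counts, k=3.0):
    """Return indices of genomes whose CDS count lies far from the fitted line.

    Assumption: "far" means more than k residual standard deviations,
    with k = 3 chosen only for illustration.
    """
    slope, intercept = fit_line(genome_sizes, cds_counts)
    residuals = [
        y - (slope * x + intercept) for x, y in zip(genome_sizes, cds_counts)
    ]
    n = len(residuals)
    if n < 3:
        return []
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    if sd == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > k * sd]
```

Flagged genomes are only candidates for a closer look. A cluster of them along a second line would point towards a sub-population, whereas isolated points are more consistent with residual contamination or assembly problems.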
These plots show genomes before and after filtering to highlight the outliers removed:
The filtered distribution shown here may not exactly match the published thresholds because additional rounding and curator adjustments are applied on top.