enterobase v2.3 methods
enterobase-v2.3 mirrors the assembly-QC cutoffs used by EnteroBase (Zhou et al. 2020) for its public-genome QA pipeline, version 2.3. We surface these alongside QualiBact's own schemes so users can compare a long-running third-party platform side-by-side with QualiBact's engine-derived numbers. It is not intended as a replacement for qualibact-v1.0 / qualibact-v1.1 — those are derived from a much wider per-species reference distribution, whereas EnteroBase publishes a single set of cutoffs for an entire genus.
What EnteroBase's QA pipeline does
The EnteroBase backend pipeline, as described in EnteroBase's backend documentation, evaluates every public assembly against five operational criteria:
- Total assembly length — must fall within a species-/genus-specific min–max band.
- N50 — must exceed a species-/genus-specific minimum.
- Contig count — must not exceed a species-/genus-specific maximum.
- Proportion of N's — fraction of low-quality / ambiguous sites must stay below 3–6 %.
- Species-call agreement — fraction of contigs assigned to the expected species (using Kraken; v0.10.5-beta in the EnteroBase pipeline) must exceed 65–70 %.
In addition, the pipeline applies a contig-level depth filter before evaluation: contigs whose mean read depth is less than 20 % of the genome-wide mean are dropped, on the assumption that they represent low-level contamination.
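The depth filter can be illustrated with a short sketch. This is not EnteroBase's code: the Contig type is hypothetical, and computing the genome-wide mean as a length-weighted average over all contigs is an assumption, since the source only says "genome-wide mean".

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int        # contig length in bp
    mean_depth: float  # mean read depth over the contig


def depth_filter(contigs, fraction=0.20):
    """Drop contigs whose mean depth is below `fraction` x the genome-wide mean.

    Assumption: the genome-wide mean is length-weighted across all contigs.
    """
    total_len = sum(c.length for c in contigs)
    if total_len == 0:
        return []
    genome_mean = sum(c.length * c.mean_depth for c in contigs) / total_len
    cutoff = fraction * genome_mean
    # Contigs at or above 20 % of the genome-wide mean are kept.
    return [c for c in contigs if c.mean_depth >= cutoff]
```

With contigs at depths 50x, 48x and 3x, the genome-wide mean is close to 50x, so the cutoff is roughly 10x and only the 3x contig is dropped as presumed contamination.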
EnteroBase publishes a single set of cutoffs per genus (Salmonella, Escherichia/Shigella, Yersinia, Klebsiella, Pseudomonas, etc., plus a default block) — there is no per-species refinement and no WARN tier. A given assembly either meets every criterion or it fails.
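The single-tier, all-or-nothing rule can be sketched as one function. This is a minimal illustration rather than EnteroBase's implementation: the GenusCutoffs field names and the example values are hypothetical (they are not EnteroBase's published cutoffs), and treating each boundary as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GenusCutoffs:
    # Hypothetical field names; EnteroBase's real configuration keys may differ.
    min_total_bases: int
    max_total_bases: int
    min_n50: int
    max_contigs: int
    max_n_fraction: float        # EnteroBase uses a value in the 3-6 % range
    min_species_fraction: float  # EnteroBase uses a value in the 65-70 % range


def evaluate(total_bases, n50, n_contigs, n_fraction, species_fraction, cut):
    """Return ("PASS", []) or ("FAIL", [failed criteria]); there is no WARN tier."""
    failures = []
    if not cut.min_total_bases <= total_bases <= cut.max_total_bases:
        failures.append("total_length")
    if n50 < cut.min_n50:
        failures.append("n50")
    if n_contigs > cut.max_contigs:
        failures.append("contig_count")
    if n_fraction > cut.max_n_fraction:
        failures.append("n_fraction")
    if species_fraction < cut.min_species_fraction:
        failures.append("species_call")
    return ("FAIL" if failures else "PASS", failures)


# Illustrative cutoffs only, not taken from EnteroBase.
cut = GenusCutoffs(4_000_000, 6_000_000, 20_000, 600, 0.03, 0.70)
print(evaluate(4_800_000, 15_000, 700, 0.001, 0.98, cut))
```

The example prints ("FAIL", ['n50', 'contig_count']): a single missed criterion is enough to fail, and there is no intermediate outcome.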
What QualiBact's enterobase-v2.3 scheme publishes
For every QualiBact species in a genus EnteroBase actively curates, we copy the four metric cutoffs that map onto QualiBact's threshold table:
| EnteroBase configuration key | QualiBact metric | Side |
|---|---|---|
| min total bases | Genome_Size | FAIL_lower |
| max total bases | Genome_Size | FAIL_upper |
| min N50 | N50 | FAIL_lower |
| max contig number | no_of_contigs | FAIL_upper |
The values are taken verbatim, with no rounding and no statistical re-derivation. The published source flag is set to "external", which distinguishes these rows from QualiBact's "computed" (engine-derived) and "pinned" (expert-overridden) rows.
QualiBact's WARN columns are deliberately left blank for this scheme — EnteroBase has only one tier (PASS / FAIL), so synthesising a borderline band would misrepresent the source.
Coverage is restricted to species EnteroBase actually QCs — a species in a genus EnteroBase doesn't list (e.g. Achromobacter xylosoxidans) gets no enterobase-v2.3 row, and therefore no species page under this scheme.
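The mapping, the blank WARN tier, the "external" source flag and the genus-coverage rule can be combined into one conversion step. This is a hypothetical sketch, not QualiBact's code: the input dictionary layout, the WARN_lower/WARN_upper column names and the row format are assumptions, and the EnteroBase keys are spelled as in the mapping table in this section rather than as EnteroBase's real configuration keys.

```python
# EnteroBase key -> (QualiBact metric, threshold column), as in the mapping table.
KEY_MAP = {
    "min total bases": ("Genome_Size", "FAIL_lower"),
    "max total bases": ("Genome_Size", "FAIL_upper"),
    "min N50": ("N50", "FAIL_lower"),
    "max contig number": ("no_of_contigs", "FAIL_upper"),
}


def to_threshold_rows(species, genus, genus_cutoffs):
    """Convert per-genus EnteroBase cutoffs into per-species threshold rows.

    genus_cutoffs: hypothetical dict of genus -> {EnteroBase key: value}.
    """
    cutoffs = genus_cutoffs.get(genus)
    if cutoffs is None:
        return []  # genus not curated by EnteroBase: no enterobase-v2.3 rows
    rows = {}
    for key, value in cutoffs.items():
        if key not in KEY_MAP:
            continue  # e.g. N's or Kraken criteria, outside QualiBact's scope
        metric, column = KEY_MAP[key]
        row = rows.setdefault(metric, {
            "species": species,
            "scheme": "enterobase-v2.3",
            "metric": metric,
            "FAIL_lower": None,
            "FAIL_upper": None,
            "WARN_lower": None,  # left blank: EnteroBase has no WARN tier
            "WARN_upper": None,
            "source": "external",
        })
        row[column] = value  # copied verbatim, no rounding
    return list(rows.values())
```

Because the cutoffs are per genus, every species in a covered genus receives identical values, and a species whose genus is absent from the input simply yields no rows.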
Where the data lives
enterobase-v2.3 is not included in the canonical /api/v2/thresholds.csv, which contains only QualiBact-published thresholds. Third-party schemes are published separately under /api/v2/external/:
- /api/v2/external/thresholds.csv: flat CSV containing only the rows EnteroBase actually defines.
- /api/v2/external/thresholds.json: the same data, nested by (species, scheme).
- /api/v2/external/index.json: registry of species covered by each external scheme.
This keeps a clean separation between QualiBact's official thresholds and third-party gates surfaced for comparison.
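A client could pull the external CSV and keep only the EnteroBase rows with the Python standard library. This is a sketch under assumptions: BASE_URL is a placeholder host, and the CSV is assumed to have a header row with at least a scheme column; check the actual file for its real column names.

```python
import csv
import io
import urllib.request

# Hypothetical host; substitute the QualiBact deployment you are querying.
BASE_URL = "https://qualibact.example.org"


def parse_external_csv(text, scheme="enterobase-v2.3"):
    """Keep only rows belonging to one external scheme.

    Assumes a header row containing a 'scheme' column.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if row.get("scheme") == scheme]


def fetch_external_thresholds(scheme="enterobase-v2.3"):
    """Download /api/v2/external/thresholds.csv and filter it by scheme."""
    url = f"{BASE_URL}/api/v2/external/thresholds.csv"
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return parse_external_csv(text, scheme)
```

Empty cells (for example a metric with only a lower bound) come back as empty strings from csv.DictReader, so convert them to None or numbers before comparing against assembly metrics.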
Metrics EnteroBase checks that are out of QualiBact's scope
Two EnteroBase criteria are not part of QualiBact's assembly-level QC remit and therefore don't appear in the threshold table:
- Proportion of N's per assembly. Counting ambiguous bases at the assembly level is on QualiBact's roadmap (see the Requests page), but it requires a re-pass over the ~2M-genome AllTheBacteria reference set and is not in qualibact-v1.x. EnteroBase's cutoff of <3–6 % is recorded in the per-species enterobase-notes.json sidecar for completeness.
- Kraken species-call agreement. This is a species-identification metric, not an assembly metric. We recommend bactscout as a companion tool for read-level / species-call QC; on the assembly side, the engine's own severity flag already calls out species-separation issues per species.
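For reference, the N's proportion that EnteroBase checks can be computed directly from an assembly FASTA. QualiBact does not compute this metric; the sketch only shows what the EnteroBase criterion measures. Counting only N/n characters (and not other IUPAC ambiguity codes) is an assumption, and the minimal FASTA reader is hypothetical helper code rather than a library call.

```python
def read_fasta(lines):
    """Minimal FASTA reader: returns one sequence string per record."""
    seqs, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs


def n_fraction(sequences):
    """Fraction of N/n bases across all contigs of an assembly."""
    total = sum(len(s) for s in sequences)
    n_count = sum(s.upper().count("N") for s in sequences)
    return n_count / total if total else 0.0
```

The result is a fraction, so it would be compared against EnteroBase's cutoff as, for example, n_fraction(seqs) < 0.03 for a 3 % threshold.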
Why this scheme is not preferred
enterobase-v2.3 is deliberately never selected as a species' preferred scheme. The species page defaults to qualibact-v1.1 (where it exists) or qualibact-v1.0. Users have to switch to EnteroBase explicitly via the scheme switcher to see its cutoffs side-by-side. This is to avoid implicitly endorsing a third-party, single-tier, genus-level gate as the canonical QualiBact answer.
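The default-scheme rule amounts to a fixed preference order that never includes external schemes. A minimal sketch, assuming the set of available scheme names per species is known; returning None when only external schemes exist is an assumption about behaviour the text does not specify.

```python
# QualiBact schemes in order of preference; external schemes are never listed.
PREFERENCE_ORDER = ("qualibact-v1.1", "qualibact-v1.0")


def preferred_scheme(available):
    """Pick the default scheme for a species page from the schemes it has."""
    for scheme in PREFERENCE_ORDER:
        if scheme in available:
            return scheme
    return None  # e.g. only enterobase-v2.3 available: no default chosen


print(preferred_scheme({"qualibact-v1.0", "enterobase-v2.3"}))
```

This prints qualibact-v1.0: even when an EnteroBase row exists, it is only reachable through an explicit switch.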
Citation
Zhou Z, Alikhan N-F, Mohamed K, et al. The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Research 30 (2020): 138–152. doi:10.1101/gr.251678.119