Requests

We track requests for QualiBact in two buckets: organisms to add to the catalogue, and metrics or features to add to the QC framework. Each is listed below with a short status note so users can see what's in flight, what's parked, and what we've decided not to do.

Organism requests

We maintain a running list of organisms that are frequently requested or in progress for QualiBact coverage. If you are evaluating the catalogue and notice that a species you rely on is missing, this page should help you understand whether it is already queued or how to get it added.

How to request a new organism

  • Check the list below to confirm whether your species is already planned.
  • If it is missing, send a note to [email protected] with the organism name(s), any relevant strain or dataset details, and your intended use so we can prioritize appropriately.
  • Please review the contribution guidelines for details on data requirements so we can evaluate your submission quickly.

Species under active dataset review

The Nov 2025 expert-feedback survey flagged a small number of species where the reference set itself looks heterogeneous — bimodal distributions, mixed clades, or RefSeq genomes whose ANI suggests they may not actually be the species in question. We are working on better-defined reference datasets so the published thresholds for these species can be re-derived from a cleaner pool:

  • Burkholderia mallei — taxonomic classification needs a curated reference set distinct from B. pseudomallei.
  • Haemophilus influenzae — two RefSeq references (GCF_002985465.2, GCF_002984345.2) push the assembly-size upper bound up, but fastANI against H. influenzae is inconclusive (~94 %); flagged for follow-up.
  • Klebsiella grimontii — species-ID inconsistencies reported against an external K. oxytoca species-complex reference database.
  • Serratia nevei — patchy N50-vs-total-length distribution, possibly reflecting underlying heterogeneity or differences in assembly strategies.
  • Serratia ureilytica — bimodal N50-vs-total-length distribution, possibly reflecting taxonomic issues or significant ICEs / plasmids.
  • Staphylococcus coagulans — bimodal distribution; some assemblies need to be reviewed.

Each affected species page carries a status banner alongside its published engine output. As cleaner reference sets land, we will regenerate the threshold tables from the engine and remove the banners.
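The H. influenzae entry above turns on how an ANI value is read against a species boundary. As a rough sketch only, assuming the commonly used ~95% ANI species cutoff and a hypothetical 1.5-point band on either side that is treated as inconclusive (neither value is stated by QualiBact, and `triage_ani` is an illustrative name, not part of any tool), a reference genome could be triaged like this:

```python
def triage_ani(ani_percent, species_cutoff=95.0, grey_band=1.5):
    """Classify an ANI value relative to an assumed species cutoff.

    species_cutoff and grey_band are illustrative assumptions,
    not values published by QualiBact.
    """
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani_percent >= species_cutoff + grey_band:
        return "same_species"
    if ani_percent <= species_cutoff - grey_band:
        return "different_species"
    # Values close to the cutoff cannot be called either way.
    return "inconclusive"


if __name__ == "__main__":
    # An ANI of about 94% falls inside the assumed grey band.
    print(triage_ani(94.0))
```

Under these assumptions, a value near 94% is neither clearly inside nor clearly outside the species, which is why such references are flagged for follow-up rather than dropped outright.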

Metric and feature requests

Suggestions we've received about the QualiBact framework itself.

| Request | Status | Notes |
| --- | --- | --- |
| Ns per 100 kbp | Future release | Counting ambiguous bases needs a re-pass over the ~1.2 million reference assemblies and a new engine field. Scheduled for the next major engine refresh. |
| BUSCO completeness as an extra metric | Declined | Marginal value over CheckM-based completeness; not worth re-running BUSCO across the reference set. |
| Read-level QC (depth, base-quality, etc.) | Declined | Out of scope. Read information isn't recoverable from a static assembly. We recommend bactscout as a companion tool for read-level checks. |
| Identify the contaminating organism per assembly | Declined | A per-sample, per-lab question. QualiBact publishes the contamination threshold; identification of the contaminant belongs in the downstream QC tool (SpecCheck, CheckM2, sylph). |
| ≥1 kb minimum contig length for no_of_contigs / total_length | Declined | QualiBact counts all contigs by design. This is the output as it comes from the assembly pipeline. |
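To make the "Ns per 100 kbp" and contig-length rows concrete, here is a minimal sketch of how these values could be computed from a FASTA assembly. It assumes the usual definition of the metric (number of `N` characters divided by total assembly length, scaled to 100,000 bases) and counts only `N`, not other IUPAC ambiguity codes. The function name `assembly_stats` and the `min_contig_len` parameter are hypothetical and do not reflect the QualiBact engine.

```python
def assembly_stats(fasta_text, min_contig_len=0):
    """Return contig count, total length, and Ns per 100 kbp for FASTA text.

    Illustrative only: min_contig_len=0 counts every contig.
    """
    contigs = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record, if it had any sequence.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))

    kept = [c for c in contigs if len(c) >= min_contig_len]
    total_length = sum(len(c) for c in kept)
    n_count = sum(c.count("N") for c in kept)
    # Guard against an empty assembly to avoid division by zero.
    ns_per_100kbp = n_count / total_length * 100_000 if total_length else 0.0
    return {
        "no_of_contigs": len(kept),
        "total_length": total_length,
        "ns_per_100kbp": ns_per_100kbp,
    }


if __name__ == "__main__":
    example = ">contig_1\nACGTN\nNN\n>contig_2\nAC\n"
    print(assembly_stats(example))
```

With `min_contig_len` left at 0, every contig is counted, which corresponds to the all-contigs behaviour described in the table. Passing a value such as 1000 shows how a minimum-length filter would shift `no_of_contigs` and `total_length` for the same assembly.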

How to request a metric or feature

Send a note to [email protected] with what you're proposing, which species or use case it would benefit, and any references. Concrete proposals (with example data or threshold suggestions) are easier to action than open-ended asks. Thanks for helping us expand QualiBact!