LIMS Export — Configuration & Usage
This guide explains how to configure and use the LIMS export feature in the Bonsai API, covering both the HTTP API and the CLI.
Overview
The LIMS export produces a tabular file (TSV or CSV) composed of rows with the columns:
sample_id, parameter_name, parameter_value, comment
For a given sample_id, the system looks up an assay-specific configuration and then,
for each configured field, calls a formatter by its data_type to compute the value and comment.
Each field produces exactly one row, so the total number of rows equals the number
of fields in the matched AssayConfig. To export resistance to multiple antibiotics,
or to report motifs expected to confer high-level and low-level resistance separately,
define one field per antibiotic or resistance level.
Configuration Schema (YAML)
The export configuration is a YAML list of AssayConfig entries:
- assay (str): must match sample.pipeline.assay for the sample
- fields (list[FieldDefinition]): one entry per desired output row
FieldDefinition:
- parameter_name (str): the label that appears in the LIMS output
- data_type (str): the registered formatter name (see Built-in Formatters)
- required (bool): whether the analysis must be present on the sample
- options (dict): optional arguments passed to the formatter
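The schema above can be mirrored in code. The following is a minimal sketch using dataclasses, not the project's actual models: the helper parse_export_config, the default values, and the assumption that the YAML has already been parsed into plain Python lists and dicts are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldDefinition:
    parameter_name: str  # label shown in the LIMS output
    data_type: str       # registered formatter name
    required: bool = False  # assumed default; the guide does not state one
    options: dict[str, Any] = field(default_factory=dict)


@dataclass
class AssayConfig:
    assay: str  # must equal sample.pipeline.assay
    fields: list[FieldDefinition]


def parse_export_config(raw: list[dict[str, Any]]) -> list[AssayConfig]:
    """Hypothetical helper: build AssayConfig objects from parsed YAML data."""
    configs = []
    for entry in raw:
        fields = [FieldDefinition(**item) for item in entry.get("fields", [])]
        configs.append(AssayConfig(assay=entry["assay"], fields=fields))
    return configs


# The structure a YAML parser would typically produce for one assay entry.
raw = [
    {
        "assay": "strep",
        "fields": [
            {"parameter_name": "EMM Type", "data_type": "emm",
             "required": False, "options": {}},
        ],
    }
]
configs = parse_export_config(raw)
```

Because FieldDefinition(**item) rejects unknown keys with a TypeError, a misspelled key in the configuration fails at load time rather than during an export.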
Example
# lims_export.yaml
- assay: "saureus"
  fields:
    - parameter_name: "Species (Bracken)"
      data_type: "species"
      required: true
      options:
        software: "bracken" # default "bracken" | e.g. "mykrobe"
        sort_by: "fraction_total_reads" # default depends on software
    - parameter_name: "QC Status"
      data_type: "qc"
      required: true
      options: {}
    - parameter_name: "Lineage (TBProfiler)"
      data_type: "lineage"
      required: false
      options: {}
    - parameter_name: "MLST ST"
      data_type: "mlst"
      required: false
      options: {}
    - parameter_name: "Rifampicin resistance variants"
      data_type: "amr"
      required: false
      options:
        antibiotic_name: "rifampicin"
        software: "tbprofiler" # default
        resistance_level: "all" # include only motifs with the given predicted resistance level; default: "all"
- assay: "strep"
  fields:
    - parameter_name: "EMM Type"
      data_type: "emm"
      required: false
      options: {}
Note
Assay names are case-sensitive and must exactly match the database value in
sample.pipeline.assay. For clarity in downstream systems, keep
parameter_name values unique within an assay.
How the Export Works
Load and select config:
- The YAML is parsed into a list of AssayConfig.
- The system selects the config whose assay matches the sample’s pipeline.assay.
Per-field formatting:
- For each FieldDefinition in config.fields:
  - Resolve the formatter with get_formatter(field.data_type).
  - Invoke it as formatter(sample, options=field.options).
  - The formatter returns a tuple: (value, comment).
Error semantics (per field):
- If the formatter raises:
  - AnalysisNotPresentError → treat as not present:
    - If required=True → abort the export by raising ValueError.
    - If required=False → include a row with parameter_value = "-" (see below), comment = "not_present".
  - AnalysisNoResultError → analysis present but no result:
    - Include a row with parameter_value = "-", comment = "no_result" (even if required).
  - Any other exception → propagate (logged as unexpected error).
- If the formatter returns a value of None or an empty string, the system serializes it as "-".
Row construction:
- Each field yields one LimsRsResult row with:
  - sample_id: from the sample
  - parameter_name: from the field
  - parameter_value: passed through an internal sanitizer:
    - None or "" → "-"
    - otherwise → str(value)
  - comment: formatter comment or one of "not_present" / "no_result"
Serialization:
- serialize_lims_results(results, delimiter) writes a header row followed by data rows.
- delimiter is a token: "csv" (,) or "tsv" (\t).
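The per-field flow and its error semantics can be sketched as follows. This is an illustration under stated assumptions, not the actual lims_rs_formatter: the names Field, sanitize, and build_rows are hypothetical, the exception classes are local stand-ins, and rows are plain tuples rather than LimsRsResult objects.

```python
from dataclasses import dataclass, field
from typing import Any


class AnalysisNotPresentError(Exception):
    """Stand-in: the analysis is not attached to the sample."""


class AnalysisNoResultError(Exception):
    """Stand-in: the analysis is present but produced no result."""


@dataclass
class Field:  # simplified stand-in for FieldDefinition
    parameter_name: str
    data_type: str
    required: bool = False
    options: dict[str, Any] = field(default_factory=dict)


def sanitize(value: Any) -> str:
    """Render None or "" as "-", anything else via str()."""
    return "-" if value is None or value == "" else str(value)


def build_rows(sample_id, sample, fields, get_formatter):
    """Produce one (sample_id, parameter_name, parameter_value, comment) row per field."""
    rows = []
    for fld in fields:
        formatter = get_formatter(fld.data_type)
        try:
            value, comment = formatter(sample, options=fld.options)
        except AnalysisNotPresentError as err:
            if fld.required:
                # A missing required analysis aborts the whole export.
                raise ValueError(f"required analysis missing: {fld.parameter_name}") from err
            value, comment = None, "not_present"
        except AnalysisNoResultError:
            # Present but empty: keep the row, even for required fields.
            value, comment = None, "no_result"
        rows.append((sample_id, fld.parameter_name, sanitize(value), comment))
    return rows


# Toy formatters for demonstration only.
def qc(sample, *, options):
    return sample["qc"], ""


def mlst(sample, *, options):
    raise AnalysisNotPresentError("no MLST analysis")


formatters = {"qc": qc, "mlst": mlst}
rows = build_rows(
    "sample123",
    {"qc": "Pass"},
    [Field("QC Status", "qc", required=True), Field("MLST ST", "mlst")],
    formatters.__getitem__,
)
```

Any exception other than the two analysis errors is deliberately not caught here, so it propagates to the caller as described above.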
Built-in Formatters
Formatters are registered by name using a decorator:
_FORMATTERS: dict[str, Formatter] = {}

def register_formatter(name: str) -> Callable[[Formatter], Formatter]:
    ...

@register_formatter("mlst")
def mlst_typing(sample, *, options) -> tuple[LimsAtomic, LimsComment]: ...
The following formatter names are available by default:
- "species": Species prediction.
  Options:
  - software: "bracken" (default) or another supported tool name
  - sort_by: for bracken default is "fraction_total_reads"; for mykrobe default is "species_coverage"
  Behavior:
  - Selects results for the chosen software.
  - Sorts and returns the top hit’s scientific name.
  - If no predictions present → AnalysisNotPresentError.
  - If predictions list is empty → AnalysisNoResultError.
- "qc": QC status.
  Options: none.
  Behavior:
  - Returns a capitalized QC classification (e.g., "Pass" / "Fail").
- "mlst": MLST sequence type.
  Options: none.
  Behavior:
  - Returns the sequence_type or the literal "novel" when appropriate.
  - If MLST analysis missing → AnalysisNotPresentError.
  - If analysis present with no value → AnalysisNoResultError.
- "emm": EMM type (Streptococcus).
  Options: none.
  Behavior:
  - Returns the EMM type or "novel" when appropriate.
  - Missing analysis → AnalysisNotPresentError.
- "lineage": Lineage (TBProfiler).
  Options: none.
  Behavior:
  - Returns a sublineage string (e.g., "2.2.1").
- "amr": AMR prediction for a given antibiotic.
  Options:
  - antibiotic_name: e.g., "rifampicin" (default)
  - software: e.g., "tbprofiler" (default)
  - resistance_level: "all" (default) or a specific level
  Behavior:
  - Returns a comma-separated list of resistance variants (genes currently TODO).
  - If no variants match → AnalysisNoResultError.
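To make the option handling concrete, the sketch below imitates the behavior described for "species". It is not the built-in formatter: the sample layout (a dict with a "species_prediction" mapping from software name to a list of hits), the key "scientific_name", and the choice of treating the highest value as the top hit are all assumptions made for illustration.

```python
from typing import Any, Optional


class AnalysisNotPresentError(Exception):
    """Stand-in for the real exception."""


class AnalysisNoResultError(Exception):
    """Stand-in for the real exception."""


# Per-software default sort keys, as listed in the options above.
DEFAULT_SORT_KEY = {
    "bracken": "fraction_total_reads",
    "mykrobe": "species_coverage",
}


def top_species(sample: dict[str, Any], *, options: Optional[dict[str, Any]] = None) -> tuple[str, str]:
    opts = options or {}
    software = opts.get("software", "bracken")
    # Fall back to the software-specific default only when sort_by is not given.
    sort_by = opts.get("sort_by") or DEFAULT_SORT_KEY[software]

    predictions = sample.get("species_prediction", {}).get(software)
    if predictions is None:
        raise AnalysisNotPresentError(software)  # analysis never attached
    if not predictions:
        raise AnalysisNoResultError(software)  # attached, but empty

    # Assumed: a larger value means a better hit.
    best = max(predictions, key=lambda hit: hit[sort_by])
    return best["scientific_name"], ""


sample = {
    "species_prediction": {
        "bracken": [
            {"scientific_name": "Staphylococcus epidermidis", "fraction_total_reads": 0.02},
            {"scientific_name": "Staphylococcus aureus", "fraction_total_reads": 0.97},
        ]
    }
}
value, comment = top_species(sample, options={"software": "bracken"})
```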
Note
You can add custom formatters by registering them with @register_formatter("<name>").
In the YAML, set data_type: "<name>" for fields that should use your formatter.
API Usage
Endpoint
GET /export/{sample_id}/lims
Output format: The service typically serializes as TSV by default.
If your deployment supports a query switch (e.g., ?fmt=csv|tsv), use it to select the format.
Responses (typical)
200 OK: Returns text body with header row and data rows.
404 Not Found: No configuration exists for the sample’s assay, or the sample ID is missing (implementation-dependent).
500 Internal Server Error: Configuration parsing/formatting problems (e.g., invalid YAML, or a required field whose analysis is not present).
501 Not Implemented: A field references a data_type with no registered formatter.
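A client can turn the typical responses above into readable errors. The sketch below uses only the Python standard library; the function names, the hint texts, and the bearer-token header are illustrative assumptions, and the status mapping only reflects the typical behavior listed here, which may differ per deployment.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hints derived from the typical responses listed above.
STATUS_HINTS = {
    404: "no configuration for the sample's assay, or unknown sample ID",
    500: "configuration problem, or a required analysis is missing",
    501: "a field references a data_type with no registered formatter",
}


def build_lims_request(base_url: str, sample_id: str, token: str) -> urllib.request.Request:
    """Build a GET request for /export/{sample_id}/lims."""
    quoted_id = urllib.parse.quote(sample_id, safe="")
    url = f"{base_url.rstrip('/')}/export/{quoted_id}/lims"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_lims(base_url: str, sample_id: str, token: str) -> str:
    """Return the export body as text, or raise RuntimeError with a hint."""
    request = build_lims_request(base_url, sample_id, token)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        hint = STATUS_HINTS.get(err.code, "unexpected status")
        raise RuntimeError(f"LIMS export failed ({err.code}): {hint}") from err


request = build_lims_request("https://api.example.com/", "sample123", "<token>")
```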
Example
curl -H "Authorization: Bearer <token>" \
-o sample123_lims.tsv \
"https://api.example.com/export/sample123/lims"
CLI Usage
Command
bonsai export --sample-id <ID> [--export-cnf PATH] [--format {csv,tsv}] [OUTPUT]
Options & arguments
- --sample-id, -i (required): The sample ID to export.
- --export-cnf, -e: Path to the YAML configuration. If provided but missing, the CLI exits with an error.
- --format, -f: csv or tsv. Controls the token passed to the serializer.
- OUTPUT (positional): File to write. Defaults to - (stdout).
Behavior
Loads the configuration (load_export_config).
Matches the sample’s pipeline.assay to an AssayConfig.
Builds rows via lims_rs_formatter(sample, config).
Writes the table with:
    serialize_lims_results(lims_data, delimiter=output_format) # output_format ∈ {"csv", "tsv"}
If no configuration for the assay: prints a red error and aborts.
If parsing/formatting errors occur (e.g., invalid YAML, required analysis missing): prints a yellow message and aborts.
Examples
Write TSV to a file:
bonsai export -i sample123 -f tsv results/sample123_lims.tsv
Write CSV to stdout and pipe:
bonsai export -i sample123 -f csv - | column -s, -t
Use a specific config file:
bonsai export -i sample123 -e /srv/bonsai/lims_export.yaml -f tsv sample123.tsv
Serialization Details
Delimiter selection: pass a token:
- "csv" → comma (",")
- "tsv" → tab ("\t")
Header: Always included with columns: sample_id, parameter_name, parameter_value, comment.
Missing values: Rendered as a single hyphen "-".
Quoting: csv.QUOTE_MINIMAL.
Encoding: The function returns a Python str. When writing to files or over HTTP, ensure UTF-8 is used (typical default in modern deployments).
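The details above map closely onto Python's csv module. The following sketch shows a serializer with the same token-based delimiter selection; it is not the actual serialize_lims_results. The function name serialize_rows, the tuple-shaped rows, and the "\n" line terminator are assumptions.

```python
import csv
import io

HEADER = ("sample_id", "parameter_name", "parameter_value", "comment")
DELIMITERS = {"csv": ",", "tsv": "\t"}


def serialize_rows(rows, delimiter: str = "tsv") -> str:
    """Write a header plus data rows and return the table as a str."""
    if delimiter not in DELIMITERS:
        raise ValueError(f"unsupported delimiter token: {delimiter!r}")
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=DELIMITERS[delimiter],
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",        # assumed; the csv module default is "\r\n"
    )
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ("sample123", "MLST ST", "-", "not_present"),
    ("sample123", "Rifampicin resistance variants", "rpoB_S450L,katG_S315T", ""),
]
as_csv = serialize_rows(rows, "csv")
as_tsv = serialize_rows(rows, "tsv")
```

With QUOTE_MINIMAL, the comma-separated variant list is quoted in the CSV output but left bare in the TSV output, which is one reason TSV tends to be easier to parse for values that contain commas.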
Error Semantics (Summary)
Per field (inside lims_rs_formatter):
- AnalysisNotPresentError:
  - If field required=True → abort whole export.
  - If field required=False → include row with parameter_value="-", comment="not_present".
- AnalysisNoResultError:
  - Include row with parameter_value="-", comment="no_result" (does not abort).
- Any other exception:
  - Logged and re-raised.
At the integration layer:
CLI: shows colored messages and aborts on errors (e.g., config not found, invalid YAML, ValueError).
API: typically maps:
- Missing config for assay → 404
- Unimplemented formatter → 501
- Invalid format / missing required analysis (ValueError) / unreadable config → 500
Extending the System
Register a new formatter
from bonsai_api.lims_export.models import LimsAtomic, LimsComment
from bonsai_api.lims_export.formatters import register_formatter, AnalysisNotPresentError, AnalysisNoResultError


@register_formatter("my_custom_type")
def my_custom_formatter(sample, *, options=None) -> tuple[LimsAtomic, LimsComment]:
    # Inspect sample; raise AnalysisNotPresentError if analysis not attached
    # Raise AnalysisNoResultError if analysis attached but no data
    # Otherwise return (value, optional_comment)
    value = "some-derived-value"
    comment = ""
    return value, comment
Then, reference the formatter in YAML:
- assay: "example-assay"
  fields:
    - parameter_name: "My Custom Field"
      data_type: "my_custom_type"
      required: false
      options:
        threshold: 0.9
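The options mapping from the YAML reaches the formatter as the options keyword argument. The standalone sketch below shows one way a formatter might consume the threshold value; the sample layout (a dict with a "score" key), the 0.5 fallback, and the returned labels are hypothetical, and the registration decorator is omitted so the snippet runs on its own.

```python
from typing import Any, Optional


def my_custom_formatter(
    sample: dict[str, Any], *, options: Optional[dict[str, Any]] = None
) -> tuple[str, str]:
    # options may be None or {} when the YAML field defines no options.
    opts = options or {}
    threshold = float(opts.get("threshold", 0.5))  # 0.5 is an arbitrary fallback

    score = sample["score"]  # hypothetical attribute of the sample
    value = "above" if score >= threshold else "below"
    return value, f"threshold={threshold}"


value, comment = my_custom_formatter({"score": 0.95}, options={"threshold": 0.9})
```

Reading options defensively, with explicit fallbacks, keeps a formatter usable from configurations that leave options empty.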
Best Practices
Keep the YAML under version control; validate changes in CI.
Ensure assay names match DB values exactly.
Prefer unique parameter_name values per assay to avoid confusion downstream.
Use required=True sparingly, only for truly mandatory analyses.
For user-facing consistency, reserve comment values for:
- "not_present" (analysis missing)
- "no_result" (present but empty)
- Additional comments from formatters as needed.
When targeting Excel ingestion, prefer CSV; for robust pipelines, prefer TSV.