LIMS Export — Configuration & Usage

This guide explains how to configure and use the LIMS export feature in the Bonsai API, covering both the HTTP API and the CLI.

Overview 

The LIMS export produces a tabular file (TSV or CSV) composed of rows with the columns:

sample_id
parameter_name
parameter_value
comment

For a given sample_id, the system looks up an assay-specific configuration and then, for each configured field, calls a formatter by its data_type to compute the value and comment.

A single field produces exactly one row. The total number of rows equals the number of fields in the matched AssayConfig. If you want to export resistance to multiple antibiotics or report motif that are expected to confere high or low level of resistance separately you have to have one field per antibiotic or resistance level.

Configuration Schema (YAML)

The export configuration is a YAML list of AssayConfig entries:

assay: str — must match sample.pipeline.assay for the sample
fields: list[FieldDefinition] — one entry per desired output row

FieldDefinition:

parameter_name: str — the label that appears in the LIMS output
data_type: str — the registered formatter name (see Built-in Formatters)
required: bool — whether the analysis must be present on the sample
options: dict — optional arguments passed to the formatter

Example 

# lims_export.yaml

- assay: "saureus"
  fields:
    - parameter_name: "Species (Bracken)"
      data_type: "species"
      required: true
      options:
        software: "bracken"             # default "bracken" | e.g. "mykrobe"
        sort_by: "fraction_total_reads" # default depends on software

    - parameter_name: "QC Status"
      data_type: "qc"
      required: true
      options: {}

    - parameter_name: "Lineage (TBProfiler)"
      data_type: "lineage"
      required: false
      options: {}

    - parameter_name: "MLST ST"
      data_type: "mlst"
      required: false
      options: {}

    - parameter_name: "Rifampicin resistance variants"
      data_type: "amr"
      required: false
      options:
        antibiotic_name: "rifampicin"
        software: "tbprofiler"          # default
        resistance_level: "all"         # include only motif with predicted resistance level, default: "all"

- assay: "strep"
  fields:
    - parameter_name: "EMM Type"
      data_type: "emm"
      required: false
      options: {}

Note

Assay names are case-sensitive and must exactly match the database value in sample.pipeline.assay.
For clarity in downstream systems, keep parameter_name values unique within an assay.

How the Export Works 

Load and select config: - The YAML is parsed into a list of AssayConfig. - The system selects the config whose assay matches the sample’s pipeline.assay.
Per-field formatting: - For each FieldDefinition in config.fields:
- Resolve the formatter with get_formatter(field.data_type).
- Invoke it as formatter(sample, options=field.options).
- The formatter returns a tuple: (value, comment).
Error semantics (per field): - If the formatter raises:
- AnalysisNotPresentError → treat as not present: - If required=True → abort the export by raising ValueError. - If required=False → include a row with parameter_value = "-" (see below),
  
  comment = "not_present".
- AnalysisNoResultError → analysis present but no result: - Include a row with parameter_value = "-", comment = "no_result" (even if required).
- Any other exception → propagate (logged as unexpected error).
- If the formatter returns a value of None or empty string, the system serializes it as "-".
Row construction: - Each field yields one LimsRsResult row with:
- sample_id: from the sample
- parameter_name: from the field
- parameter_value: passed through an internal sanitizer: - None or "" → "-" - otherwise → str(value)
- comment: formatter comment or one of "not_present" / "no_result"
Serialization: - serialize_lims_results(results, delimiter) writes a header row followed by data rows. - delimiter is a token: "csv" (,) or "tsv" (t).

Built-in Formatters 

Formatters are registered by name using a decorator:

_FORMATTERS: dict[str, Formatter] = {}

def register_formatter(name: str) -> Callable[[Formatter], Formatter]:
    ...

@register_formatter("mlst")
def mlst_typing(sample, *, options) -> tuple[LimsAtomic, LimsComment]: ...

The following formatter names are available by default:

"species" — Species prediction.

Options: - software: "bracken" (default) or another supported tool name - sort_by: for bracken default is "fraction_total_reads"; for mykrobe default is "species_coverage"

Behavior: - Selects results for the chosen software. - Sorts and returns the top hit’s scientific name. - If no predictions present → AnalysisNotPresentError. - If predictions list is empty → AnalysisNoResultError.
"qc" — QC status.

Options: none.

Behavior: - Returns a capitalized QC classification (e.g., "Pass" / "Fail").
"mlst" — MLST sequence type.

Options: none.

Behavior: - Returns the sequence_type or the literal "novel" when appropriate. - If MLST analysis missing → AnalysisNotPresentError. - If analysis present with no value → AnalysisNoResultError.
"emm" — EMM type (Streptococcus).

Options: none.

Behavior: - Returns the EMM type or "novel" when appropriate. - Missing analysis → AnalysisNotPresentError.
"lineage" — Lineage (TBProfiler).

Options: none.

Behavior: - Returns a sublineage string (e.g., "2.2.1").
"amr" — AMR prediction for a given antibiotic.

Options: - antibiotic_name: e.g., "rifampicin" (default) - software: e.g., "tbprofiler" (default) - resistance_level: "all" (default) or a specific level

Behavior: - Returns a comma-separated list of resistance variants (genes currently TODO). - If no variants match → AnalysisNoResultError.

Note

You can add custom formatters by registering them with @register_formatter("<name>"). In the YAML, set data_type: "<name>" for fields that should use your formatter.

API Usage 

Endpoint 

GET /export/{sample_id}/lims

Output format: The service typically serializes as TSV by default. If your deployment supports a query switch (e.g., ?fmt=csv|tsv), use it to select the format.

Responses (typical)

200 OK: Returns text body with header row and data rows.
404 Not Found: No configuration exists for the sample’s assay, or the sample ID is missing (implementation-dependent).
500 Internal Server Error: - Configuration parsing/formatting problems (e.g., invalid YAML, required analysis not present in a required field).
501 Not Implemented: A field references a data_type with no registered formatter.

Example 

curl -H "Authorization: Bearer <token>" \
     -o sample123_lims.tsv \
     "https://api.example.com/export/sample123/lims"

CLI Usage 

Command 

bonsai export --sample-id <ID> [--export-cnf PATH] [--format {csv,tsv}] [OUTPUT]

Options & arguments 

--sample-id, -i (required): The sample ID to export.
--export-cnf, -e: Path to the YAML configuration. If provided but missing, the CLI exits with an error.
--format, -f: csv or tsv. Controls the token passed to the serializer.
OUTPUT (positional): File to write. Defaults to - (stdout).

Behavior 

Loads the configuration (load_export_config).
Matches the sample’s pipeline.assay to an AssayConfig.
Builds rows via lims_rs_formatter(sample, config).

Writes the table with:

serialize_lims_results(lims_data, delimiter=output_format)  # output_format ∈ {"csv", "tsv"}

If no configuration for the assay: prints a red error and aborts.
If parsing/formatting errors occur (e.g., invalid YAML, required analysis missing): prints a yellow message and aborts.

Examples 

Write TSV to a file:

bonsai export -i sample123 -f tsv results/sample123_lims.tsv

Write CSV to stdout and pipe:

bonsai export -i sample123 -f csv - | column -s, -t

Use a specific config file:

bonsai export -i sample123 -e /srv/bonsai/lims_export.yaml -f tsv sample123.tsv

Serialization Details 

Delimiter selection: pass a token: - "csv" → comma (",") - "tsv" → tab ("\t")
Header: Always included with columns: sample_id, parameter_name, parameter_value, comment.
Missing values: Rendered as a single hyphen "-".
Quoting: csv.QUOTE_MINIMAL.
Encoding: The function returns a Python str. When writing to files or over HTTP, ensure UTF-8 is used (typical default in modern deployments).

Error Semantics (Summary)

Per field (inside `lims_rs_formatter`):

AnalysisNotPresentError: - If field required=True → abort whole export. - If field required=False → include row with parameter_value="-", comment="not_present".
AnalysisNoResultError: - Include row with parameter_value="-", comment="no_result" (does not abort).
Any other exception: - Logged and re-raised.

At the integration layer:

CLI: shows colored messages and aborts on errors (e.g., config not found, invalid YAML, ValueError).
API: typically maps: - Missing config for assay → 404 - Unimplemented formatter → 501 - Invalid format / missing required analysis (ValueError) / unreadable config → 500

Extending the System 

Register a new formatter 

from bonsai_api.lims_export.models import LimsAtomic, LimsComment
from bonsai_api.lims_export.formatters import register_formatter, AnalysisNotPresentError, AnalysisNoResultError

@register_formatter("my_custom_type")
def my_custom_formatter(sample, *, options=None) -> tuple[LimsAtomic, LimsComment]:
    # Inspect sample; raise AnalysisNotPresentError if analysis not attached
    # Raise AnalysisNoResultError if analysis attached but no data
    # Otherwise return (value, optional_comment)
    value = "some-derived-value"
    comment = ""
    return value, comment

Then, reference the formatter in YAML:

- assay: "example-assay"
  fields:
    - parameter_name: "My Custom Field"
      data_type: "my_custom_type"
      required: false
      options:
        threshold: 0.9

Best Practices 

Keep the YAML under version control; validate changes in CI.
Ensure assay names match DB values exactly.
Prefer unique parameter_name values per assay to avoid confusion downstream.
Use required=True sparingly—only for truly mandatory analyses.
For user-facing consistency, reserve comment values for: - "not_present" (analysis missing) - "no_result" (present but empty) - Additional comments from formatters as needed.
When targeting Excel ingestion, prefer CSV; for robust pipelines, prefer TSV.