MinHash clustering

The MinHash clustering service subscribes to the Redis database and listens for jobs on the minhash queue.

Tasks

minhash_service.tasks.add_signature(sample_id: str, signature) → str

Add a signature to the database and index.

Parameters:
  • sample_id (str) – the sample ID

  • signature (Dict[str, str]) – sourmash signature file in JSON format

Returns:

path to the signature

Return type:

str
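This page does not document how the task stores signatures, so the following is only a minimal sketch of the storage step, assuming one JSON file per sample. The function name `add_signature_sketch`, the `signature_dir` parameter and the `<sample_id>.sig` file naming are hypothetical, and adding the signature to the search index is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def add_signature_sketch(sample_id: str, signature: Dict[str, str], signature_dir: Path) -> str:
    """Hypothetical stand-in for add_signature: persist a signature and return its path.

    Assumptions: one JSON file per sample, named <sample_id>.sig, inside signature_dir.
    Adding the signature to the search index is not shown.
    """
    path = signature_dir / f"{sample_id}.sig"
    # Write the sourmash JSON signature unchanged.
    path.write_text(json.dumps(signature))
    # Return the location as a string, matching the documented return type.
    return str(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(add_signature_sketch("sample-1", {"name": "sample-1"}, Path(tmp)))
```

Returning the path as a plain string mirrors the documented `str` return type rather than a `Path` object.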

minhash_service.tasks.remove_signature(sample_id: str) → Dict[str, str | bool]

Remove a signature from the database and index.

Parameters:

sample_id (str) – the sample ID of the signature to remove

Returns:

The status of the removed job

Return type:

Dict[str, str | bool]
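A minimal sketch of the removal step, assuming an in-memory dict stands in for the database and index. `remove_signature_sketch` and the status keys `sample_id` and `removed` are hypothetical; the real task may report different fields in its `Dict[str, str | bool]` result.

```python
from typing import Dict, Union


def remove_signature_sketch(sample_id: str, index: Dict[str, dict]) -> Dict[str, Union[str, bool]]:
    """Hypothetical stand-in for remove_signature backed by an in-memory dict.

    Assumption: the status holds the sample ID and a boolean "removed" flag.
    """
    removed = sample_id in index
    if removed:
        del index[sample_id]
    return {"sample_id": sample_id, "removed": removed}


if __name__ == "__main__":
    index = {"sample-1": {"name": "sample-1"}}
    print(remove_signature_sketch("sample-1", index))  # {'sample_id': 'sample-1', 'removed': True}
    print(remove_signature_sketch("sample-1", index))  # {'sample_id': 'sample-1', 'removed': False}
```

Reporting a flag instead of raising on a missing sample keeps the call idempotent in this sketch; whether the real task behaves that way is not stated here.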

minhash_service.tasks.similar(sample_id: str, min_similarity: float = 0.5, limit: int | None = None) → List[SimilarSignature]

Find signatures similar to the reference signature.

Parameters:
  • sample_id (str) – The ID of the reference sample

  • min_similarity (float) – Minimum similarity score

  • limit (int | None) – Limit the result to x samples, defaults to None

Returns:

list of the similar signatures

Return type:

List[SimilarSignature]
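One way to read `min_similarity` and `limit` is sketched below: score every stored sketch against the reference with a Jaccard index over hash sets, drop scores below the threshold, sort best first and truncate. The hash-set representation, `SimilarSignatureSketch` and `similar_sketch` are hypothetical, as are excluding the reference from its own results and treating the threshold as inclusive.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class SimilarSignatureSketch:
    # Hypothetical result record; the real SimilarSignature fields may differ.
    sample_id: str
    similarity: float


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index of two hash sets, used here as the similarity score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_sketch(
    sample_id: str,
    sketches: Dict[str, Set[int]],
    min_similarity: float = 0.5,
    limit: Optional[int] = None,
) -> List[SimilarSignatureSketch]:
    reference = sketches[sample_id]
    hits = [
        SimilarSignatureSketch(other_id, jaccard(reference, hashes))
        for other_id, hashes in sketches.items()
        if other_id != sample_id  # assumption: the reference is not reported
    ]
    # Assumption: the threshold is inclusive.
    hits = [h for h in hits if h.similarity >= min_similarity]
    # Best match first, then apply the optional limit.
    hits.sort(key=lambda h: h.similarity, reverse=True)
    return hits if limit is None else hits[:limit]


if __name__ == "__main__":
    sketches = {"ref": {1, 2, 3, 4}, "a": {1, 2, 3, 5}, "b": {1, 2, 6, 7}, "c": {8, 9}}
    print(similar_sketch("ref", sketches))  # only "a" (3/5 = 0.6) passes the 0.5 default
    print(similar_sketch("ref", sketches, min_similarity=0.3, limit=1))
```

With the default threshold only `a` qualifies; lowering it to 0.3 also admits `b` (2/6), and `limit=1` then keeps just the best hit.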

minhash_service.tasks.cluster(sample_ids: List[str], cluster_method: str = 'single') → str

Cluster multiple samples on their sourmash signatures.

Parameters:
  • sample_ids (List[str]) – The sample IDs to cluster

  • cluster_method (str) – The linkage or clustering method to use, defaults to single

Raises:

ValueError – if cluster_method is not a valid MSTree clustering method.

Returns:

clustering result in Newick format

Return type:

str
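The clustering step can be illustrated with a small agglomerative clustering over Jaccard distances (1 minus the Jaccard index) that emits a Newick string. This is a sketch, not the service's implementation: `cluster_sketch`, the accepted method names (`single` and `complete` only), the empty-input check and the omission of branch lengths are all assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 0.0)


# Assumption: only these linkage criteria are accepted in this sketch.
LINKAGES = {"single": min, "complete": max}


def cluster_sketch(
    sample_ids: List[str], sketches: Dict[str, Set[int]], cluster_method: str = "single"
) -> str:
    if cluster_method not in LINKAGES:
        raise ValueError(f"unsupported clustering method: {cluster_method}")
    if not sample_ids:
        raise ValueError("no samples to cluster")
    link = LINKAGES[cluster_method]
    # Each cluster is (Newick subtree, member sample IDs).
    clusters = [(sid, [sid]) for sid in sample_ids]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = link(
                jaccard_distance(sketches[x], sketches[y])
                for x in clusters[i][1]
                for y in clusters[j][1]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        (left, left_ids), (right, right_ids) = clusters[i], clusters[j]
        # Delete j first (j > i) so index i still points at the left cluster.
        del clusters[j]
        clusters[i] = (f"({left},{right})", left_ids + right_ids)
    # Topology only: branch lengths are omitted in this sketch.
    return clusters[0][0] + ";"


if __name__ == "__main__":
    sketches = {"ref": {1, 2, 3, 4}, "a": {1, 2, 3, 5}, "b": {1, 2, 6, 7}, "c": {8, 9}}
    print(cluster_sketch(["ref", "a", "b", "c"], sketches))  # (((ref,a),b),c);
```

Single linkage merges two clusters using their closest pair of members, which is why `b` joins `(ref,a)` before the unrelated `c`.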

minhash_service.tasks.find_similar_and_cluster(sample_id: str, min_similarity: float = 0.5, limit: int | None = None, cluster_method: str = 'single') → str

Find similar samples and cluster them on their MinHash profiles.

Parameters:
  • sample_id (str) – The ID of the reference sample

  • min_similarity (float) – Minimum similarity score

  • limit (int | None) – Limit the result to x samples, defaults to None

  • cluster_method (str) – The linkage or clustering method to use, defaults to single

Raises:

ValueError – if cluster_method is not a valid MSTree clustering method.

Returns:

clustering result in Newick format

Return type:

str
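`find_similar_and_cluster` reads as a composition of the `similar` and `cluster` tasks. The sketch below shows only that wiring, assuming the reference sample is clustered together with its hits; the search and clustering functions are passed in as callables, and `find_similar_and_cluster_sketch` is a hypothetical name.

```python
from typing import Callable, List, Optional


def find_similar_and_cluster_sketch(
    sample_id: str,
    find_similar: Callable[[str, float, Optional[int]], List[str]],
    cluster: Callable[[List[str], str], str],
    min_similarity: float = 0.5,
    limit: Optional[int] = None,
    cluster_method: str = "single",
) -> str:
    # Search first, forwarding the threshold and limit unchanged.
    hit_ids = find_similar(sample_id, min_similarity, limit)
    # Assumption: the reference sample is clustered together with its hits.
    # Any ValueError for an unknown cluster_method propagates from cluster().
    return cluster([sample_id, *hit_ids], cluster_method)


if __name__ == "__main__":
    newick = find_similar_and_cluster_sketch(
        "ref",
        find_similar=lambda sid, min_sim, limit: ["a", "b"],  # stand-in search
        cluster=lambda ids, method: "(" + ",".join(ids) + ");",  # stand-in clustering
    )
    print(newick)  # (ref,a,b);
```

Injecting the two steps keeps the example self-contained; in the service they would presumably be the search and clustering logic behind `similar` and `cluster`.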