MinHash clustering
The MinHash clustering service subscribes to the Redis database and listens for jobs on the minhash queue.
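As a rough illustration of how a client might hand work to the service, the sketch below enqueues one of the tasks listed under Tasks by its dotted path. This page does not name the queueing library, so the queue object is an assumption: `RecordingQueue` is a hypothetical stand-in that only records jobs, and `submit_similar` is a helper invented for this example. If the deployment uses RQ (Redis Queue), an `rq.Queue` named `minhash` could be passed in its place, since its `enqueue` method also accepts a dotted function path.

```python
from __future__ import annotations

from typing import Any


class RecordingQueue:
    """Hypothetical stand-in for a job queue; it only records what was enqueued."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.jobs: list[tuple[str, dict[str, Any]]] = []

    def enqueue(self, func_path: str, **kwargs: Any) -> int:
        self.jobs.append((func_path, kwargs))
        # The position in the list doubles as a fake job id.
        return len(self.jobs) - 1


def submit_similar(queue: Any, sample_id: str, min_similarity: float = 0.5,
                   limit: int | None = None) -> Any:
    """Enqueue the documented `similar` task by its dotted path (hypothetical helper)."""
    return queue.enqueue(
        "minhash_service.tasks.similar",
        sample_id=sample_id,
        min_similarity=min_similarity,
        limit=limit,
    )


# Assumption: the queue is called "minhash", as stated in the text above.
queue = RecordingQueue("minhash")
job_id = submit_similar(queue, "sample-1", min_similarity=0.8, limit=10)
```

Passing the task as a string keeps the client free of any import of `minhash_service`; only the worker needs the package installed.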
Tasks
- minhash_service.tasks.add_signature(sample_id: str, signature: list[dict[str, str | list[dict[str, int | list[int]]]]]) -> str
Add the signature of a sample to the database.
- minhash_service.tasks.remove_signature(sample_id: str) -> dict[str, str | bool]
Remove a signature from the database and index.
- minhash_service.tasks.add_to_index(sample_ids: list[str]) -> str
Add signatures to the sourmash SBT index.
- Parameters:
sample_ids (list[str]) – Sample ids of the signatures to add
- Returns:
result message
- Return type:
str
- minhash_service.tasks.remove_from_index(sample_ids: list[str]) -> str
Remove signatures from the sourmash SBT index.
- Parameters:
sample_ids (list[str]) – Sample ids of the signatures to remove
- Returns:
result message
- Return type:
str
- minhash_service.tasks.similar(sample_id: str, min_similarity: float = 0.5, limit: int | None = None) -> list[SimilarSignature]
Find signatures similar to a reference signature.
- Parameters:
sample_id (str) – The id of the reference sample
min_similarity (float) – Minimum similarity score
limit (int | None) – Limit the result to x samples; defaults to None
- Returns:
list of the similar signatures
- Return type:
SimilarSignatures
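How `min_similarity` and `limit` interact can be illustrated with a small, self-contained sketch. It is not the service's implementation, which searches a sourmash index. Here signatures are reduced to plain sets of hash values, similarity is the Jaccard index, and both the `find_similar` helper and the fields of the `SimilarSignature` dataclass are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimilarSignature:
    # Hypothetical fields; the real type's layout is not shown on this page.
    sample_id: str
    similarity: float


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard index: shared hashes divided by all distinct hashes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar(reference_id: str, signatures: dict[str, set[int]],
                 min_similarity: float = 0.5,
                 limit: int | None = None) -> list[SimilarSignature]:
    reference = signatures[reference_id]
    # Score every signature; in this sketch the reference matches itself with 1.0.
    hits = [
        SimilarSignature(sample_id, jaccard(reference, hashes))
        for sample_id, hashes in signatures.items()
    ]
    # Apply the threshold first, then rank, then truncate.
    hits = [hit for hit in hits if hit.similarity >= min_similarity]
    hits.sort(key=lambda hit: hit.similarity, reverse=True)
    return hits if limit is None else hits[:limit]


signatures = {
    "ref": {1, 2, 3, 4},
    "a": {1, 2, 3, 5},  # shares 3 of 5 distinct hashes with "ref"
    "b": {1, 2, 7, 8},  # shares 2 of 6 distinct hashes with "ref"
    "c": {1, 2, 3, 4},  # identical to "ref"
}
default_hits = find_similar("ref", signatures)
top_two = find_similar("ref", signatures, limit=2)
```

With the default threshold of 0.5, sample `b` is dropped before `limit` is applied, so `limit` only ever shortens a list that already satisfies `min_similarity`.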
- minhash_service.tasks.cluster(sample_ids: list[str], cluster_method: str = 'single') -> str
Cluster multiple samples based on their sourmash signatures.
- Parameters:
sample_ids (list[str]) – The sample ids to cluster
cluster_method (str) – The linkage or clustering method to use; defaults to single
- Raises:
ValueError – if the method is not a valid MSTree clustering method.
- Returns:
clustering result in Newick format
- Return type:
str
- minhash_service.tasks.find_similar_and_cluster(sample_id: str, min_similarity: float = 0.5, limit: int | None = None, cluster_method: str = 'single') -> str
Find similar samples and cluster them based on their MinHash profiles.
- Parameters:
sample_id (str) – The id of the reference sample
min_similarity (float) – Minimum similarity score
limit (int | None) – Limit the result to x samples; defaults to None
cluster_method (str) – The linkage or clustering method to use; defaults to single
- Raises:
ValueError – if the method is not a valid MSTree clustering method.
- Returns:
clustering result in Newick format
- Return type:
str
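
The two clustering tasks above return a tree as a Newick string and reject unknown methods with a `ValueError`. The following pure-Python sketch shows what that contract can look like; it is not the service's code. Signatures are simplified to sets of hash values, distance is one minus the Jaccard index, and the `LINKAGE` table and the `cluster_to_newick` helper are hypothetical. Only `single` is confirmed by this page; `complete` is included purely to show how a second method would plug in.

```python
from __future__ import annotations

from itertools import combinations

# Assumed set of methods; only "single" appears in this document.
LINKAGE = {"single": min, "complete": max}


def jaccard_distance(a: set[int], b: set[int]) -> float:
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 0.0)


def cluster_to_newick(signatures: dict[str, set[int]],
                      cluster_method: str = "single") -> str:
    """Agglomerative clustering that returns the tree topology in Newick format."""
    if cluster_method not in LINKAGE:
        raise ValueError(f"Unknown cluster method: {cluster_method!r}")
    if not signatures:
        raise ValueError("At least one sample is required")
    link = LINKAGE[cluster_method]

    # Each cluster maps its Newick label to the ids of the samples it contains.
    clusters: dict[str, list[str]] = {sid: [sid] for sid in signatures}

    def linkage_distance(pair: tuple[str, str]) -> float:
        left, right = pair
        return link(
            jaccard_distance(signatures[x], signatures[y])
            for x in clusters[left]
            for y in clusters[right]
        )

    while len(clusters) > 1:
        # Merge the two closest clusters under the chosen linkage.
        left, right = min(combinations(clusters, 2), key=linkage_distance)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[f"({left},{right})"] = members

    (tree,) = clusters
    return tree + ";"


signatures = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},  # close to A
    "C": {7, 8, 9},     # shares nothing with A or B
}
newick = cluster_to_newick(signatures)
```

The sketch emits topology only, without branch lengths, and assumes sample ids contain no commas, parentheses, or semicolons, which would need quoting in Newick.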