Quickstart

See the next tutorial, Toy Example, for a full walkthrough.

See Run Clumppling for a guide to running Clumppling with ACE-OF-Clust.

See Real Example for a demonstration on real data (PBMC3k scRNA-seq hard-clustering model comparison).

Setup

import ace_of_clust as aoc

# Confirm the package imports; print its version if it exposes one.
print(getattr(aoc, "__version__", "import ok"))

Run Clumppling

Assume your clustering results are saved as .Q matrix files under cls_dir, one file per clustering. Each row holds the K membership values of one data point (cell/spot), where K is the number of clusters, and every row sums to 1; for hard clustering, each row is one-hot encoded. All files must have the same number of rows (one per data point).
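
If your clusterings are stored as label vectors rather than .Q files, a minimal NumPy sketch for producing one-hot .Q files might look like the following. The helpers labels_to_onehot_Q and write_Q, the file names, and the directory are hypothetical, and the sketch assumes that Clumppling's generalQ format accepts plain whitespace-delimited numeric text (one row per data point); check the Clumppling documentation for the exact layout it expects.

```python
import os
import tempfile

import numpy as np


def labels_to_onehot_Q(labels):
    """Turn a 1-D sequence of hard cluster labels into an n x K one-hot Q matrix.

    Distinct labels (ints or strings) are mapped to columns 0..K-1 in sorted order.
    """
    labels = np.asarray(labels)
    # codes[i] is the column index of point i's label among the sorted unique labels.
    _, codes = np.unique(labels, return_inverse=True)
    codes = codes.reshape(-1)
    n, K = labels.shape[0], int(codes.max()) + 1
    Q = np.zeros((n, K))
    Q[np.arange(n), codes] = 1.0
    return Q


def write_Q(Q, path):
    # Assumed layout: one whitespace-delimited row per data point, one column per cluster.
    np.savetxt(path, Q, fmt="%.6f")


# Hypothetical example: two hard clusterings of the same 6 cells.
demo_cls_dir = tempfile.mkdtemp()
clusterings = {
    "leiden_res0.5": [0, 0, 1, 1, 2, 2],
    "kmeans_k2": ["a", "a", "a", "b", "b", "b"],
}
for name, labels in clusterings.items():
    Q = labels_to_onehot_Q(labels)
    assert np.allclose(Q.sum(axis=1), 1.0)  # every row sums to 1
    write_Q(Q, os.path.join(demo_cls_dir, f"{name}.Q"))
```

Because each file is written from the same ordered list of cells, every file has the same number of rows, as required above.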

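align_dir = "path/to/clumppling/alignment_results"  # output directory for Clumppling
cls_dir = "path/to/clumppling/clustering_results"   # directory containing the input .Q files
aoc.run_clumppling_via_main(
    input_dir=cls_dir,
    output_dir=align_dir,
    fmt="generalQ",   # plain Q-matrix input format
    vis=False,        # skip Clumppling's built-in visualizations
    extension=".Q",   # extension of the input files
)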
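
If the run fails on malformed input, it can help to check the requirements stated above (rows summing to 1, equal row counts across files) directly. check_Q_dir is a hypothetical helper, not part of ACE-OF-Clust or Clumppling, and it assumes the .Q files are whitespace-delimited text readable by numpy.loadtxt.

```python
import glob
import os

import numpy as np


def check_Q_dir(cls_dir, extension=".Q", atol=1e-6):
    """Check every *extension file in cls_dir against the Q-matrix requirements.

    Returns {file name: (n_rows, K)}. Raises ValueError if no files are found,
    if a row does not sum to 1 (within atol), or if the files disagree on the
    number of rows.
    """
    paths = sorted(glob.glob(os.path.join(cls_dir, f"*{extension}")))
    if not paths:
        raise ValueError(f"no *{extension} files in {cls_dir}")
    shapes = {}
    for path in paths:
        name = os.path.basename(path)
        Q = np.loadtxt(path, ndmin=2)  # ndmin=2 keeps single-row/column files 2-D
        row_sums = Q.sum(axis=1)
        if not np.allclose(row_sums, 1.0, atol=atol):
            bad = int(np.argmax(np.abs(row_sums - 1.0)))
            raise ValueError(f"{name}: row {bad} sums to {row_sums[bad]:.4g}")
        shapes[name] = Q.shape
    if len({n_rows for n_rows, _ in shapes.values()}) > 1:
        raise ValueError(f"files disagree on the number of rows: {shapes}")
    return shapes


# Usage against the directory passed to run_clumppling_via_main:
# print(check_Q_dir(cls_dir, extension=".Q"))
```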

Load Clumppling results

results = aoc.load_clumppling_results(
    align_dir=align_dir,
    suffix="rep",
    cls_dir=cls_dir,
    load_P=True,
    strict_P=True,   # will raise FileNotFoundError if any P file is missing; set to False to skip loading missing P files
)

Compute pairwise mappings of clusters between modes

pair_mappings = aoc.extract_all_mode_pair_mappings(
    mode_names=results.modes,
    all_modes_alignment=results.all_modes_alignment,
    alignment_acrossK=results.alignment_acrossK,
)
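
For intuition about what a cluster mapping between two modes expresses, the sketch below uses a deliberately simple stand-in: each cluster in one mode is matched to the cluster in another mode with which it shares the most membership. This is not Clumppling's alignment procedure and does not reproduce the structure returned by extract_all_mode_pair_mappings; map_clusters and the example matrices are hypothetical.

```python
import numpy as np


def map_clusters(Q_from, Q_to):
    """Match each cluster of Q_from to the cluster of Q_to it overlaps with most.

    Both matrices are n x K membership matrices over the same n data points
    (K may differ). Entry j of the result is the Q_to cluster matched to
    cluster j of Q_from; several clusters may map to the same target.
    """
    # overlap[i, j] = shared membership of Q_to cluster i and Q_from cluster j;
    # for one-hot matrices this counts the points assigned to both clusters.
    overlap = Q_to.T @ Q_from
    return overlap.argmax(axis=0).tolist()


# Hypothetical hard clusterings of 6 points: a K=3 mode and a K=2 mode.
Q_k3 = np.eye(3)[[0, 0, 1, 1, 2, 2]]
Q_k2 = np.eye(2)[[1, 1, 1, 1, 0, 0]]
print(map_clusters(Q_k3, Q_k2))  # [1, 1, 0]: K=3 clusters 0 and 1 merge into K=2 cluster 1
```

argmax breaks ties toward the lowest index. If merges should be disallowed between modes with the same K, a one-to-one assignment such as scipy.optimize.linear_sum_assignment would be the more appropriate stand-in.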

Compute feature metrics for all modes

features = [...]  # list of feature (gene) names used in Clumppling
df_by_mode = aoc.compute_feature_metrics_all_modes(results, feature_names=features)

Select top features (genes) by weighted_Psum quantile across modes

selected_by_mode, df_selected_all, overlap = aoc.select_top_features_by_weighted_Psum(
    df_by_mode,
    top_quantile=0.1,
)
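
How weighted_Psum is defined, and how select_top_features_by_weighted_Psum handles ties and interpolation, is not covered in this quickstart. Assuming that top_quantile=0.1 means "keep the top 10% of features within each mode" and that the overlap is the set of features selected in every mode, the selection step can be sketched with pandas as follows. select_top_by_quantile, the mode names, and the scores are hypothetical.

```python
import pandas as pd


def select_top_by_quantile(scores_by_mode, top_quantile=0.1):
    """Per mode, keep features whose score is at or above the (1 - top_quantile) quantile.

    scores_by_mode maps a mode name to a pandas Series of per-feature scores
    indexed by feature name. Returns (selected_by_mode, overlap), where overlap
    is the set of features selected in every mode.
    """
    selected = {}
    for mode, scores in scores_by_mode.items():
        cutoff = scores.quantile(1.0 - top_quantile)  # pandas default: linear interpolation
        selected[mode] = set(scores[scores >= cutoff].index)
    overlap = set.intersection(*selected.values()) if selected else set()
    return selected, overlap


# Hypothetical scores for 20 genes in two modes.
genes = [f"g{i}" for i in range(20)]
mode_a = pd.Series(range(1, 21), index=genes, dtype=float)
mode_b = mode_a.copy()
mode_b["g17"], mode_b["g18"] = 19.0, 18.0  # g17 outranks g18 in this mode
selected, overlap = select_top_by_quantile({"K3": mode_a, "K4": mode_b}, top_quantile=0.1)
print(sorted(selected["K3"]), sorted(selected["K4"]), sorted(overlap))
# ['g18', 'g19'] ['g17', 'g19'] ['g19']
```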

Visualize cluster memberships

Assume you have loaded coordinates for your data points (cells/spots) into X_coords, in the same order as the rows of the .Q files, and defined a list of plotting colors, colors, with at least results.K_max entries.

fig, axes = aoc.overlay_scatter_for_mode(
    results,
    coords=X_coords,
    cluster_colors=colors[:results.K_max],
    val_threshold=0.5,
    s=5,
    alpha=0.9,
    suptitle="Cluster Memberships",
    suptitle_kwargs={"y": 0.95, "fontsize": 10},
)
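
To see roughly what a thresholded membership overlay involves, the following matplotlib sketch colors each point by every cluster whose membership exceeds val_threshold, drawn over a gray layer of all points. overlay_scatter_sketch and the demo data are hypothetical; the sketch assumes 2-D coordinates, is not how aoc.overlay_scatter_for_mode is implemented, and its use of a strict ">" at the threshold is an assumption.

```python
import numpy as np
from matplotlib.figure import Figure


def overlay_scatter_sketch(Q, coords, colors, val_threshold=0.5, **scatter_kwargs):
    """Overlay one scatter layer per cluster, keeping points whose membership exceeds val_threshold."""
    fig = Figure(figsize=(4, 4))  # created without pyplot, so no GUI backend is needed
    ax = fig.subplots()
    # Gray base layer: every point, so below-threshold points remain visible.
    ax.scatter(coords[:, 0], coords[:, 1], color="lightgray", **scatter_kwargs)
    for k in range(Q.shape[1]):
        mask = Q[:, k] > val_threshold
        if mask.any():
            ax.scatter(coords[mask, 0], coords[mask, 1], color=colors[k],
                       label=f"cluster {k}", **scatter_kwargs)
    ax.legend(fontsize=8, markerscale=2)
    ax.set_title("Cluster Memberships", fontsize=10)
    return fig, ax


# Hypothetical data: 30 points in 2-D, hard-assigned to 3 clusters.
rng = np.random.default_rng(0)
demo_coords = rng.normal(size=(30, 2))
demo_Q = np.eye(3)[np.arange(30) % 3]
fig, ax = overlay_scatter_sketch(demo_Q, demo_coords, ["tab:blue", "tab:orange", "tab:green"],
                                 s=5, alpha=0.9)
```

With one-hot (hard-clustering) Q matrices and val_threshold=0.5, every point lands in exactly one cluster layer; with soft memberships, points below the threshold in every cluster stay gray. Call fig.savefig(...) to write the figure to disk.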