Quickstart¶
See the next tutorial, Toy Example, for a full walkthrough.
See Run Clumppling for a guide to running Clumppling with ACE-OF-Clust.
See Real Example for a demonstration on real data (PBMC3k scRNA-seq hard-clustering model comparison).
Setup¶
import ace_of_clust as aoc
print(getattr(aoc, "__version__", "import ok"))
Run Clumppling¶
Assume your clustering results are saved as .Q matrix files (one-hot encoded for hard clustering) under cls_dir. Each row of a file contains K membership values that sum to 1, where K is the number of clusters, and all files must have the same number of rows.
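If your hard-clustering output is a vector of integer labels rather than a .Q file, a conversion along the following lines may help. This is a minimal sketch, not part of ACE-OF-Clust: the helper names `labels_to_one_hot` and `write_q_file`, the run names, the whitespace-delimited layout, and the temporary output directory are all assumptions made for illustration.

```python
from pathlib import Path
import tempfile

import numpy as np


def labels_to_one_hot(labels, n_clusters):
    """Return an (n_samples, n_clusters) one-hot membership matrix."""
    labels = np.asarray(labels, dtype=int)
    Q = np.zeros((labels.size, n_clusters), dtype=float)
    Q[np.arange(labels.size), labels] = 1.0
    return Q


def write_q_file(Q, path):
    """Write a membership matrix as whitespace-delimited text, one row per data point."""
    if not np.allclose(Q.sum(axis=1), 1.0):
        raise ValueError("each row of Q must sum to 1")
    np.savetxt(path, Q, fmt="%.6f", delimiter=" ")


# Hypothetical hard-clustering runs on the same 6 data points (K=2 and K=3).
runs = {
    "run1_K2": ([0, 0, 1, 1, 0, 1], 2),
    "run2_K3": ([0, 1, 2, 2, 1, 0], 3),
}

out_dir = Path(tempfile.mkdtemp())  # stand-in for cls_dir (assumption)
for name, (labels, k) in runs.items():
    write_q_file(labels_to_one_hot(labels, k), out_dir / f"{name}.Q")

# Sanity check: every file should have the same number of rows.
row_counts = {
    p.name: np.loadtxt(p, ndmin=2).shape[0] for p in sorted(out_dir.glob("*.Q"))
}
print(row_counts)
```

Checking the row counts before running the alignment catches runs that were computed on different subsets of the data.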
align_dir = "path/to/clumppling/alignment_results"
cls_dir = "path/to/clumppling/clustering_results"
aoc.run_clumppling_via_main(
    input_dir=cls_dir,
    output_dir=align_dir,
    fmt="generalQ",
    vis=False,
    extension=".Q",
)
Load Clumppling results¶
results = aoc.load_clumppling_results(
    align_dir=align_dir,
    suffix="rep",
    cls_dir=cls_dir,
    load_P=True,
    # strict_P=True raises FileNotFoundError if any P file is missing;
    # set it to False to skip loading missing P files.
    strict_P=True,
)
Compute pairwise mappings of clusters between modes¶
pair_mappings = aoc.extract_all_mode_pair_mappings(
    mode_names=results.modes,
    all_modes_alignment=results.all_modes_alignment,
    alignment_acrossK=results.alignment_acrossK,
)
Compute feature metrics for all modes¶
features = [...]  # list of feature (gene) names used in Clumppling
df_by_mode = aoc.compute_feature_metrics_all_modes(results, feature_names=features)
Select top features (genes) by weighted_Psum quantile across modes¶
selected_by_mode, df_selected_all, overlap = aoc.select_top_features_by_weighted_Psum(
    df_by_mode,
    top_quantile=0.1,
)
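To build intuition for the `top_quantile` argument, the sketch below shows one common reading of a top-quantile selection: keep the features whose score is at or above the (1 - top_quantile) quantile. It is an illustration only, assuming NumPy's default linear-interpolation quantile; the function `top_quantile_mask`, the gene names, and the scores are hypothetical, and the library's actual selection rule may differ.

```python
import numpy as np


def top_quantile_mask(scores, top_quantile=0.1):
    """Boolean mask of entries at or above the (1 - top_quantile) quantile."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1.0 - top_quantile)
    return scores >= cutoff


# Hypothetical per-feature scores for a single mode.
names = [f"gene{i}" for i in range(10)]
scores = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.05, 0.4, 0.7, 0.6, 0.8])

mask = top_quantile_mask(scores, top_quantile=0.2)
selected = [name for name, keep in zip(names, mask) if keep]
print(selected)  # ['gene3', 'gene9']
```

Under this reading, a smaller `top_quantile` gives a stricter cutoff and a shorter list of selected features.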
Visualize cluster memberships¶
Assume that you have loaded some coordinates for your data points (cells/spots) in X_coords and have provided a list of colors to be used in colors.
fig, axes = aoc.overlay_scatter_for_mode(
    results,
    coords=X_coords,
    cluster_colors=colors[:results.K_max],
    val_threshold=0.5,
    s=5,
    alpha=0.9,
    suptitle="Cluster Memberships",
    suptitle_kwargs={"y": 0.95, "fontsize": 10},
)
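Because the call above slices `colors[:results.K_max]`, the `colors` list needs at least `K_max` entries. One way to build such a list with only the standard library is sketched below. The helper `distinct_hex_colors` is hypothetical, and the sketch assumes the plotting function accepts hex color strings, as Matplotlib-style APIs generally do.

```python
import colorsys


def distinct_hex_colors(n, saturation=0.65, value=0.9):
    """Return n hex color strings with hues evenly spaced around the color wheel."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


# Hypothetical: generate enough colors for up to 8 clusters.
colors = distinct_hex_colors(8)
print(colors)
```

Evenly spaced hues keep neighboring clusters visually distinguishable for moderate K; for larger K, a hand-picked qualitative palette may read better.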