coreset-sc

A minimal implementation of the Coreset Spectral Clustering (CSC) algorithm given in this paper.

The presented method repeatedly jumps across the equivalence between the normlised cut and weighted kernel k-means problems to apply coreset methods to spectral clustering.

Combined with recent work on fast spectral clustering, this gives us a method for clustering very large graphs (millions of nodes) in seconds by only running (fast) spectral clustering on a much smaller induced subgraph. It can do so in even the most extreme case, where the number of clusters is linear in the number of nodes. See the experiments section the paper.

Basic usage:

from coreset_sc import CoresetSpectralClustering, gen_sbm
from sklearn.metrics.cluster import adjusted_rand_score

# Generate a graph from the stochastic block model
n = 1000            # number of nodes per cluster
k = 50              # number of clusters
p = 0.5             # probability of an intra-cluster edge
q = (1.0 / n) / k   # probability of an inter-cluster edge


# A is a sparse scipy CSR matrix of a symmetric adjacency graph
A,ground_truth_labels = gen_sbm(n, k, p, q)

coreset_ratio = 0.1 # fraction of the data to use for the coreset graph

csc = CoresetSpectralClustering(
    num_clusters=k, coreset_ratio=coreset_ratio
)
csc.fit(A) # sample extract and cluster the coreset graph
csc.label_full_graph() # label the rest of the graph given the coreset labels
pred_labels = csc.labels_ # get the full labels

# Alternatively, label the full graph in one line:
pred_labels = csc.fit_predict(A)
ari = adjusted_rand_score(ground_truth_labels,pred_labels)

Python Docs

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github/workflows		.github/workflows
.vscode		.vscode
benches		benches
docs		docs
python/coreset_sc		python/coreset_sc
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
csc.png		csc.png
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

coreset-sc

About

Releases

Packages

Languages

License

BenJourdan/coreset-sc

Folders and files

Latest commit

History

Repository files navigation

coreset-sc

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages