A bisecting k-means algorithm based on the paper "A comparison of document clustering techniques" by Steinbach, Karypis, and Kumar, with modifications to fit Spark.
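The bisecting idea can be illustrated outside Spark. The following is a minimal pure-Python sketch, not Spark's implementation: it starts with one cluster and repeatedly splits the cluster with the largest sum of squared errors using plain 2-means. The function names (`two_means`, `bisecting_kmeans`) and the choice of split criterion are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)


def two_means(points, iterations=10, seed=0):
    """Split one cluster into two halves with a few Lloyd iterations (k = 2)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    halves = ([], [])
    for _ in range(iterations):
        halves = ([], [])
        for p in points:
            idx = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            halves[idx].append(p)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. duplicate points
        centers = [mean(halves[0]), mean(halves[1])]
    return halves


def bisecting_kmeans(points, k):
    """Return up to k clusters by repeatedly bisecting the worst cluster."""
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        target = max(candidates, key=sse)  # assumed criterion: largest SSE
        left, right = two_means(target)
        if not left or not right:
            break
        clusters.remove(target)
        clusters.extend([left, right])
    return clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
for cluster in bisecting_kmeans(points, 3):
    print(sorted(cluster))
```

Splitting the cluster with the largest error is only one possible rule; splitting the largest cluster by size is another common choice.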
Clustering model produced by BisectingKMeans.
Distributed LDA model.
:: DeveloperApi ::
This class performs expectation maximization for multivariate Gaussian Mixture Models (GMMs).
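As a rough illustration of expectation maximization for a mixture of Gaussians, here is a minimal sketch restricted to one dimension and written in pure Python. It is not Spark's multivariate implementation; the function name `em_gmm_1d`, the fixed iteration count, and the variance floor are assumptions made to keep the example short.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, mus, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k, n = len(mus), len(data)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, mus, variances


data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
weights, mus, variances = em_gmm_1d(data, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(weights, mus, variances)
```

In the multivariate case each variance becomes a covariance matrix and the density uses its determinant and inverse, but the alternation between the two steps is the same.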
Multivariate Gaussian Mixture Model (GMM) consisting of k Gaussians, where points are drawn from each Gaussian i = 1..k with probability w(i).
K-means clustering with support for multiple parallel runs and a k-means++-like initialization mode (the k-means|| algorithm by Bahmani et al.).
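To make the initialization idea concrete, the sketch below shows sequential k-means++ seeding followed by standard Lloyd iterations in pure Python. Note that this is the sequential k-means++ scheme, not the parallel k-means|| variant named above, and it is not Spark's code; the function names `kmeans_pp_init` and `lloyd` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers; each new one is sampled with probability
    proportional to its squared distance from the nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, iterations=20):
    """Refine centers by alternating assignment and mean-update steps."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
initial = kmeans_pp_init(points, 2)
print(lloyd(points, initial))
```

Because already chosen points have zero sampling weight, the seeds are distinct whenever the input contains at least k distinct points.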
A clustering model for K-means.
Latent Dirichlet Allocation (LDA), a topic model designed for text documents.
Latent Dirichlet Allocation (LDA) model.
:: DeveloperApi ::
Local LDA model.
:: DeveloperApi ::
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen.
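The core of power iteration clustering can be sketched in a few lines of pure Python: row-normalize an affinity matrix, run a small, fixed number of power iterations, and cluster the resulting one-dimensional embedding. This is an illustrative sketch rather than Spark's implementation. The helper names, the fixed iteration count, the degree-based starting vector, and the midpoint threshold (a stand-in for a k-means step with k = 2) are all assumptions.

```python
def two_block_affinity(size_a, size_b, strong=1.0, weak=0.01):
    """Toy affinity matrix: strong links inside each block, weak links across."""
    block = [0] * size_a + [1] * size_b
    n = len(block)
    return [
        [0.0 if i == j else (strong if block[i] == block[j] else weak) for j in range(n)]
        for i in range(n)
    ]


def power_iteration_embedding(affinity, iterations=10):
    """Run a truncated power iteration on the row-normalized affinity matrix."""
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    w = [[a / degrees[i] for a in affinity[i]] for i in range(n)]  # W = D^-1 A
    total = sum(degrees)
    v = [d / total for d in degrees]  # degree-based starting vector
    for _ in range(iterations):
        v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in v)
        v = [x / norm for x in v]  # L1 normalization
    return v


def split_in_two(v):
    """Threshold the embedding at its midpoint (stand-in for 2-means)."""
    midpoint = (min(v) + max(v)) / 2
    return [0 if x <= midpoint else 1 for x in v]


embedding = power_iteration_embedding(two_block_affinity(3, 4))
print(split_in_two(embedding))
```

Stopping early matters: run to convergence, the vector becomes constant and carries no cluster information, whereas a truncated run is already nearly constant within each cluster but still differs between clusters.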
Model produced by PowerIterationClustering.
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming data, and using the model to make predictions on streaming data.
StreamingKMeansModel extends MLlib's KMeansModel for streaming algorithms, so it can keep track of a continuously updated weight associated with each cluster, and also update the model by doing a single iteration of the standard k-means algorithm.
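The per-batch update described above can be sketched as one assignment step followed by a weighted average of each old center with the mean of the newly assigned points. The pure-Python sketch below assumes this weighted-average rule and a hypothetical `decay` parameter that discounts old weights; the function name `update` is chosen for illustration, and this is not Spark's code.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update(centers, weights, batch, decay=1.0):
    """Apply one k-means iteration to a batch and return new centers and weights.

    decay (assumed parameter) scales the old weights: 1.0 keeps all history,
    0.0 forgets it and uses only the current batch.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assignment step: each batch point goes to its nearest current center.
    for p in batch:
        j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    # Update step: weighted average of the old center and the new points.
    new_centers, new_weights = [], []
    for j in range(k):
        old_weight = weights[j] * decay
        total = old_weight + counts[j]
        if total == 0:
            new_centers.append(centers[j])  # nothing to average; keep the center
        else:
            new_centers.append(
                tuple((centers[j][d] * old_weight + sums[j][d]) / total for d in range(dim))
            )
        new_weights.append(total)
    return new_centers, new_weights


centers, weights = update(
    centers=[(0.0, 0.0), (10.0, 10.0)],
    weights=[1.0, 1.0],
    batch=[(2.0, 0.0), (10.0, 12.0)],
)
print(centers, weights)  # [(1.0, 0.0), (10.0, 11.0)] [2.0, 2.0]
```

Calling `update` once per incoming batch keeps the per-cluster weights growing with the data seen so far, so later batches move well-established centers less than early ones.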
Top-level methods for calling K-means clustering.