DistributedLDAModel (Spark 2.4.8 JavaDoc)

- Object
  - org.apache.spark.ml.PipelineStage
    - org.apache.spark.ml.Transformer
      - org.apache.spark.ml.Model<LDAModel>
        - org.apache.spark.ml.clustering.LDAModel
          - org.apache.spark.ml.clustering.DistributedLDAModel

All Implemented Interfaces:

java.io.Serializable, Logging, LDAParams, Params, HasCheckpointInterval, HasFeaturesCol, HasMaxIter, HasSeed, Identifiable, MLWritable
```
public class DistributedLDAModel
extends LDAModel
```
Distributed model fitted by LDA. This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
Parameter `oldLocalModelOption`: used to implement `oldLocalModel` as a lazy val while keeping `copy()` cheap.

See Also:

Serialized Form

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `DistributedLDAModel` | `copy(ParamMap extra)` Creates a copy of this instance with the same UID and some extra params. |
| `void` | `deleteCheckpointFiles()` :: DeveloperApi :: Remove any remaining checkpoint files from training. |
| `String[]` | `getCheckpointFiles()` :: DeveloperApi :: Returns checkpoint files from training. |
| `boolean` | `isDistributed()` Indicates whether this instance is of type `DistributedLDAModel`. |
| `static DistributedLDAModel` | `load(String path)` |
| `double` | `logPrior()` Log probability of the current parameter estimate: log P(topics, topic distributions for docs \| Dirichlet hyperparameters) |
| `static MLReader<DistributedLDAModel>` | `read()` |
| `LocalLDAModel` | `toLocal()` Convert this distributed model to a local representation. |
| `double` | `trainingLogLikelihood()` Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs \| topics, topic distributions for docs, Dirichlet hyperparameters) |
| `MLWriter` | `write()` Returns an `MLWriter` instance for this ML instance. |

Methods inherited from class org.apache.spark.ml.clustering.LDAModel
describeTopics, describeTopics, estimatedDocConcentration, logLikelihood, logPerplexity, setFeaturesCol, setSeed, setTopicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize

Methods inherited from class org.apache.spark.ml.Model
hasParent, parent, setParent

Methods inherited from class org.apache.spark.ml.Transformer
transform, transform, transform

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.ml.clustering.LDAParams
docConcentration, getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, k, keepLastCheckpoint, learningDecay, learningOffset, optimizeDocConcentration, optimizer, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, validateAndTransformSchema

Methods inherited from interface org.apache.spark.ml.param.shared.HasFeaturesCol
featuresCol, getFeaturesCol

Methods inherited from interface org.apache.spark.ml.param.shared.HasMaxIter
getMaxIter, maxIter

Methods inherited from interface org.apache.spark.ml.param.shared.HasSeed
getSeed, seed

Methods inherited from interface org.apache.spark.ml.param.shared.HasCheckpointInterval
checkpointInterval, getCheckpointInterval

Methods inherited from interface org.apache.spark.ml.param.Params
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface org.apache.spark.ml.util.Identifiable
toString

Methods inherited from interface org.apache.spark.internal.Logging
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

Methods inherited from interface org.apache.spark.ml.util.MLWritable
save

- Method Detail
  - read
```
public static MLReader<DistributedLDAModel> read()
```
  - load
```
public static DistributedLDAModel load(String path)
```
  - toLocal
```
public LocalLDAModel toLocal()
```
    Convert this distributed model to a local representation. This discards info about the training dataset.
    WARNING: This involves collecting a large topicsMatrix to the driver.
    
    Returns:
    
    (undocumented)
  - copy
```
public DistributedLDAModel copy(ParamMap extra)
```
    Description copied from interface: Params
    
    Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
    
    Specified by:
    
    copy in interface Params
    
    Specified by:
    
    copy in class Model<LDAModel>
    
    Parameters:
    
    extra - (undocumented)
    
    Returns:
    
    (undocumented)
  - isDistributed
```
public boolean isDistributed()
```
    Description copied from class: LDAModel
    
    Indicates whether this instance is of type DistributedLDAModel
    
    Specified by:
    
    isDistributed in class LDAModel
  - trainingLogLikelihood
```
public double trainingLogLikelihood()
```
    Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)
    Notes:
    
    - This excludes the prior; for that, use logPrior.
    - Even with logPrior, this is NOT the same as the data log likelihood given the hyperparameters.
    - This is computed from the topic distributions computed during training. If you call logLikelihood() on the same training dataset, the topic distributions will be computed again, possibly giving different results.
    
    Returns:
    
    (undocumented)
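    The dependence on the topic distributions can be seen in a toy calculation. The sketch below is plain Java, not Spark code: it assumes the standard LDA mixture form, in which a term's probability in a document is the sum over topics of P(topic | doc) * P(term | topic). The class name, method name, and numbers are hypothetical, and Spark's EM implementation may differ in details such as smoothing.

```java
/** Toy illustration only; this is not how Spark is invoked, and the names are hypothetical. */
final class LdaLogLikelihoodSketch {

    /**
     * Computes log P(docs | topics, topic distributions) for a bag-of-words corpus,
     * assuming the standard LDA mixture likelihood.
     *
     * @param counts counts[d][w] = number of occurrences of term w in document d
     * @param theta  theta[d][k]  = P(topic k | document d)
     * @param beta   beta[k][w]   = P(term w | topic k)
     */
    static double tokenLogLikelihood(double[][] counts, double[][] theta, double[][] beta) {
        double total = 0.0;
        for (int d = 0; d < counts.length; d++) {
            for (int w = 0; w < counts[d].length; w++) {
                if (counts[d][w] == 0.0) {
                    continue; // terms that do not occur contribute nothing
                }
                double pTerm = 0.0;
                for (int k = 0; k < beta.length; k++) {
                    pTerm += theta[d][k] * beta[k][w];
                }
                total += counts[d][w] * Math.log(pTerm);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] counts = {{2, 0}, {0, 1}};          // 2 documents, 2 terms
        double[][] beta = {{0.5, 0.5}, {0.25, 0.75}};  // 2 topics over 2 terms

        // Same corpus and topics, two different per-document topic distributions.
        double[][] thetaA = {{1.0, 0.0}, {0.5, 0.5}};
        double[][] thetaB = {{1.0, 0.0}, {0.0, 1.0}};

        System.out.println(tokenLogLikelihood(counts, thetaA, beta));
        System.out.println(tokenLogLikelihood(counts, thetaB, beta));
    }
}
```

    The two printed values differ even though the corpus and topics are identical, which is why re-inferring topic distributions with logLikelihood() need not reproduce trainingLogLikelihood().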
  - logPrior
```
public double logPrior()
```
    Log probability of the current parameter estimate: log P(topics, topic distributions for docs | Dirichlet hyperparameters)
    
    Returns:
    
    (undocumented)
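    How logPrior relates to trainingLogLikelihood can be written out with the product rule. The symbols below are notation assumed for this note, not names from the API: D is the training documents, \beta the topics, \theta the per-document topic distributions, and \alpha, \eta the Dirichlet hyperparameters.

```latex
\begin{align*}
\underbrace{\log P(D \mid \beta, \theta, \alpha, \eta)}_{\texttt{trainingLogLikelihood()}}
  + \underbrace{\log P(\beta, \theta \mid \alpha, \eta)}_{\texttt{logPrior()}}
  &= \log P(D, \beta, \theta \mid \alpha, \eta) \\
\log P(D \mid \alpha, \eta)
  &= \log \iint P(D, \beta, \theta \mid \alpha, \eta)\, d\beta\, d\theta
\end{align*}
```

    The sum of the two methods is therefore a joint log probability evaluated at the current parameter estimates. The data log likelihood given the hyperparameters requires integrating over all topics and topic distributions, which is why the two quantities are not the same.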
  - getCheckpointFiles
```
public String[] getCheckpointFiles()
```
    :: DeveloperApi ::
    If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.
    Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
    
    Returns:
    
    Checkpoint files from training
  - deleteCheckpointFiles
```
public void deleteCheckpointFiles()
```
    :: DeveloperApi ::
    Remove any remaining checkpoint files from training.
    
    See Also:
    
    getCheckpointFiles
  - write
```
public MLWriter write()
```
    Description copied from interface: MLWritable
    
    Returns an MLWriter instance for this ML instance.
    
    Returns:
    
    (undocumented)
