public class DistributedLDAModel extends LDAModel
Distributed model fitted by LDA.
This type of model is currently only produced by Expectation-Maximization (EM).
This model stores the inferred topics, the full training dataset, and the topic distribution for each training document.
param: oldLocalModelOption Used to implement oldLocalModel as a lazy val, but keeping copy() cheap.
Modifier and Type | Method and Description |
---|---|
DistributedLDAModel | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params. |
void | deleteCheckpointFiles() :: DeveloperApi :: |
String[] | getCheckpointFiles() :: DeveloperApi :: |
boolean | isDistributed() Indicates whether this instance is of type DistributedLDAModel. |
static DistributedLDAModel | load(String path) |
double | logPrior() Log probability of the current parameter estimate: log P(topics, topic distributions for docs \| Dirichlet hyperparameters) |
static MLReader<DistributedLDAModel> | read() |
LocalLDAModel | toLocal() Convert this distributed model to a local representation. |
double | trainingLogLikelihood() Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs \| topics, topic distributions for docs, Dirichlet hyperparameters) |
MLWriter | write() Returns an MLWriter instance for this ML instance. |
describeTopics, describeTopics, estimatedDocConcentration, logLikelihood, logPerplexity, setFeaturesCol, setSeed, setTopicDistributionCol, topicsMatrix, transform, transformSchema, uid, vocabSize
transform, transform, transform
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
docConcentration, getDocConcentration, getK, getKeepLastCheckpoint, getLearningDecay, getLearningOffset, getOldDocConcentration, getOldOptimizer, getOldTopicConcentration, getOptimizeDocConcentration, getOptimizer, getSubsamplingRate, getTopicConcentration, getTopicDistributionCol, k, keepLastCheckpoint, learningDecay, learningOffset, optimizeDocConcentration, optimizer, subsamplingRate, supportedOptimizers, topicConcentration, topicDistributionCol, validateAndTransformSchema
featuresCol, getFeaturesCol
getMaxIter, maxIter
checkpointInterval, getCheckpointInterval
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
toString
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
save
public static MLReader<DistributedLDAModel> read()
public static DistributedLDAModel load(String path)
public LocalLDAModel toLocal()
Convert this distributed model to a local representation.
WARNING: This involves collecting a large topicsMatrix to the driver.
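public DistributedLDAModel copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See defaultCopy() in Params.
public boolean isDistributed()
Indicates whether this instance is of type DistributedLDAModel.
Overrides: isDistributed in class LDAModel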
public double trainingLogLikelihood()
Log likelihood of the observed tokens in the training set, given the current parameter estimates: log P(docs | topics, topic distributions for docs, Dirichlet hyperparameters)
Notes:
- This excludes the prior; for that, use logPrior.
- Even with logPrior, this is NOT the same as the data log likelihood given the hyperparameters.
- This is computed from the topic distributions computed during training. If you call logLikelihood() on the same training dataset, the topic distributions will be computed again, possibly giving different results.
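To make the quantity log P(docs | topics, topic distributions for docs) concrete, here is a minimal pure-Java sketch that evaluates it for a toy corpus. It is an illustration under stated assumptions, not Spark's implementation: the class name ToyLdaLogLikelihood and the arrays counts, theta, and beta are hypothetical, and each token's probability is assumed to be the mixture of the per-topic term probabilities weighted by the document's topic distribution.

```java
// Illustrative sketch only: hypothetical names, not part of the Spark API.
final class ToyLdaLogLikelihood {

    /**
     * Sums count * log(p(term | doc)) over all observed tokens.
     *
     * @param counts counts[d][w] = occurrences of term w in document d
     * @param theta  theta[d][k]  = weight of topic k in document d (each row sums to 1)
     * @param beta   beta[k][w]   = probability of term w under topic k (each row sums to 1)
     */
    static double logLikelihood(int[][] counts, double[][] theta, double[][] beta) {
        double total = 0.0;
        for (int d = 0; d < counts.length; d++) {
            for (int w = 0; w < counts[d].length; w++) {
                if (counts[d][w] == 0) {
                    continue; // unobserved terms contribute nothing
                }
                // Mixture probability of term w in document d.
                double p = 0.0;
                for (int k = 0; k < beta.length; k++) {
                    p += theta[d][k] * beta[k][w];
                }
                total += counts[d][w] * Math.log(p);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] counts = {{2, 0, 1}};                               // one document, three terms
        double[][] theta = {{0.5, 0.5}};                            // two topics, equal weight
        double[][] beta = {{0.5, 0.25, 0.25}, {0.25, 0.25, 0.5}};   // topic-term probabilities
        System.out.println(logLikelihood(counts, theta, beta));
    }
}
```

In this toy input every observed token has mixture probability 0.375, so the result is 3 · ln 0.375, roughly -2.94. Because the value depends on theta, recomputing topic distributions for the same documents (as logLikelihood() does) can change the result, which is the point of the last note above.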
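public double logPrior()
Log probability of the current parameter estimate: log P(topics, topic distributions for docs | Dirichlet hyperparameters)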
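As a reading aid, the log prior can be written out under the standard LDA assumptions. The notation is an assumption of this sketch and does not come from the API: $\beta_k$ is the term distribution of topic $k$ (over $V$ terms), $\theta_d$ is the topic distribution of document $d$ (over $K$ topics), $\eta$ is the topic concentration, and $\alpha$ is a symmetric document concentration.

```latex
\log P(\beta, \theta \mid \alpha, \eta)
  = \sum_{k=1}^{K} \log \mathrm{Dir}(\beta_k \mid \eta)
  + \sum_{d=1}^{D} \log \mathrm{Dir}(\theta_d \mid \alpha),
\qquad
\log \mathrm{Dir}(x \mid c)
  = \log \Gamma(n c) - n \log \Gamma(c) + (c - 1) \sum_{i=1}^{n} \log x_i .
```

Here $n$ is the dimension of $x$ ($V$ for topics, $K$ for documents) and $c$ is the corresponding symmetric concentration.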
public String[] getCheckpointFiles()
If using checkpointing and LDA.keepLastCheckpoint is set to true, then there may be saved checkpoint files. This method is provided so that users can manage those files.
Note that removing the checkpoints can cause failures if a partition is lost and is needed by certain DistributedLDAModel methods. Reference counting will clean up the checkpoints when this model and derivative data go out of scope.
public void deleteCheckpointFiles()
Remove any remaining checkpoint files from training.
See also: getCheckpointFiles()
public MLWriter write()
Returns an MLWriter instance for this ML instance.
Specified by: write in interface MLWritable