public class NaiveBayes extends Object implements scala.Serializable, Logging
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the Multinomial NB (http://tinyurl.com/lsdw6p), which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB (http://tinyurl.com/p7c96j6).
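To make the description above concrete, here is a minimal plain-Java sketch of multinomial Naive Bayes with additive smoothing. It is not this class's implementation and does not use Spark: the class name `MultinomialNbSketch`, its fields, and the array-based inputs are hypothetical names chosen for illustration, and the estimator shown is the standard textbook one.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Spark: multinomial Naive Bayes on plain arrays. */
final class MultinomialNbSketch {
    final double[] logPrior;        // log P(class c)
    final double[][] logLikelihood; // log P(feature j | class c)

    private MultinomialNbSketch(double[] logPrior, double[][] logLikelihood) {
        this.logPrior = logPrior;
        this.logLikelihood = logLikelihood;
    }

    /**
     * labels[i] must lie in 0..numClasses-1; features[i] holds non-negative
     * counts or frequencies. lambda is the additive smoothing parameter.
     */
    static MultinomialNbSketch train(int[] labels, double[][] features,
                                     int numClasses, double lambda) {
        int numFeatures = features[0].length;
        double[] classCount = new double[numClasses];
        double[][] featureSum = new double[numClasses][numFeatures];

        // Aggregate per-class example counts and per-class feature totals.
        for (int i = 0; i < labels.length; i++) {
            classCount[labels[i]] += 1.0;
            for (int j = 0; j < numFeatures; j++) {
                featureSum[labels[i]][j] += features[i][j];
            }
        }

        double[] logPrior = new double[numClasses];
        double[][] logLikelihood = new double[numClasses][numFeatures];
        for (int c = 0; c < numClasses; c++) {
            // Smoothed class prior.
            logPrior[c] = Math.log(classCount[c] + lambda)
                    - Math.log(labels.length + numClasses * lambda);
            // Smoothed per-feature conditional probabilities.
            double total = Arrays.stream(featureSum[c]).sum();
            for (int j = 0; j < numFeatures; j++) {
                logLikelihood[c][j] = Math.log(featureSum[c][j] + lambda)
                        - Math.log(total + numFeatures * lambda);
            }
        }
        return new MultinomialNbSketch(logPrior, logLikelihood);
    }

    /** Returns the class index with the highest log-posterior score for x. */
    int predict(double[] x) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < logPrior.length; c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++) {
                score += x[j] * logLikelihood[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

In this sketch each row of `features` plays the role of the feature vector in one (label, features) pair, so a document-classification input would be one count or TF-IDF row per document.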
Constructor and Description |
---|
NaiveBayes() |
Modifier and Type | Method and Description |
---|---|
NaiveBayesModel | run(RDD<LabeledPoint> data): Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries. |
NaiveBayes | setLambda(double lambda): Set the smoothing parameter. |
static NaiveBayesModel | train(RDD<LabeledPoint> input): Trains a Naive Bayes model given an RDD of (label, features) pairs. |
static NaiveBayesModel | train(RDD<LabeledPoint> input, double lambda): Trains a Naive Bayes model given an RDD of (label, features) pairs. |
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Logging: initialized, initializeIfNecessary, initializeLogging, initLock, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logTrace, logTrace, logWarning, logWarning
public static NaiveBayesModel train(RDD<LabeledPoint> input)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the Multinomial NB (http://tinyurl.com/lsdw6p), which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB (http://tinyurl.com/p7c96j6).
This version of the method uses a default smoothing parameter of 1.0.
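For orientation, the role of the smoothing parameter can be written out using the standard additive-smoothing estimates for multinomial Naive Bayes. This is the textbook formulation, given here on the assumption that this implementation follows it; the symbols are introduced only for this sketch: N is the number of training examples, N_c the number with label c, K the number of classes, D the number of features, and T_{cj} the sum of feature j over the examples with label c.

```latex
\hat{\pi}_c = \frac{N_c + \lambda}{N + K\lambda},
\qquad
\hat{\theta}_{cj} = \frac{T_{cj} + \lambda}{\sum_{j'=1}^{D} T_{cj'} + D\lambda},
\qquad
\hat{y}(x) = \arg\max_{c}\Big(\log\hat{\pi}_c + \sum_{j=1}^{D} x_j \log\hat{\theta}_{cj}\Big)
```

With lambda = 1.0 this is commonly called Laplace (add-one) smoothing: every class and every feature receives one pseudo-count, so a feature never seen with a given class still gets a non-zero probability.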
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
public static NaiveBayesModel train(RDD<LabeledPoint> input, double lambda)
Trains a Naive Bayes model given an RDD of (label, features) pairs.
This is the Multinomial NB (http://tinyurl.com/lsdw6p), which can handle all kinds of discrete data. For example, by converting documents into TF-IDF vectors, it can be used for document classification. By making every vector a 0-1 vector, it can also be used as Bernoulli NB (http://tinyurl.com/p7c96j6).
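The 0-1 conversion mentioned above can be sketched in plain Java as follows. The helper `BinarizeSketch.binarize` is a hypothetical name, not part of this API, and it assumes that any strictly positive count or frequency should map to 1.

```java
/** Hypothetical helper, not part of Spark: turns a count vector into a 0-1 vector. */
final class BinarizeSketch {
    /** Returns a new array with 1.0 where the input is positive and 0.0 elsewhere. */
    static double[] binarize(double[] counts) {
        double[] out = new double[counts.length];
        for (int j = 0; j < counts.length; j++) {
            out[j] = counts[j] > 0.0 ? 1.0 : 0.0;
        }
        return out;
    }
}
```

Applying such a step to every feature vector before building the input RDD is one way to obtain the 0-1 vectors the description refers to.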
input - RDD of (label, array of features) pairs. Every vector should be a frequency vector or a count vector.
lambda - The smoothing parameter.
public NaiveBayes setLambda(double lambda)
public NaiveBayesModel run(RDD<LabeledPoint> data)
data - RDD of LabeledPoint.