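public class TrainValidationSplit extends Estimator<TrainValidationSplitModel> implements TrainValidationSplitParams, HasParallelism, HasCollectSubModels, MLWritable, org.apache.spark.internal.Logging
Validation for hyper-parameter tuning. Randomly splits the input dataset into train and validation sets, and uses the evaluation metric on the validation set to select the best model. Similar to CrossValidator, but only splits the set once.
Constructor and Description |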
---|
TrainValidationSplit() |
TrainValidationSplit(String uid) |
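
As a rough illustration of what tuning against a single train/validation split involves, here is a minimal plain-Java sketch. It does not use Spark and is not how this class is implemented: the class `SingleSplitTuning`, the toy shrunken-mean "estimator", the `lambdas` candidates, and the negative-MSE metric are all hypothetical stand-ins for an Estimator, its ParamMap candidates, and an Evaluator whose metric is maximized.

```java
final class SingleSplitTuning {
    /** Toy "estimator" (hypothetical): a constant prediction, the training mean shrunk toward 0 by lambda. */
    static double fit(double[] train, double lambda) {
        double sum = 0.0;
        for (double y : train) {
            sum += y;
        }
        return sum / (train.length + lambda);
    }

    /** Toy "evaluator" (hypothetical): negative mean squared error, so a larger value is better. */
    static double evaluate(double prediction, double[] validation) {
        double sse = 0.0;
        for (double y : validation) {
            double err = y - prediction;
            sse += err * err;
        }
        return -sse / validation.length;
    }

    /** Fits every candidate on the train side and returns the index with the highest validation metric. */
    static int selectBest(double[] train, double[] validation, double[] lambdas) {
        int best = -1;
        double bestMetric = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < lambdas.length; i++) {
            double model = fit(train, lambdas[i]);
            double metric = evaluate(model, validation);
            if (metric > bestMetric) {
                bestMetric = metric;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] train = {2.0, 4.0, 6.0};
        double[] validation = {3.0, 3.0};
        double[] lambdas = {0.0, 1.0, 9.0}; // constant predictions: 4.0, 3.0, 1.0
        System.out.println("best candidate index = " + selectBest(train, validation, lambdas));
    }
}
```

Because the data is split only once, each candidate is fitted and scored exactly one time, which is cheaper than k-fold evaluation but rests on a single validation sample.
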
Modifier and Type | Method and Description |
---|---|
BooleanParam | collectSubModels(): Param for whether to collect a list of sub-models trained during tuning. |
TrainValidationSplit | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params. |
Param<Estimator<?>> | estimator(): Param for the estimator to be validated. |
Param<ParamMap[]> | estimatorParamMaps(): Param for estimator param maps. |
Param<Evaluator> | evaluator(): Param for the evaluator used to select hyper-parameters that maximize the validated metric. |
TrainValidationSplitModel | fit(Dataset<?> dataset): Fits a model to the input data. |
static TrainValidationSplit | load(String path) |
IntParam | parallelism(): The number of threads to use when running parallel algorithms. |
static MLReader<TrainValidationSplit> | read() |
LongParam | seed(): Param for random seed. |
TrainValidationSplit | setCollectSubModels(boolean value): Whether to collect submodels when fitting. |
TrainValidationSplit | setEstimator(Estimator<?> value) |
TrainValidationSplit | setEstimatorParamMaps(ParamMap[] value) |
TrainValidationSplit | setEvaluator(Evaluator value) |
TrainValidationSplit | setParallelism(int value): Set the maximum level of parallelism to evaluate models in parallel. |
TrainValidationSplit | setSeed(long value) |
TrainValidationSplit | setTrainRatio(double value) |
DoubleParam | trainRatio(): Param for ratio between train and validation data. |
StructType | transformSchema(StructType schema): Check transform validity and derive the output schema from the input schema. |
String | uid(): An immutable unique ID for the object and its derivatives. |
MLWriter | write(): Returns an MLWriter instance for this ML instance. |
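Methods inherited from class PipelineStage: params
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface TrainValidationSplitParams: getTrainRatio
Methods inherited from interface ValidatorParams: getEstimator, getEstimatorParamMaps, getEvaluator, logTuningParams, transformSchemaImpl
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString
Methods inherited from interface HasParallelism: getExecutionContext, getParallelism
Methods inherited from interface HasCollectSubModels: getCollectSubModels
Methods inherited from interface MLWritable: save
Methods inherited from interface Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize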
public TrainValidationSplit(String uid)
public TrainValidationSplit()
public static MLReader<TrainValidationSplit> read()
public static TrainValidationSplit load(String path)
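public final BooleanParam collectSubModels()
Description copied from interface: HasCollectSubModels
Param for whether to collect a list of sub-models trained during tuning.
Specified by: collectSubModels in interface HasCollectSubModels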
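public IntParam parallelism()
Description copied from interface: HasParallelism
The number of threads to use when running parallel algorithms.
Specified by: parallelism in interface HasParallelism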
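
To illustrate what a bounded number of threads means for model evaluation, the following plain-Java sketch scores several candidates on a fixed-size thread pool. It is an assumption-laden sketch, not Spark's implementation: `ParallelEvaluationSketch`, `evaluateAll`, and the scoring function passed in are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

final class ParallelEvaluationSketch {
    /**
     * Scores every candidate with at most `parallelism` tasks running at once.
     * Metrics are returned in the same order as the candidates.
     */
    static double[] evaluateAll(double[] candidates, DoubleUnaryOperator score, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1: " + parallelism);
        }
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (double candidate : candidates) {
                Callable<Double> task = () -> score.applyAsDouble(candidate);
                futures.add(pool.submit(task));
            }
            double[] metrics = new double[candidates.length];
            for (int i = 0; i < metrics.length; i++) {
                metrics[i] = futures.get(i).get(); // blocks until candidate i is scored
            }
            return metrics;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while evaluating candidates", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a candidate failed to evaluate", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical scoring function: best (largest) when the candidate equals 1.0.
        double[] metrics = evaluateAll(new double[] {0.0, 1.0, 9.0}, c -> -Math.abs(c - 1.0), 2);
        System.out.println(Arrays.toString(metrics));
    }
}
```

With a parallelism of 1 the pool has a single worker, so the candidates are scored one after another.
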
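public DoubleParam trainRatio()
Description copied from interface: TrainValidationSplitParams
Param for ratio between train and validation data.
Specified by: trainRatio in interface TrainValidationSplitParams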
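
One way to picture the train ratio together with the random seed is a per-row random assignment, sketched below in plain Java. This is an assumption made for illustration, since the page does not state how the split is performed internally; `TrainRatioSplitSketch`, `Split`, and `split` are hypothetical names, and the values 0.75 and 42 are arbitrary example inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class TrainRatioSplitSketch {
    /** The two sides of one random split. */
    static final class Split<T> {
        final List<T> train = new ArrayList<>();
        final List<T> validation = new ArrayList<>();
    }

    /**
     * Sends each row to the training side with probability trainRatio and to the
     * validation side otherwise. The same seed reproduces the same split.
     */
    static <T> Split<T> split(List<T> rows, double trainRatio, long seed) {
        if (!(trainRatio >= 0.0 && trainRatio <= 1.0)) {
            throw new IllegalArgumentException("trainRatio must be in [0, 1]: " + trainRatio);
        }
        Random rng = new Random(seed);
        Split<T> result = new Split<>();
        for (T row : rows) {
            if (rng.nextDouble() < trainRatio) {
                result.train.add(row);
            } else {
                result.validation.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        Split<Integer> s = split(rows, 0.75, 42L);
        // Sizes are only approximately 75/25 because assignment is decided per row.
        System.out.println(s.train.size() + " train rows, " + s.validation.size() + " validation rows");
    }
}
```

Fixing the seed makes a tuning run repeatable, because every candidate is then compared on the same validation rows from one run to the next.
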
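public Param<Estimator<?>> estimator()
Description copied from interface: ValidatorParams
Param for the estimator to be validated.
Specified by: estimator in interface ValidatorParams
public Param<ParamMap[]> estimatorParamMaps()
Description copied from interface: ValidatorParams
Param for estimator param maps.
Specified by: estimatorParamMaps in interface ValidatorParams
public Param<Evaluator> evaluator()
Description copied from interface: ValidatorParams
Param for the evaluator used to select hyper-parameters that maximize the validated metric.
Specified by: evaluator in interface ValidatorParams
public final LongParam seed()
Description copied from interface: HasSeed
Param for random seed.
public String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable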
public TrainValidationSplit setEstimator(Estimator<?> value)
public TrainValidationSplit setEstimatorParamMaps(ParamMap[] value)
public TrainValidationSplit setEvaluator(Evaluator value)
public TrainValidationSplit setTrainRatio(double value)
public TrainValidationSplit setSeed(long value)
public TrainValidationSplit setParallelism(int value)
Set the maximum level of parallelism to evaluate models in parallel.
Parameters: value - (undocumented)
public TrainValidationSplit setCollectSubModels(boolean value)
Whether to collect submodels when fitting.
Note: If this param is set, you can set the option "persistSubModels" to "true" before saving the returned model in order to save these submodels as well. See the documentation of TrainValidationSplitModel.TrainValidationSplitModelWriter for more information.
Parameters: value - (undocumented)
public TrainValidationSplitModel fit(Dataset<?> dataset)
Description copied from class: Estimator
Fits a model to the input data.
Specified by: fit in class Estimator<TrainValidationSplitModel>
Parameters: dataset - (undocumented)
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks that do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
public TrainValidationSplit copy(ParamMap extra)
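Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<TrainValidationSplitModel>
Parameters: extra - (undocumented)
public MLWriter write()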
Description copied from interface: MLWritable
Returns an MLWriter instance for this ML instance.
Specified by: write in interface MLWritable