public class ALS extends Estimator<ALSModel> implements ALSParams, DefaultParamsWritable
ALS attempts to estimate the ratings matrix R as the product of two lower-rank matrices, X and Y, i.e. X * Yt = R (where Yt denotes the transpose of Y). Typically these approximations are called 'factor' matrices.
The general approach is iterative. During each iteration, one of the factor matrices is held
constant, while the other is solved for using least squares. The newly-solved factor matrix is
then held constant while solving for the other factor matrix.
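As an illustrative sketch (the exact regularization scaling is an implementation detail and may differ), holding the item factors fixed, each user factor is the solution of a regularized least-squares problem:

x_u = (Y_u^T Y_u + \lambda I)^{-1} Y_u^T r_u

where Y_u stacks the factor vectors of the items rated by user u, r_u holds the corresponding ratings, and \lambda is regParam. The item factors are then updated symmetrically with the user factors held fixed.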
This is a blocked implementation of the ALS factorization algorithm that groups the two sets of factors (referred to as "users" and "products") into blocks and reduces communication by only sending one copy of each user vector to each product block on each iteration, and only for the product blocks that need that user's feature vector. This is achieved by pre-computing some information about the ratings matrix to determine the "out-links" of each user (which blocks of products it will contribute to) and "in-link" information for each product (which of the feature vectors it receives from each user block it will depend on). This allows us to send only an array of feature vectors between each user block and product block, and have the product block find the users' ratings and update the products based on these messages.
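As a simplified, hypothetical illustration of the "out-link" idea (block assignment by modulo hashing and plain Scala collections are assumptions here; the real implementation uses dedicated blocked RDD structures):

```scala
// Hypothetical sketch: for each user block, record which product blocks it
// must ship user factor vectors to, based on where its ratings fall.
def userOutLinks(ratings: Seq[(Int, Int)],   // (userId, itemId) pairs
                 numUserBlocks: Int,
                 numItemBlocks: Int): Map[Int, Set[Int]] =
  ratings
    .map { case (userId, itemId) =>
      (userId % numUserBlocks, itemId % numItemBlocks)  // assumed hash-by-modulo
    }
    .groupBy { case (userBlock, _) => userBlock }
    .map { case (userBlock, pairs) =>
      userBlock -> pairs.map { case (_, itemBlock) => itemBlock }.toSet
    }
```

The corresponding "in-link" information on the product side can be derived analogously from the same grouped ratings.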
For implicit preference data, the algorithm used is based on "Collaborative Filtering for Implicit Feedback Datasets", available at http://dx.doi.org/10.1109/ICDM.2008.22, adapted for the blocked approach used here.
Essentially, instead of finding the low-rank approximations to the rating matrix R, this finds the approximations for a preference matrix P where the elements of P are 1 if r > 0 and 0 if r <= 0. The ratings then act as 'confidence' values related to the strength of indicated user preferences rather than explicit ratings given to items.
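In the notation of the cited paper, a sketch of the implicit-feedback formulation (with alpha as the confidence parameter and regParam as \lambda; the blocked implementation may differ in minor details) is:

p_{ui} = 1 if r_{ui} > 0, and p_{ui} = 0 otherwise
c_{ui} = 1 + \alpha r_{ui}
min_{X,Y} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda (\sum_u ||x_u||^2 + \sum_i ||y_i||^2)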
Modifier and Type | Class | Description
---|---|---
static class | ALS.InBlock$ |
static interface | ALS.LeastSquaresNESolver | Trait for least squares solvers applied to the normal equation.
static class | ALS.Rating<ID> | :: DeveloperApi :: Rating class for better code readability.
static class | ALS.Rating$ |
static class | ALS.RatingBlock$ |
Modifier and Type | Method | Description
---|---|---
ALS | copy(ParamMap extra) | Creates a copy of this instance with the same UID and some extra params.
ALSModel | fit(Dataset<?> dataset) | Fits a model to the input data.
static ALS | load(String path) |
static MLReader<T> | read() |
ALS | setAlpha(double value) |
ALS | setCheckpointInterval(int value) |
ALS | setColdStartStrategy(String value) |
ALS | setFinalStorageLevel(String value) |
ALS | setImplicitPrefs(boolean value) |
ALS | setIntermediateStorageLevel(String value) |
ALS | setItemCol(String value) |
ALS | setMaxIter(int value) |
ALS | setNonnegative(boolean value) |
ALS | setNumBlocks(int value) | Sets both numUserBlocks and numItemBlocks to the specific value.
ALS | setNumItemBlocks(int value) |
ALS | setNumUserBlocks(int value) |
ALS | setPredictionCol(String value) |
ALS | setRank(int value) |
ALS | setRatingCol(String value) |
ALS | setRegParam(double value) |
ALS | setSeed(long value) |
ALS | setUserCol(String value) |
static <ID> scala.Tuple2<RDD<scala.Tuple2<ID,float[]>>,RDD<scala.Tuple2<ID,float[]>>> | train(RDD<ALS.Rating<ID>> ratings, int rank, int numUserBlocks, int numItemBlocks, int maxIter, double regParam, boolean implicitPrefs, double alpha, boolean nonnegative, StorageLevel intermediateRDDStorageLevel, StorageLevel finalRDDStorageLevel, int checkpointInterval, long seed, scala.reflect.ClassTag<ID> evidence$1, scala.math.Ordering<ID> ord) | :: DeveloperApi :: Implementation of the ALS algorithm.
StructType | transformSchema(StructType schema) | :: DeveloperApi ::
String | uid() | An immutable unique ID for the object and its derivatives.
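For orientation, a minimal usage sketch in Scala of the setters listed above (the input DataFrame and its column names userId, movieId, and rating are assumptions about the caller's data):

```scala
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.DataFrame

// `ratings` is assumed to be a DataFrame with columns userId, movieId, rating.
def fitRecommender(ratings: DataFrame): ALSModel = {
  val als = new ALS()
    .setRank(10)                   // size of the factor vectors
    .setMaxIter(10)
    .setRegParam(0.1)
    .setImplicitPrefs(false)       // explicit ratings
    .setColdStartStrategy("drop")  // drop NaN predictions for unseen ids
    .setUserCol("userId")
    .setItemCol("movieId")
    .setRatingCol("rating")
  als.fit(ratings)                 // returns an ALSModel
}
```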
Methods inherited from class java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface ALSParams:
alpha, finalStorageLevel, getAlpha, getFinalStorageLevel, getImplicitPrefs, getIntermediateStorageLevel, getNonnegative, getNumItemBlocks, getNumUserBlocks, getRank, getRatingCol, implicitPrefs, intermediateStorageLevel, nonnegative, numItemBlocks, numUserBlocks, rank, ratingCol, validateAndTransformSchema

Methods inherited from interface ALSModelParams:
checkedCast, coldStartStrategy, getColdStartStrategy, getItemCol, getUserCol, itemCol, userCol

Methods inherited from interface HasPredictionCol:
getPredictionCol, predictionCol

Methods inherited from interface Params:
clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface Identifiable:
toString

Methods inherited from interface HasMaxIter:
getMaxIter, maxIter

Methods inherited from interface HasRegParam:
getRegParam, regParam

Methods inherited from interface HasCheckpointInterval:
checkpointInterval, getCheckpointInterval

Methods inherited from interface DefaultParamsWritable:
write

Methods inherited from interface MLWritable:
save

Methods inherited from interface Logging:
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public static ALS load(String path)
public static <ID> scala.Tuple2<RDD<scala.Tuple2<ID,float[]>>,RDD<scala.Tuple2<ID,float[]>>> train(RDD<ALS.Rating<ID>> ratings, int rank, int numUserBlocks, int numItemBlocks, int maxIter, double regParam, boolean implicitPrefs, double alpha, boolean nonnegative, StorageLevel intermediateRDDStorageLevel, StorageLevel finalRDDStorageLevel, int checkpointInterval, long seed, scala.reflect.ClassTag<ID> evidence$1, scala.math.Ordering<ID> ord)
This implementation of the ALS factorization algorithm partitions the two sets of factors among
Spark workers so as to reduce network communication by only sending one copy of each factor
vector to each Spark worker on each iteration, and only if needed. This is achieved by
precomputing some information about the ratings matrix to determine which users require which
item factors and vice versa. See the Scaladoc for InBlock
for a detailed explanation of how
the precomputation is done.
In addition, since each iteration of calculating the factor matrices depends on the known
ratings, which are spread across Spark partitions, a naive implementation would incur
significant network communication overhead between Spark workers, as the ratings RDD would be
repeatedly shuffled during each iteration. This implementation reduces that overhead by
performing the shuffling operation up front, precomputing each partition's ratings dependencies
and duplicating those values to the appropriate workers before starting iterations to solve for
the factor matrices. See the Scaladoc for OutBlock
for a detailed explanation of how the
precomputation is done.
Note that the term "rating block" is a bit of a misnomer, as the ratings are not partitioned by contiguous blocks from the ratings matrix but by a hash function on the rating's location in the matrix. If it helps you to visualize the partitions, it is easier to think of the term "block" as referring to a subset of an RDD containing the ratings rather than a contiguous submatrix of the ratings matrix.
Parameters:
ratings - (undocumented)
rank - (undocumented)
numUserBlocks - (undocumented)
numItemBlocks - (undocumented)
maxIter - (undocumented)
regParam - (undocumented)
implicitPrefs - (undocumented)
alpha - (undocumented)
nonnegative - (undocumented)
intermediateRDDStorageLevel - (undocumented)
finalRDDStorageLevel - (undocumented)
checkpointInterval - (undocumented)
seed - (undocumented)
evidence$1 - (undocumented)
ord - (undocumented)
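As a usage sketch of this developer API from Scala (the ratings RDD is assumed to exist already; the named arguments mirror the parameter names listed above, and the ClassTag and Ordering evidence parameters are supplied implicitly by the Scala compiler):

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.recommendation.ALS.Rating
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// `ratings` is assumed to be an existing RDD[Rating[Int]], where each
// element is Rating(userId, itemId, rating).
def factorize(ratings: RDD[Rating[Int]])
    : (RDD[(Int, Array[Float])], RDD[(Int, Array[Float])]) =
  ALS.train(
    ratings,
    rank = 10,
    numUserBlocks = 10,
    numItemBlocks = 10,
    maxIter = 10,
    regParam = 0.1,
    implicitPrefs = false,
    alpha = 1.0,
    nonnegative = false,
    intermediateRDDStorageLevel = StorageLevel.MEMORY_AND_DISK,
    finalRDDStorageLevel = StorageLevel.MEMORY_AND_DISK,
    checkpointInterval = 10,
    seed = 42L)
```

The returned pair holds the user factors and the item factors as RDDs of (id, factor vector) tuples.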
public static MLReader<T> read()
public String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
public ALS setRank(int value)
public ALS setNumUserBlocks(int value)
public ALS setNumItemBlocks(int value)
public ALS setImplicitPrefs(boolean value)
public ALS setAlpha(double value)
public ALS setUserCol(String value)
public ALS setItemCol(String value)
public ALS setRatingCol(String value)
public ALS setPredictionCol(String value)
public ALS setMaxIter(int value)
public ALS setRegParam(double value)
public ALS setNonnegative(boolean value)
public ALS setCheckpointInterval(int value)
public ALS setSeed(long value)
public ALS setIntermediateStorageLevel(String value)
public ALS setFinalStorageLevel(String value)
public ALS setColdStartStrategy(String value)
public ALS setNumBlocks(int value)
Parameters:
value - (undocumented)

public ALSModel fit(Dataset<?> dataset)
Description copied from class: Estimator
Fits a model to the input data.
Specified by: fit in class Estimator<ALSModel>
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Check transform validity and derive the output schema from the input schema.

We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().

A typical implementation should first conduct verification on schema change and parameter validity, including complex parameter interaction checks.

Specified by: transformSchema in class PipelineStage
Parameters:
schema - (undocumented)
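A minimal sketch of deriving an output schema with this method (the input column names are assumptions; runnable in spark-shell):

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.types.{FloatType, IntegerType, StructField, StructType}

val inputSchema = StructType(Seq(
  StructField("userId", IntegerType),
  StructField("movieId", IntegerType),
  StructField("rating", FloatType)))

// Validates the user/item/rating columns and derives the output schema,
// which includes the prediction column.
val outputSchema = new ALS()
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")
  .transformSchema(inputSchema)
```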