HadoopMapReduceCommitProtocol (Spark 2.4.8 JavaDoc)

Object
- org.apache.spark.internal.io.FileCommitProtocol
- - org.apache.spark.internal.io.HadoopMapReduceCommitProtocol

All Implemented Interfaces:

java.io.Serializable, Logging

Direct Known Subclasses:

HadoopMapRedCommitProtocol
```
public class HadoopMapReduceCommitProtocol
extends FileCommitProtocol
implements scala.Serializable, Logging
```
A FileCommitProtocol implementation backed by an underlying Hadoop OutputCommitter (from the newer mapreduce API, not the old mapred API).
Unlike Hadoop's OutputCommitter, this implementation is serializable.
Parameters:

jobId - the job's or stage's id

path - the job's output path, or null if the committer acts as a no-op

dynamicPartitionOverwrite - if true, Spark overwrites partition directories dynamically at runtime. Files are first written under a staging directory with their partition path, e.g. /path/to/staging/a=1/b=1/xxx.parquet. When the job is committed, the corresponding partition directories at the destination path, e.g. /path/to/destination/a=1/b=1, are cleaned up first, and the files are then moved from the staging directory to the corresponding partition directories under the destination path.
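The staging-then-replace behavior described for dynamicPartitionOverwrite can be illustrated with a minimal sketch. It does not call Spark or Hadoop: it mimics the described steps on the local file system with java.nio.file, and the class and method names (DynamicPartitionOverwriteSketch, commitJob, runDemo) are hypothetical. It also assumes that staged files sit only in leaf partition directories.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-file-system model of dynamic partition overwrite; not Spark code.
final class DynamicPartitionOverwriteSketch {

    /** Deletes a directory tree if it exists, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) return;
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) Files.delete(p);
        }
    }

    /**
     * For each partition directory that received files in staging, clears the
     * matching destination directory and moves the staged files into it.
     * Partitions that are absent from staging are left untouched.
     */
    static void commitJob(Path staging, Path destination) throws IOException {
        List<Path> partitionDirs;
        try (Stream<Path> walk = Files.walk(staging)) {
            partitionDirs = walk.filter(Files::isRegularFile)
                .map(f -> staging.relativize(f.getParent()))
                .distinct()
                .collect(Collectors.toList());
        }
        for (Path rel : partitionDirs) {
            Path target = destination.resolve(rel);
            deleteRecursively(target);        // clean up the old partition contents
            Files.createDirectories(target);
            try (Stream<Path> files = Files.list(staging.resolve(rel))) {
                for (Path f : files.filter(Files::isRegularFile).collect(Collectors.toList())) {
                    Files.move(f, target.resolve(f.getFileName()));
                }
            }
        }
    }

    /** Creates a small file, including its parent directories. */
    static void write(Path file) throws IOException {
        Files.createDirectories(file.getParent());
        Files.write(file, "x".getBytes(StandardCharsets.UTF_8));
    }

    /** Builds a destination with two partitions, stages one of them, and commits. */
    static List<String> runDemo() throws IOException {
        Path root = Files.createTempDirectory("dpo-sketch");
        Path staging = root.resolve("staging");
        Path dest = root.resolve("destination");
        write(dest.resolve("a=1/b=1/old.parquet"));   // will be overwritten
        write(dest.resolve("a=2/b=1/keep.parquet"));  // not staged, so it is kept
        write(staging.resolve("a=1/b=1/new.parquet"));
        commitJob(staging, dest);
        try (Stream<Path> walk = Files.walk(dest)) {
            return walk.filter(Files::isRegularFile)
                .map(p -> dest.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(runDemo());
    }
}
```

In this sketch only the partition a=1/b=1 is replaced, because it is the only one with staged files; a=2/b=1 keeps its existing file.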

See Also:

Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.spark.internal.io.FileCommitProtocol
  FileCommitProtocol.EmptyTaskCommitMessage$, FileCommitProtocol.TaskCommitMessage

Constructor Summary

Constructors
Constructor and Description
`HadoopMapReduceCommitProtocol(String jobId, String path, boolean dynamicPartitionOverwrite)`

Method Summary

Modifier and Type	Method and Description
`void`	`abortJob(org.apache.hadoop.mapreduce.JobContext jobContext)` Aborts a job after the writes fail.
`void`	`abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)` Aborts a task after the writes have failed.
`void`	`commitJob(org.apache.hadoop.mapreduce.JobContext jobContext, scala.collection.Seq<FileCommitProtocol.TaskCommitMessage> taskCommits)` Commits a job after the writes succeed.
`FileCommitProtocol.TaskCommitMessage`	`commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)` Commits a task after the writes succeed.
`String`	`newTaskTempFile(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext, scala.Option<String> dir, String ext)` Notifies the commit protocol to add a new file, and gets back the full path that should be used.
`String`	`newTaskTempFileAbsPath(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext, String absoluteDir, String ext)` Similar to newTaskTempFile(), but allows files to be committed to an absolute output location.
`void`	`setupJob(org.apache.hadoop.mapreduce.JobContext jobContext)` Sets up a job.
`void`	`setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)` Sets up a task within a job.

Methods inherited from class org.apache.spark.internal.io.FileCommitProtocol
deleteWithJob, instantiate, onTaskCommit

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

- Constructor Detail
  - HadoopMapReduceCommitProtocol
```
public HadoopMapReduceCommitProtocol(String jobId,
                                     String path,
                                     boolean dynamicPartitionOverwrite)
```
- Method Detail
  - newTaskTempFile
```
public String newTaskTempFile(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext,
                              scala.Option<String> dir,
                              String ext)
```
    Description copied from class: FileCommitProtocol
    
    Notifies the commit protocol to add a new file, and gets back the full path that should be used. Must be called on the executors when running tasks.
    Note that the returned temp file may have an arbitrary path. The commit protocol only promises that the file will be at the location specified by the arguments after job commit.
    A full file path consists of the following parts:
    1. the base path
    2. some sub-directory within the base path, used to specify partitioning
    3. file prefix, usually some unique job id with the task id
    4. bucket id
    5. source specific file extension, e.g. ".snappy.parquet"
    The "dir" parameter specifies part 2, the "ext" parameter specifies both parts 4 and 5, and the remaining parts are left to the commit protocol implementation to decide.
    Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
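    The way these parts combine can be sketched as follows. This is an illustration only, not the path logic of this class: the helper finalPath, the prefix "part-00003-job42", and the bucket suffix "_00002" are hypothetical, and a real implementation chooses its own prefix format.

```java
import java.util.Optional;

// Hypothetical illustration of how the five path parts fit together; not Spark code.
final class TaskFilePathSketch {

    /**
     * basePath   = part 1
     * dir        = part 2 (optional partition sub-directory)
     * filePrefix = part 3 (chosen by the commit protocol in a real implementation)
     * ext        = parts 4 and 5 (bucket id plus file extension, supplied by the caller)
     */
    static String finalPath(String basePath, Optional<String> dir, String filePrefix, String ext) {
        String parent = dir.map(d -> basePath + "/" + d).orElse(basePath);
        return parent + "/" + filePrefix + ext;
    }

    public static void main(String[] args) {
        // Partitioned and bucketed output.
        System.out.println(finalPath("/path/to/destination", Optional.of("a=1/b=1"),
            "part-00003-job42", "_00002.snappy.parquet"));
        // A task writing two files to the same dir adds a counter to "ext" itself,
        // since the protocol only separates files written by different tasks.
        for (int i = 0; i < 2; i++) {
            System.out.println(finalPath("/path/to/destination", Optional.empty(),
                "part-00003-job42", ".c" + i + ".parquet"));
        }
    }
}
```

    The counter in the second call pattern is one possible way for a caller to keep its own files distinct; any uniquely identifying content in "ext" serves the same purpose.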
    
    Specified by:
    
    newTaskTempFile in class FileCommitProtocol
    
    Parameters:
    
    taskContext - (undocumented)
    
    dir - (undocumented)
    
    ext - (undocumented)
    
    Returns:
    
    (undocumented)
  - newTaskTempFileAbsPath
```
public String newTaskTempFileAbsPath(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext,
                                     String absoluteDir,
                                     String ext)
```
    Description copied from class: FileCommitProtocol
    
    Similar to newTaskTempFile(), but allows files to be committed to an absolute output location. Depending on the implementation, there may be weaker guarantees around adding files this way.
    Important: it is the caller's responsibility to add uniquely identifying content to "ext" if a task is going to write out multiple files to the same dir. The file commit protocol only guarantees that files written by different tasks will not conflict.
    
    Specified by:
    
    newTaskTempFileAbsPath in class FileCommitProtocol
    
    Parameters:
    
    taskContext - (undocumented)
    
    absoluteDir - (undocumented)
    
    ext - (undocumented)
    
    Returns:
    
    (undocumented)
  - setupJob
```
public void setupJob(org.apache.hadoop.mapreduce.JobContext jobContext)
```
    Description copied from class: FileCommitProtocol
    
    Sets up a job. Must be called on the driver before any other methods can be invoked.
    
    Specified by:
    
    setupJob in class FileCommitProtocol
    
    Parameters:
    
    jobContext - (undocumented)
  - commitJob
```
public void commitJob(org.apache.hadoop.mapreduce.JobContext jobContext,
                      scala.collection.Seq<FileCommitProtocol.TaskCommitMessage> taskCommits)
```
    Description copied from class: FileCommitProtocol
    
    Commits a job after the writes succeed. Must be called on the driver.
    
    Specified by:
    
    commitJob in class FileCommitProtocol
    
    Parameters:
    
    jobContext - (undocumented)
    
    taskCommits - (undocumented)
  - abortJob
```
public void abortJob(org.apache.hadoop.mapreduce.JobContext jobContext)
```
    Description copied from class: FileCommitProtocol
    
    Aborts a job after the writes fail. Must be called on the driver.
    Calling this function is a best-effort attempt, because the driver may crash (or be killed) before it can call abort.
    
    Specified by:
    
    abortJob in class FileCommitProtocol
    
    Parameters:
    
    jobContext - (undocumented)
  - setupTask
```
public void setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
```
    Description copied from class: FileCommitProtocol
    
    Sets up a task within a job. Must be called before any other task related methods can be invoked.
    
    Specified by:
    
    setupTask in class FileCommitProtocol
    
    Parameters:
    
    taskContext - (undocumented)
  - commitTask
```
public FileCommitProtocol.TaskCommitMessage commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
```
    Description copied from class: FileCommitProtocol
    
    Commits a task after the writes succeed. Must be called on the executors when running tasks.
    
    Specified by:
    
    commitTask in class FileCommitProtocol
    
    Parameters:
    
    taskContext - (undocumented)
    
    Returns:
    
    (undocumented)
  - abortTask
```
public void abortTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
```
    Description copied from class: FileCommitProtocol
    
    Aborts a task after the writes have failed. Must be called on the executors when running tasks.
    Calling this function is a best-effort attempt, because the executor may crash (or be killed) before it can call abort.
    
    Specified by:
    
    abortTask in class FileCommitProtocol
    
    Parameters:
    
    taskContext - (undocumented)
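
Taken together, the method descriptions above imply a call order: setupJob first, then setupTask before any other task method, commitTask or abortTask per task, and finally commitJob or abortJob. The sketch below models that order with a hypothetical stand-in interface. It is not the Spark API: the Hadoop context arguments are omitted, commit messages are plain strings, and tasks run sequentially in one process rather than on executors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the call order; the names mirror the documented methods only loosely.
final class CommitLifecycleSketch {

    interface Protocol {
        void setupJob();
        void setupTask(int taskId);
        String newTaskTempFile(int taskId, String ext);
        String commitTask(int taskId);            // returns a task commit message
        void abortTask(int taskId);
        void commitJob(List<String> taskCommits);
        void abortJob();
    }

    /** Records every call so that the order can be inspected. */
    static final class Recording implements Protocol {
        final List<String> calls = new ArrayList<>();
        public void setupJob() { calls.add("setupJob"); }
        public void setupTask(int t) { calls.add("setupTask(" + t + ")"); }
        public String newTaskTempFile(int t, String ext) {
            calls.add("newTaskTempFile(" + t + ")");
            return "task-" + t + ext;
        }
        public String commitTask(int t) { calls.add("commitTask(" + t + ")"); return "commit-" + t; }
        public void abortTask(int t) { calls.add("abortTask(" + t + ")"); }
        public void commitJob(List<String> c) { calls.add("commitJob(" + c.size() + ")"); }
        public void abortJob() { calls.add("abortJob"); }
    }

    /** Task side: set up, write, then commit, or abort if the write fails. */
    static String runTask(Protocol p, int taskId, boolean fail) {
        p.setupTask(taskId);
        try {
            p.newTaskTempFile(taskId, ".parquet");
            if (fail) throw new IllegalStateException("simulated write failure");
            return p.commitTask(taskId);
        } catch (RuntimeException e) {
            p.abortTask(taskId);   // best effort, as the documentation notes
            throw e;
        }
    }

    /** Job side: set up, run tasks, then commit, or abort if any task failed. */
    static void runJob(Protocol p, int numTasks, int failingTask) {
        p.setupJob();
        List<String> commits = new ArrayList<>();
        try {
            for (int t = 0; t < numTasks; t++) {
                commits.add(runTask(p, t, t == failingTask));
            }
            p.commitJob(commits);
        } catch (RuntimeException e) {
            p.abortJob();
        }
    }

    public static void main(String[] args) {
        Recording ok = new Recording();
        runJob(ok, 1, -1);         // no failing task
        System.out.println(ok.calls);
        Recording failed = new Recording();
        runJob(failed, 1, 0);      // task 0 fails
        System.out.println(failed.calls);
    }
}
```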
