InsertIntoParquetTable (Spark 1.2.2 JavaDoc)


Object
- org.apache.spark.sql.catalyst.trees.TreeNode<PlanType>
  - org.apache.spark.sql.catalyst.plans.QueryPlan<SparkPlan>
    - org.apache.spark.sql.execution.SparkPlan
      - org.apache.spark.sql.parquet.InsertIntoParquetTable

All Implemented Interfaces:

java.io.Serializable, Logging, SparkHadoopMapReduceUtil, org.apache.spark.sql.catalyst.trees.UnaryNode<SparkPlan>, scala.Equals, scala.Product
```
public class InsertIntoParquetTable
extends SparkPlan
implements UnaryNode, SparkHadoopMapReduceUtil, scala.Product, scala.Serializable
```
:: DeveloperApi :: Operator that acts as a sink for queries on RDDs and stores their output in a directory of Parquet files. It is similar to Hive's INSERT INTO TABLE operation in that the caller chooses whether to overwrite the directory or append to it. Consecutive insertions into the same table must have compatible (source) schemas.

WARNING: EXPERIMENTAL! With overwrite=false, InsertIntoParquetTable may corrupt data if multiple users append to the same table simultaneously. Inserting into a table that was generated by other means (e.g., by creating an HDFS directory and importing Parquet files produced by other tools) may cause unpredictable behaviour and therefore results in a RuntimeException. Such tables are detected only by filename pattern, so not all cases are caught.
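
The overwrite/append choice, the schema requirement, and the filename-based check described above can be illustrated with a small in-memory model. This is only a sketch and does not use Spark or Parquet: the class `ParquetDirModel`, its methods, and the `part-r-N.parquet` naming pattern are all hypothetical, and schema compatibility is simplified to equality of column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Toy, in-memory model of the insert semantics described in this page.
 * It does NOT use Spark or Parquet; every name here is hypothetical.
 */
class ParquetDirModel {
    // Assumed naming convention for files written by this model only;
    // the pattern that Spark actually checks is not stated in this page.
    private static final Pattern OWN_FILE = Pattern.compile("part-r-\\d+\\.parquet");

    private final List<String> files = new ArrayList<>();
    private List<String> schema; // column names of the first insert, null until then

    /** Simulates a file placed in the directory by another tool. */
    void importForeignFile(String name) {
        files.add(name);
    }

    /** Inserts one batch as a new file, either replacing or appending. */
    void insert(List<String> batchSchema, boolean overwrite) {
        // Simplification: "compatible" is modelled as "equal column names".
        if (schema != null && !schema.equals(batchSchema)) {
            throw new RuntimeException("Incompatible schema: " + batchSchema);
        }
        if (overwrite) {
            // Overwrite: discard whatever the directory held before.
            files.clear();
        } else {
            // Append: refuse directories that contain unrecognized files.
            for (String f : files) {
                if (!OWN_FILE.matcher(f).matches()) {
                    throw new RuntimeException("Unrecognized file in table directory: " + f);
                }
            }
        }
        schema = new ArrayList<>(batchSchema);
        files.add("part-r-" + (files.size() + 1) + ".parquet");
    }

    /** Returns a copy of the current file listing. */
    List<String> listFiles() {
        return new ArrayList<>(files);
    }

    public static void main(String[] args) {
        ParquetDirModel table = new ParquetDirModel();
        List<String> schema = Arrays.asList("id", "name");
        table.insert(schema, false);            // first append
        table.insert(schema, false);            // second append keeps the first file
        System.out.println(table.listFiles());
        table.insert(schema, true);             // overwrite drops the earlier files
        System.out.println(table.listFiles());
    }
}
```

The model is single-threaded, so it says nothing about the concurrent-append hazard in the warning. It also shows why a filename check is only a heuristic: a foreign file that happens to match the pattern would pass unnoticed.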

See Also:
Serialized Form

Constructor Summary

Constructors
Constructor and Description

InsertIntoParquetTable(ParquetRelation relation, SparkPlan child, boolean overwrite)

Method Summary

Methods
Modifier and Type	Method and Description
`SparkPlan`	`child()`
`RDD<org.apache.spark.sql.catalyst.expressions.Row>`	`execute()` Inserts all rows into the Parquet file.
`scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute>`	`output()`
`boolean`	`overwrite()`
`ParquetRelation`	`relation()`

Methods inherited from class org.apache.spark.sql.execution.SparkPlan
codegenEnabled, executeCollect, makeCopy, outputPartitioning, requiredChildDistribution

Methods inherited from class org.apache.spark.sql.catalyst.plans.QueryPlan
expressions, inputSet, missingInput, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1, org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1, outputSet, printSchema, references, schema, schemaString, simpleString, statePrefix, transformAllExpressions, transformExpressions, transformExpressionsDown, transformExpressionsUp

Methods inherited from class org.apache.spark.sql.catalyst.trees.TreeNode
apply, argString, asCode, children, collect, fastEquals, flatMap, foreach, generateTreeString, getNodeNumbered, map, mapChildren, nodeName, numberedTreeString, otherCopyArgs, stringArgs, toString, transform, transformChildrenDown, transformChildrenUp, transformDown, transformUp, treeString, withNewChildren

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.sql.execution.UnaryNode
outputPartitioning

Methods inherited from interface org.apache.spark.sql.catalyst.trees.UnaryNode
children

Methods inherited from interface org.apache.spark.mapreduce.SparkHadoopMapReduceUtil
firstAvailableClass, newJobContext, newTaskAttemptContext, newTaskAttemptID

Methods inherited from interface scala.Product
productArity, productElement, productIterator, productPrefix

Methods inherited from interface scala.Equals
canEqual, equals

Methods inherited from interface org.apache.spark.Logging
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

Constructor Detail

InsertIntoParquetTable

public InsertIntoParquetTable(ParquetRelation relation,
                      SparkPlan child,
                      boolean overwrite)

Method Detail
- relation
```
public ParquetRelation relation()
```
- child
```
public SparkPlan child()
```
  Specified by: child in interface org.apache.spark.sql.catalyst.trees.UnaryNode<SparkPlan>
- overwrite
```
public boolean overwrite()
```
- execute
```
public RDD<org.apache.spark.sql.catalyst.expressions.Row> execute()
```
  Inserts all rows into the Parquet file.
  
  Specified by: execute in class SparkPlan
- output
```
public scala.collection.Seq<org.apache.spark.sql.catalyst.expressions.Attribute> output()
```
  Specified by: output in class org.apache.spark.sql.catalyst.plans.QueryPlan<SparkPlan>
