DataFrame (Spark 1.6.3 JavaDoc)

Object
- org.apache.spark.sql.DataFrame

All Implemented Interfaces:: java.io.Serializable, org.apache.spark.sql.execution.Queryable

public class DataFrame
extends Object
implements org.apache.spark.sql.execution.Queryable, scala.Serializable

:: Experimental :: A distributed collection of data organized into named columns.

A DataFrame is equivalent to a relational table in Spark SQL. The following example creates a DataFrame by pointing Spark SQL to a Parquet data set.


   val people = sqlContext.read.parquet("...")  // in Scala
   DataFrame people = sqlContext.read().parquet("...")  // in Java

Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in: DataFrame (this class), Column, and functions.

To select a column from the data frame, use apply method in Scala and col in Java.


   val ageCol = people("age")  // in Scala
   Column ageCol = people.col("age")  // in Java

Note that the Column type can also be manipulated through its various functions.


   // The following creates a new column that increases everybody's age by 10.
   people("age") + 10  // in Scala
   people.col("age").plus(10);  // in Java

A more concrete example in Scala:


   // To create DataFrame using SQLContext
   val people = sqlContext.read.parquet("...")
   val department = sqlContext.read.parquet("...")

   people.filter("age > 30")
     .join(department, people("deptId") === department("id"))
     .groupBy(department("name"), "gender")
     .agg(avg(people("salary")), max(people("age")))

and in Java:


   // To create DataFrame using SQLContext
   DataFrame people = sqlContext.read().parquet("...");
   DataFrame department = sqlContext.read().parquet("...");

   people.filter("age".gt(30))
     .join(department, people.col("deptId").equalTo(department("id")))
     .groupBy(department.col("name"), "gender")
     .agg(avg(people.col("salary")), max(people.col("age")));

Since:: 1.3.0
See Also:: Serialized Form

Constructor Summary

Constructors
Constructor and Description
`DataFrame(SQLContext sqlContext, org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan)` A constructor that automatically analyzes the logical plan.

Method Summary

Methods
Modifier and Type	Method and Description
`DataFrame`	`agg(Column expr, Column... exprs)` Aggregates on the entire `DataFrame` without groups.
`DataFrame`	`agg(Column expr, scala.collection.Seq<Column> exprs)` Aggregates on the entire `DataFrame` without groups.
`DataFrame`	`agg(scala.collection.immutable.Map<String,String> exprs)` (Scala-specific) Aggregates on the entire `DataFrame` without groups.
`DataFrame`	`agg(java.util.Map<String,String> exprs)` (Java-specific) Aggregates on the entire `DataFrame` without groups.
`DataFrame`	`agg(scala.Tuple2<String,String> aggExpr, scala.collection.Seq<scala.Tuple2<String,String>> aggExprs)` (Scala-specific) Aggregates on the entire `DataFrame` without groups.
`DataFrame`	`alias(String alias)` Returns a new `DataFrame` with an alias set.
`DataFrame`	`alias(scala.Symbol alias)` (Scala-specific) Returns a new `DataFrame` with an alias set.
`Column`	`apply(String colName)` Selects column based on the column name and return it as a `Column`.
`<U> Dataset<U>`	`as(Encoder<U> evidence$1)` :: Experimental :: Converts this `DataFrame` to a strongly-typed `Dataset` containing objects of the specified type, `U`.
`DataFrame`	`as(String alias)` Returns a new `DataFrame` with an alias set.
`DataFrame`	`as(scala.Symbol alias)` (Scala-specific) Returns a new `DataFrame` with an alias set.
`DataFrame`	`cache()` Persist this `DataFrame` with the default storage level (`MEMORY_AND_DISK`).
`DataFrame`	`coalesce(int numPartitions)` Returns a new `DataFrame` that has exactly `numPartitions` partitions.
`Column`	`col(String colName)` Selects column based on the column name and return it as a `Column`.
`Row[]`	`collect()` Returns an array that contains all of `Row`s in this `DataFrame`.
`java.util.List<Row>`	`collectAsList()` Returns a Java list that contains all of `Row`s in this `DataFrame`.
`String[]`	`columns()` Returns all column names as an array.
`long`	`count()` Returns the number of rows in the `DataFrame`.
`void`	`createJDBCTable(String url, String table, boolean allowExisting)` Deprecated. As of 1.340, replaced by `write().jdbc()`. This will be removed in Spark 2.0.
`GroupedData`	`cube(Column... cols)` Create a multi-dimensional cube for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`cube(scala.collection.Seq<Column> cols)` Create a multi-dimensional cube for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`cube(String col1, scala.collection.Seq<String> cols)` Create a multi-dimensional cube for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`cube(String col1, String... cols)` Create a multi-dimensional cube for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`DataFrame`	`describe(scala.collection.Seq<String> cols)` Computes statistics for numeric columns, including count, mean, stddev, min, and max.
`DataFrame`	`describe(String... cols)` Computes statistics for numeric columns, including count, mean, stddev, min, and max.
`DataFrame`	`distinct()` Returns a new `DataFrame` that contains only the unique rows from this `DataFrame`.
`DataFrame`	`drop(Column col)` Returns a new `DataFrame` with a column dropped.
`DataFrame`	`drop(String colName)` Returns a new `DataFrame` with a column dropped.
`DataFrame`	`dropDuplicates()` Returns a new `DataFrame` that contains only the unique rows from this `DataFrame`.
`DataFrame`	`dropDuplicates(scala.collection.Seq<String> colNames)` (Scala-specific) Returns a new `DataFrame` with duplicate rows removed, considering only the subset of columns.
`DataFrame`	`dropDuplicates(String[] colNames)` Returns a new `DataFrame` with duplicate rows removed, considering only the subset of columns.
`scala.Tuple2<String,String>[]`	`dtypes()` Returns all column names and their data types as an array.
`DataFrame`	`except(DataFrame other)` Returns a new `DataFrame` containing rows in this frame but not in another frame.
`void`	`explain()` Prints the physical plan to the console for debugging purposes.
`void`	`explain(boolean extended)` Prints the plans (logical and physical) to the console for debugging purposes.
`<A extends scala.Product> DataFrame`	`explode(scala.collection.Seq<Column> input, scala.Function1<Row,scala.collection.TraversableOnce<A>> f, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)` (Scala-specific) Returns a new `DataFrame` where each row has been expanded to zero or more rows by the provided function.
`<A,B> DataFrame`	`explode(String inputColumn, String outputColumn, scala.Function1<A,scala.collection.TraversableOnce<B>> f, scala.reflect.api.TypeTags.TypeTag<B> evidence$3)` (Scala-specific) Returns a new `DataFrame` where a single column has been expanded to zero or more rows by the provided function.
`DataFrame`	`filter(Column condition)` Filters rows using the given condition.
`DataFrame`	`filter(String conditionExpr)` Filters rows using the given SQL expression.
`Row`	`first()` Returns the first row.
`<R> RDD<R>`	`flatMap(scala.Function1<Row,scala.collection.TraversableOnce<R>> f, scala.reflect.ClassTag<R> evidence$5)` Returns a new RDD by first applying a function to all rows of this `DataFrame`, and then flattening the results.
`void`	`foreach(scala.Function1<Row,scala.runtime.BoxedUnit> f)` Applies a function `f` to all rows.
`void`	`foreachPartition(scala.Function1<scala.collection.Iterator<Row>,scala.runtime.BoxedUnit> f)` Applies a function f to each partition of this `DataFrame`.
`GroupedData`	`groupBy(Column... cols)` Groups the `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`groupBy(scala.collection.Seq<Column> cols)` Groups the `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`groupBy(String col1, scala.collection.Seq<String> cols)` Groups the `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`groupBy(String col1, String... cols)` Groups the `DataFrame` using the specified columns, so we can run aggregation on them.
`Row`	`head()` Returns the first row.
`Row[]`	`head(int n)` Returns the first `n` rows.
`String[]`	`inputFiles()` Returns a best-effort snapshot of the files that compose this DataFrame.
`void`	`insertInto(String tableName)` Deprecated. As of 1.4.0, replaced by `write().mode(SaveMode.Append).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`insertInto(String tableName, boolean overwrite)` Deprecated. As of 1.4.0, replaced by `write().mode(SaveMode.Append\|SaveMode.Overwrite).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`insertIntoJDBC(String url, String table, boolean overwrite)` Deprecated. As of 1.4.0, replaced by `write().jdbc()`. This will be removed in Spark 2.0.
`DataFrame`	`intersect(DataFrame other)` Returns a new `DataFrame` containing rows only in both this frame and another frame.
`boolean`	`isLocal()` Returns true if the `collect` and `take` methods can be run locally (without any Spark executors).
`JavaRDD<Row>`	`javaRDD()` Returns the content of the `DataFrame` as a `JavaRDD` of `Row`s.
`DataFrame`	`join(DataFrame right)` Cartesian join with another `DataFrame`.
`DataFrame`	`join(DataFrame right, Column joinExprs)` Inner join with another `DataFrame`, using the given join expression.
`DataFrame`	`join(DataFrame right, Column joinExprs, String joinType)` Join with another `DataFrame`, using the given join expression.
`DataFrame`	`join(DataFrame right, scala.collection.Seq<String> usingColumns)` Inner equi-join with another `DataFrame` using the given columns.
`DataFrame`	`join(DataFrame right, scala.collection.Seq<String> usingColumns, String joinType)` Equi-join with another `DataFrame` using the given columns.
`DataFrame`	`join(DataFrame right, String usingColumn)` Inner equi-join with another `DataFrame` using the given column.
`DataFrame`	`limit(int n)` Returns a new `DataFrame` by taking the first `n` rows.
`<R> RDD<R>`	`map(scala.Function1<Row,R> f, scala.reflect.ClassTag<R> evidence$4)` Returns a new RDD by applying a function to all rows of this DataFrame.
`<R> RDD<R>`	`mapPartitions(scala.Function1<scala.collection.Iterator<Row>,scala.collection.Iterator<R>> f, scala.reflect.ClassTag<R> evidence$6)` Returns a new RDD by applying a function to each partition of this DataFrame.
`DataFrameNaFunctions`	`na()` Returns a `DataFrameNaFunctions` for working with missing data.
`DataFrame`	`orderBy(Column... sortExprs)` Returns a new `DataFrame` sorted by the given expressions.
`DataFrame`	`orderBy(scala.collection.Seq<Column> sortExprs)` Returns a new `DataFrame` sorted by the given expressions.
`DataFrame`	`orderBy(String sortCol, scala.collection.Seq<String> sortCols)` Returns a new `DataFrame` sorted by the given expressions.
`DataFrame`	`orderBy(String sortCol, String... sortCols)` Returns a new `DataFrame` sorted by the given expressions.
`DataFrame`	`persist()` Persist this `DataFrame` with the default storage level (`MEMORY_AND_DISK`).
`DataFrame`	`persist(StorageLevel newLevel)` Persist this `DataFrame` with the given storage level.
`void`	`printSchema()` Prints the schema to the console in a nice tree format.
`org.apache.spark.sql.execution.QueryExecution`	`queryExecution()`
`DataFrame[]`	`randomSplit(double[] weights)` Randomly splits this `DataFrame` with the provided weights.
`DataFrame[]`	`randomSplit(double[] weights, long seed)` Randomly splits this `DataFrame` with the provided weights.
`RDD<Row>`	`rdd()` Represents the content of the `DataFrame` as an `RDD` of `Row`s.
`void`	`registerTempTable(String tableName)` Registers this `DataFrame` as a temporary table using the given name.
`DataFrame`	`repartition(Column... partitionExprs)` Returns a new `DataFrame` partitioned by the given partitioning expressions preserving the existing number of partitions.
`DataFrame`	`repartition(int numPartitions)` Returns a new `DataFrame` that has exactly `numPartitions` partitions.
`DataFrame`	`repartition(int numPartitions, Column... partitionExprs)` Returns a new `DataFrame` partitioned by the given partitioning expressions into `numPartitions`.
`DataFrame`	`repartition(int numPartitions, scala.collection.Seq<Column> partitionExprs)` Returns a new `DataFrame` partitioned by the given partitioning expressions into `numPartitions`.
`DataFrame`	`repartition(scala.collection.Seq<Column> partitionExprs)` Returns a new `DataFrame` partitioned by the given partitioning expressions preserving the existing number of partitions.
`GroupedData`	`rollup(Column... cols)` Create a multi-dimensional rollup for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`rollup(scala.collection.Seq<Column> cols)` Create a multi-dimensional rollup for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`rollup(String col1, scala.collection.Seq<String> cols)` Create a multi-dimensional rollup for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`GroupedData`	`rollup(String col1, String... cols)` Create a multi-dimensional rollup for the current `DataFrame` using the specified columns, so we can run aggregation on them.
`DataFrame`	`sample(boolean withReplacement, double fraction)` Returns a new `DataFrame` by sampling a fraction of rows, using a random seed.
`DataFrame`	`sample(boolean withReplacement, double fraction, long seed)` Returns a new `DataFrame` by sampling a fraction of rows.
`void`	`save(String path)` Deprecated. As of 1.4.0, replaced by `write().save(path)`. This will be removed in Spark 2.0.
`void`	`save(String path, SaveMode mode)` Deprecated. As of 1.4.0, replaced by `write().mode(mode).save(path)`. This will be removed in Spark 2.0.
`void`	`save(String source, SaveMode mode, java.util.Map<String,String> options)` Deprecated. As of 1.4.0, replaced by `write().format(source).mode(mode).options(options).save(path)`. This will be removed in Spark 2.0.
`void`	`save(String source, SaveMode mode, scala.collection.immutable.Map<String,String> options)` Deprecated. As of 1.4.0, replaced by `write().format(source).mode(mode).options(options).save(path)`. This will be removed in Spark 2.0.
`void`	`save(String path, String source)` Deprecated. As of 1.4.0, replaced by `write().format(source).save(path)`. This will be removed in Spark 2.0.
`void`	`save(String path, String source, SaveMode mode)` Deprecated. As of 1.4.0, replaced by `write().format(source).mode(mode).save(path)`. This will be removed in Spark 2.0.
`void`	`saveAsParquetFile(String path)` Deprecated. As of 1.4.0, replaced by `write().parquet()`. This will be removed in Spark 2.0.
`void`	`saveAsTable(String tableName)` Deprecated. As of 1.4.0, replaced by `write().saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`saveAsTable(String tableName, SaveMode mode)` Deprecated. As of 1.4.0, replaced by `write().mode(mode).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`saveAsTable(String tableName, String source)` Deprecated. As of 1.4.0, replaced by `write().format(source).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`saveAsTable(String tableName, String source, SaveMode mode)` Deprecated. As of 1.4.0, replaced by `write().mode(mode).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`saveAsTable(String tableName, String source, SaveMode mode, java.util.Map<String,String> options)` Deprecated. As of 1.4.0, replaced by `write().format(source).mode(mode).options(options).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`void`	`saveAsTable(String tableName, String source, SaveMode mode, scala.collection.immutable.Map<String,String> options)` Deprecated. As of 1.4.0, replaced by `write().format(source).mode(mode).options(options).saveAsTable(tableName)`. This will be removed in Spark 2.0.
`StructType`	`schema()` Returns the schema of this `DataFrame`.
`DataFrame`	`select(Column... cols)` Selects a set of column based expressions.
`DataFrame`	`select(scala.collection.Seq<Column> cols)` Selects a set of column based expressions.
`DataFrame`	`select(String col, scala.collection.Seq<String> cols)` Selects a set of columns.
`DataFrame`	`select(String col, String... cols)` Selects a set of columns.
`DataFrame`	`selectExpr(scala.collection.Seq<String> exprs)` Selects a set of SQL expressions.
`DataFrame`	`selectExpr(String... exprs)` Selects a set of SQL expressions.
`void`	`show()` Displays the top 20 rows of `DataFrame` in a tabular form.
`void`	`show(boolean truncate)` Displays the top 20 rows of `DataFrame` in a tabular form.
`void`	`show(int numRows)` Displays the `DataFrame` in a tabular form.
`void`	`show(int numRows, boolean truncate)` Displays the `DataFrame` in a tabular form.
`DataFrame`	`sort(Column... sortExprs)` Returns a new `DataFrame` sorted by the given expressions.
`DataFrame`	`sort(scala.collection.Seq<Column> sortExprs)` Returns a new `DataFrame` sorted by the given expressions.
`DataFrame`	`sort(String sortCol, scala.collection.Seq<String> sortCols)` Returns a new `DataFrame` sorted by the specified column, all in ascending order.
`DataFrame`	`sort(String sortCol, String... sortCols)` Returns a new `DataFrame` sorted by the specified column, all in ascending order.
`DataFrame`	`sortWithinPartitions(Column... sortExprs)` Returns a new `DataFrame` with each partition sorted by the given expressions.
`DataFrame`	`sortWithinPartitions(scala.collection.Seq<Column> sortExprs)` Returns a new `DataFrame` with each partition sorted by the given expressions.
`DataFrame`	`sortWithinPartitions(String sortCol, scala.collection.Seq<String> sortCols)` Returns a new `DataFrame` with each partition sorted by the given expressions.
`DataFrame`	`sortWithinPartitions(String sortCol, String... sortCols)` Returns a new `DataFrame` with each partition sorted by the given expressions.
`SQLContext`	`sqlContext()`
`DataFrameStatFunctions`	`stat()` Returns a `DataFrameStatFunctions` for working statistic functions support.
`Row[]`	`take(int n)` Returns the first `n` rows in the `DataFrame`.
`java.util.List<Row>`	`takeAsList(int n)` Returns the first `n` rows in the `DataFrame` as a list.
`DataFrame`	`toDF()` Returns the object itself.
`DataFrame`	`toDF(scala.collection.Seq<String> colNames)` Returns a new `DataFrame` with columns renamed.
`DataFrame`	`toDF(String... colNames)` Returns a new `DataFrame` with columns renamed.
`JavaRDD<Row>`	`toJavaRDD()` Returns the content of the `DataFrame` as a `JavaRDD` of `Row`s.
`RDD<String>`	`toJSON()` Returns the content of the `DataFrame` as a RDD of JSON strings.
`DataFrame`	`toSchemaRDD()` Deprecated. As of 1.3.0, replaced by `toDF()`. This will be removed in Spark 2.0.
`<U> DataFrame`	`transform(scala.Function1<DataFrame,DataFrame> t)` Concise syntax for chaining custom transformations.
`DataFrame`	`unionAll(DataFrame other)` Returns a new `DataFrame` containing union of rows in this frame and another frame.
`DataFrame`	`unpersist()` Mark the `DataFrame` as non-persistent, and remove all blocks for it from memory and disk.
`DataFrame`	`unpersist(boolean blocking)` Mark the `DataFrame` as non-persistent, and remove all blocks for it from memory and disk.
`DataFrame`	`where(Column condition)` Filters rows using the given condition.
`DataFrame`	`where(String conditionExpr)` Filters rows using the given SQL expression.
`DataFrame`	`withColumn(String colName, Column col)` Returns a new `DataFrame` by adding a column or replacing the existing column that has the same name.
`DataFrame`	`withColumnRenamed(String existingName, String newName)` Returns a new `DataFrame` with a column renamed.
`DataFrameWriter`	`write()` :: Experimental :: Interface for saving the content of the `DataFrame` out into external storage.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.spark.sql.execution.Queryable
formatString, formatString$default$4, showString$default$2, toString

- Constructor Detail
 - DataFrame
```
public DataFrame(SQLContext sqlContext,
 org.apache.spark.sql.catalyst.plans.logical.LogicalPlan logicalPlan)
```
 A constructor that automatically analyzes the logical plan.
 This reports error eagerly as the DataFrame is constructed, unless SQLConf.dataFrameEagerAnalysis is turned off.
 
 Parameters:
 sqlContext - (undocumented)
 logicalPlan - (undocumented)
- Method Detail
 - toDF
```
public DataFrame toDF(String... colNames)
```
 Returns a new DataFrame with columns renamed. This can be quite convenient in conversion from a RDD of tuples into a DataFrame with meaningful names. For example:
```
 val rdd: RDD[(Int, String)] = ...
 rdd.toDF() // this implicit conversion creates a DataFrame with column name _1 and _2
 rdd.toDF("id", "name") // this creates a DataFrame with column name "id" and "name"
 
```
 Parameters:
 colNames - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - sortWithinPartitions
```
public DataFrame sortWithinPartitions(String sortCol,
 String... sortCols)
```
 Returns a new DataFrame with each partition sorted by the given expressions.
 This is the same operation as "SORT BY" in SQL (Hive QL).
 
 Parameters:
 sortCol - (undocumented)
 sortCols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - sortWithinPartitions
```
public DataFrame sortWithinPartitions(Column... sortExprs)
```
 Returns a new DataFrame with each partition sorted by the given expressions.
 This is the same operation as "SORT BY" in SQL (Hive QL).
 
 Parameters:
 sortExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - sort
```
public DataFrame sort(String sortCol,
 String... sortCols)
```
 Returns a new DataFrame sorted by the specified column, all in ascending order.
```
 // The following 3 are equivalent
 df.sort("sortcol")
 df.sort($"sortcol")
 df.sort($"sortcol".asc)
 
```
 Parameters:
 sortCol - (undocumented)
 sortCols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - sort
```
public DataFrame sort(Column... sortExprs)
```
 Returns a new DataFrame sorted by the given expressions. For example:
```
 df.sort($"col1", $"col2".desc)
 
```
 Parameters:
 sortExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - orderBy
```
public DataFrame orderBy(String sortCol,
 String... sortCols)
```
 Returns a new DataFrame sorted by the given expressions. This is an alias of the sort function.
 
 Parameters:
 sortCol - (undocumented)
 sortCols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - orderBy
```
public DataFrame orderBy(Column... sortExprs)
```
 Returns a new DataFrame sorted by the given expressions. This is an alias of the sort function.
 
 Parameters:
 sortExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - select
```
public DataFrame select(Column... cols)
```
 Selects a set of column based expressions.
```
 df.select($"colA", $"colB" + 1)
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - select
```
public DataFrame select(String col,
 String... cols)
```
 Selects a set of columns. This is a variant of select that can only select existing columns using column names (i.e. cannot construct expressions).
```
 // The following two are equivalent:
 df.select("colA", "colB")
 df.select($"colA", $"colB")
 
```
 Parameters:
 col - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - selectExpr
```
public DataFrame selectExpr(String... exprs)
```
 Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
```
 // The following are equivalent:
 df.selectExpr("colA", "colB as newName", "abs(colC)")
 df.select(expr("colA"), expr("colB as newName"), expr("abs(colC)"))
 
```
 Parameters:
 exprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - groupBy
```
public GroupedData groupBy(Column... cols)
```
 Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
```
 // Compute the average for all numeric columns grouped by department.
 df.groupBy($"department").avg()

 // Compute the max age and average salary, grouped by department and gender.
 df.groupBy($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - rollup
```
public GroupedData rollup(Column... cols)
```
 Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
```
 // Compute the average for all numeric columns rolluped by department and group.
 df.rollup($"department", $"group").avg()

 // Compute the max age and average salary, rolluped by department and gender.
 df.rollup($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - cube
```
public GroupedData cube(Column... cols)
```
 Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
```
 // Compute the average for all numeric columns cubed by department and group.
 df.cube($"department", $"group").avg()

 // Compute the max age and average salary, cubed by department and gender.
 df.cube($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - groupBy
```
public GroupedData groupBy(String col1,
 String... cols)
```
 Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
 This is a variant of groupBy that can only group by existing columns using column names (i.e. cannot construct expressions).
```
 // Compute the average for all numeric columns grouped by department.
 df.groupBy("department").avg()

 // Compute the max age and average salary, grouped by department and gender.
 df.groupBy($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 col1 - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - rollup
```
public GroupedData rollup(String col1,
 String... cols)
```
 Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
 This is a variant of rollup that can only group by existing columns using column names (i.e. cannot construct expressions).
```
 // Compute the average for all numeric columns rolluped by department and group.
 df.rollup("department", "group").avg()

 // Compute the max age and average salary, rolluped by department and gender.
 df.rollup($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 col1 - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - cube
```
public GroupedData cube(String col1,
 String... cols)
```
 Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
 This is a variant of cube that can only group by existing columns using column names (i.e. cannot construct expressions).
```
 // Compute the average for all numeric columns cubed by department and group.
 df.cube("department", "group").avg()

 // Compute the max age and average salary, cubed by department and gender.
 df.cube($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 col1 - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - agg
```
public DataFrame agg(Column expr,
 Column... exprs)
```
 Aggregates on the entire DataFrame without groups.
```
 // df.agg(...) is a shorthand for df.groupBy().agg(...)
 df.agg(max($"age"), avg($"salary"))
 df.groupBy().agg(max($"age"), avg($"salary"))
 
```
 Parameters:
 expr - (undocumented)
 exprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - describe
```
public DataFrame describe(String... cols)
```
 Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns.
 This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.
```
 df.describe("age", "height").show()

 // output:
 // summary age height
 // count 10.0 10.0
 // mean 53.3 178.05
 // stddev 11.6 15.7
 // min 18.0 163.0
 // max 92.0 192.0
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.1
 - repartition
```
public DataFrame repartition(int numPartitions,
 Column... partitionExprs)
```
 Returns a new DataFrame partitioned by the given partitioning expressions into numPartitions. The resulting DataFrame is hash partitioned.
 This is the same operation as "DISTRIBUTE BY" in SQL (Hive QL).
 
 Parameters:
 numPartitions - (undocumented)
 partitionExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - repartition
```
public DataFrame repartition(Column... partitionExprs)
```
 Returns a new DataFrame partitioned by the given partitioning expressions preserving the existing number of partitions. The resulting DataFrame is hash partitioned.
 This is the same operation as "DISTRIBUTE BY" in SQL (Hive QL).
 
 Parameters:
 partitionExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - sqlContext
```
public SQLContext sqlContext()
```
 Specified by:
 
 sqlContext in interface org.apache.spark.sql.execution.Queryable
 - queryExecution
```
public org.apache.spark.sql.execution.QueryExecution queryExecution()
```
 Specified by:
 
 queryExecution in interface org.apache.spark.sql.execution.Queryable
 - toDF
```
public DataFrame toDF()
```
 Returns the object itself.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - as
```
public  Dataset as(Encoder evidence$1)
```
 :: Experimental :: Converts this DataFrame to a strongly-typed Dataset containing objects of the specified type, U.
 
 Parameters:
 evidence$1 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - toDF
```
public DataFrame toDF(scala.collection.Seq<String> colNames)
```
 Returns a new DataFrame with columns renamed. This can be quite convenient in conversion from a RDD of tuples into a DataFrame with meaningful names. For example:
```
 val rdd: RDD[(Int, String)] = ...
 rdd.toDF() // this implicit conversion creates a DataFrame with column name _1 and _2
 rdd.toDF("id", "name") // this creates a DataFrame with column name "id" and "name"
 
```
 Parameters:
 colNames - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - schema
```
public StructType schema()
```
 Returns the schema of this DataFrame.
 
 Specified by:
 
 schema in interface org.apache.spark.sql.execution.Queryable
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - printSchema
```
public void printSchema()
```
 Prints the schema to the console in a nice tree format.
 
 Specified by:
 
 printSchema in interface org.apache.spark.sql.execution.Queryable
 
 Since:
 
 1.3.0
 - explain
```
public void explain(boolean extended)
```
 Prints the plans (logical and physical) to the console for debugging purposes.
 
 Specified by:
 
 explain in interface org.apache.spark.sql.execution.Queryable
 
 Parameters:
 extended - (undocumented)
 Since:
 
 1.3.0
 - explain
```
public void explain()
```
 Prints the physical plan to the console for debugging purposes.
 
 Specified by:
 
 explain in interface org.apache.spark.sql.execution.Queryable
 
 Since:
 
 1.3.0
 - dtypes
```
public scala.Tuple2<String,String>[] dtypes()
```
 Returns all column names and their data types as an array.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - columns
```
public String[] columns()
```
 Returns all column names as an array.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - isLocal
```
public boolean isLocal()
```
 Returns true if the collect and take methods can be run locally (without any Spark executors).
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - show
```
public void show(int numRows)
```
 Displays the DataFrame in a tabular form. Strings more than 20 characters will be truncated, and all cells will be aligned right. For example:
```
 year month AVG('Adj Close) MAX('Adj Close)
 1980 12 0.503218 0.595103
 1981 01 0.523289 0.570307
 1982 02 0.436504 0.475256
 1983 03 0.410516 0.442194
 1984 04 0.450090 0.483521
 
```
 Parameters:
 numRows - Number of rows to show
 Since:
 
 1.3.0
 - show
```
public void show()
```
 Displays the top 20 rows of DataFrame in a tabular form. Strings more than 20 characters will be truncated, and all cells will be aligned right.
 
 Since:
 
 1.3.0
 - show
```
public void show(boolean truncate)
```
 Displays the top 20 rows of DataFrame in a tabular form.
 
 Parameters:
 truncate - Whether truncate long strings. If true, strings more than 20 characters will be truncated and all cells will be aligned right
 Since:
 
 1.5.0
 - show
```
public void show(int numRows,
 boolean truncate)
```
 Displays the DataFrame in a tabular form. For example:
```
 year month AVG('Adj Close) MAX('Adj Close)
 1980 12 0.503218 0.595103
 1981 01 0.523289 0.570307
 1982 02 0.436504 0.475256
 1983 03 0.410516 0.442194
 1984 04 0.450090 0.483521
 
```
 Parameters:
 numRows - Number of rows to show
 truncate - Whether truncate long strings. If true, strings more than 20 characters will be truncated and all cells will be aligned right
 Since:
 
 1.5.0
 - na
```
public DataFrameNaFunctions na()
```
 Returns a DataFrameNaFunctions for working with missing data.
```
 // Dropping rows containing any null values.
 df.na.drop()
 
```
 Returns:
 (undocumented)
 Since:
 
 1.3.1
 - stat
```
public DataFrameStatFunctions stat()
```
 Returns a DataFrameStatFunctions for working statistic functions support.
```
 // Finding frequent items in column with name 'a'.
 df.stat.freqItems(Seq("a"))
 
```
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - join
```
public DataFrame join(DataFrame right)
```
 Cartesian join with another DataFrame.
 Note that cartesian joins are very expensive without an extra filter that can be pushed down.
 
 Parameters:
 right - Right side of the join operation.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - join
```
public DataFrame join(DataFrame right,
 String usingColumn)
```
 Inner equi-join with another DataFrame using the given column.
 Different from other join functions, the join column will only appear once in the output, i.e. similar to SQL's JOIN USING syntax.
```
 // Joining df1 and df2 using the column "user_id"
 df1.join(df2, "user_id")
 
```
 Note that if you perform a self-join using this function without aliasing the input DataFrames, you will NOT be able to reference any columns after the join, since there is no way to disambiguate which side of the join you would like to reference.
 Parameters:
 right - Right side of the join operation.
 usingColumn - Name of the column to join on. This column must exist on both sides.
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - join
```
public DataFrame join(DataFrame right,
 scala.collection.Seq<String> usingColumns)
```
 Inner equi-join with another DataFrame using the given columns.
 Different from other join functions, the join columns will only appear once in the output, i.e. similar to SQL's JOIN USING syntax.
```
 // Joining df1 and df2 using the columns "user_id" and "user_name"
 df1.join(df2, Seq("user_id", "user_name"))
 
```
 Note that if you perform a self-join using this function without aliasing the input DataFrames, you will NOT be able to reference any columns after the join, since there is no way to disambiguate which side of the join you would like to reference.
 Parameters:
 right - Right side of the join operation.
 usingColumns - Names of the columns to join on. This columns must exist on both sides.
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - join
```
public DataFrame join(DataFrame right,
 scala.collection.Seq<String> usingColumns,
 String joinType)
```
 Equi-join with another DataFrame using the given columns.
 Different from other join functions, the join columns will only appear once in the output, i.e. similar to SQL's JOIN USING syntax.
 Note that if you perform a self-join using this function without aliasing the input DataFrames, you will NOT be able to reference any columns after the join, since there is no way to disambiguate which side of the join you would like to reference.
 
 Parameters:
 right - Right side of the join operation.
 usingColumns - Names of the columns to join on. This columns must exist on both sides.
 joinType - One of: inner, outer, left_outer, right_outer, leftsemi.
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - join
```
public DataFrame join(DataFrame right,
 Column joinExprs)
```
 Inner join with another DataFrame, using the given join expression.
```
 // The following two are equivalent:
 df1.join(df2, $"df1Key" === $"df2Key")
 df1.join(df2).where($"df1Key" === $"df2Key")
 
```
 Parameters:
 right - (undocumented)
 joinExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - join
```
public DataFrame join(DataFrame right,
 Column joinExprs,
 String joinType)
```
 Join with another DataFrame, using the given join expression. The following performs a full outer join between df1 and df2.
```
 // Scala:
 import org.apache.spark.sql.functions._
 df1.join(df2, $"df1Key" === $"df2Key", "outer")

 // Java:
 import static org.apache.spark.sql.functions.*;
 df1.join(df2, col("df1Key").equalTo(col("df2Key")), "outer");
 
```
 Parameters:
 right - Right side of the join.
 joinExprs - Join expression.
 joinType - One of: inner, outer, left_outer, right_outer, leftsemi.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - sortWithinPartitions
```
public DataFrame sortWithinPartitions(String sortCol,
 scala.collection.Seq<String> sortCols)
```
 Returns a new DataFrame with each partition sorted by the given expressions.
 This is the same operation as "SORT BY" in SQL (Hive QL).
 
 Parameters:
 sortCol - (undocumented)
 sortCols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - sortWithinPartitions
```
public DataFrame sortWithinPartitions(scala.collection.Seq<Column> sortExprs)
```
 Returns a new DataFrame with each partition sorted by the given expressions.
 This is the same operation as "SORT BY" in SQL (Hive QL).
 
 Parameters:
 sortExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - sort
```
public DataFrame sort(String sortCol,
 scala.collection.Seq<String> sortCols)
```
 Returns a new DataFrame sorted by the specified column, all in ascending order.
```
 // The following 3 are equivalent
 df.sort("sortcol")
 df.sort($"sortcol")
 df.sort($"sortcol".asc)
 
```
 Parameters:
 sortCol - (undocumented)
 sortCols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - sort
```
public DataFrame sort(scala.collection.Seq<Column> sortExprs)
```
 Returns a new DataFrame sorted by the given expressions. For example:
```
 df.sort($"col1", $"col2".desc)
 
```
 Parameters:
 sortExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - orderBy
```
public DataFrame orderBy(String sortCol,
 scala.collection.Seq<String> sortCols)
```
 Returns a new DataFrame sorted by the given expressions. This is an alias of the sort function.
 
 Parameters:
 sortCol - (undocumented)
 sortCols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - orderBy
```
public DataFrame orderBy(scala.collection.Seq<Column> sortExprs)
```
 Returns a new DataFrame sorted by the given expressions. This is an alias of the sort function.
 
 Parameters:
 sortExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - apply
```
public Column apply(String colName)
```
 Selects column based on the column name and return it as a Column. Note that the column name can also reference to a nested column like a.b.
 
 Parameters:
 colName - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - col
```
public Column col(String colName)
```
 Selects column based on the column name and return it as a Column. Note that the column name can also reference to a nested column like a.b.
 
 Parameters:
 colName - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - as
```
public DataFrame as(String alias)
```
 Returns a new DataFrame with an alias set.
 
 Parameters:
 alias - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - as
```
public DataFrame as(scala.Symbol alias)
```
 (Scala-specific) Returns a new DataFrame with an alias set.
 
 Parameters:
 alias - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - alias
```
public DataFrame alias(String alias)
```
 Returns a new DataFrame with an alias set. Same as as.
 
 Parameters:
 alias - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - alias
```
public DataFrame alias(scala.Symbol alias)
```
 (Scala-specific) Returns a new DataFrame with an alias set. Same as as.
 
 Parameters:
 alias - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - select
```
public DataFrame select(scala.collection.Seq<Column> cols)
```
 Selects a set of column based expressions.
```
 df.select($"colA", $"colB" + 1)
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - select
```
public DataFrame select(String col,
 scala.collection.Seq<String> cols)
```
 Selects a set of columns. This is a variant of select that can only select existing columns using column names (i.e. cannot construct expressions).
```
 // The following two are equivalent:
 df.select("colA", "colB")
 df.select($"colA", $"colB")
 
```
 Parameters:
 col - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - selectExpr
```
public DataFrame selectExpr(scala.collection.Seq<String> exprs)
```
 Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
```
 // The following are equivalent:
 df.selectExpr("colA", "colB as newName", "abs(colC)")
 df.select(expr("colA"), expr("colB as newName"), expr("abs(colC)"))
 
```
 Parameters:
 exprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - filter
```
public DataFrame filter(Column condition)
```
 Filters rows using the given condition.
```
 // The following are equivalent:
 peopleDf.filter($"age" > 15)
 peopleDf.where($"age" > 15)
 
```
 Parameters:
 condition - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - filter
```
public DataFrame filter(String conditionExpr)
```
 Filters rows using the given SQL expression.
```
 peopleDf.filter("age > 15")
 
```
 Parameters:
 conditionExpr - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - where
```
public DataFrame where(Column condition)
```
 Filters rows using the given condition. This is an alias for filter.
```
 // The following are equivalent:
 peopleDf.filter($"age" > 15)
 peopleDf.where($"age" > 15)
 
```
 Parameters:
 condition - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - where
```
public DataFrame where(String conditionExpr)
```
 Filters rows using the given SQL expression.
```
 peopleDf.where("age > 15")
 
```
 Parameters:
 conditionExpr - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.5.0
 - groupBy
```
public GroupedData groupBy(scala.collection.Seq<Column> cols)
```
 Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
```
 // Compute the average for all numeric columns grouped by department.
 df.groupBy($"department").avg()

 // Compute the max age and average salary, grouped by department and gender.
 df.groupBy($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - rollup
```
public GroupedData rollup(scala.collection.Seq<Column> cols)
```
 Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
```
 // Compute the average for all numeric columns rolluped by department and group.
 df.rollup($"department", $"group").avg()

 // Compute the max age and average salary, rolluped by department and gender.
 df.rollup($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - cube
```
public GroupedData cube(scala.collection.Seq<Column> cols)
```
 Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
```
 // Compute the average for all numeric columns cubed by department and group.
 df.cube($"department", $"group").avg()

 // Compute the max age and average salary, cubed by department and gender.
 df.cube($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - groupBy
```
public GroupedData groupBy(String col1,
 scala.collection.Seq<String> cols)
```
 Groups the DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
 This is a variant of groupBy that can only group by existing columns using column names (i.e. cannot construct expressions).
```
 // Compute the average for all numeric columns grouped by department.
 df.groupBy("department").avg()

 // Compute the max age and average salary, grouped by department and gender.
 df.groupBy($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 col1 - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - rollup
```
public GroupedData rollup(String col1,
 scala.collection.Seq<String> cols)
```
 Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
 This is a variant of rollup that can only group by existing columns using column names (i.e. cannot construct expressions).
```
 // Compute the average for all numeric columns rolluped by department and group.
 df.rollup("department", "group").avg()

 // Compute the max age and average salary, rolluped by department and gender.
 df.rollup($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 col1 - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - cube
```
public GroupedData cube(String col1,
 scala.collection.Seq<String> cols)
```
 Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.
 This is a variant of cube that can only group by existing columns using column names (i.e. cannot construct expressions).
```
 // Compute the average for all numeric columns cubed by department and group.
 df.cube("department", "group").avg()

 // Compute the max age and average salary, cubed by department and gender.
 df.cube($"department", $"gender").agg(Map(
 "salary" -> "avg",
 "age" -> "max"
 ))
 
```
 Parameters:
 col1 - (undocumented)
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - agg
```
public DataFrame agg(scala.Tuple2<String,String> aggExpr,
 scala.collection.Seq<scala.Tuple2<String,String>> aggExprs)
```
 (Scala-specific) Aggregates on the entire DataFrame without groups.
```
 // df.agg(...) is a shorthand for df.groupBy().agg(...)
 df.agg("age" -> "max", "salary" -> "avg")
 df.groupBy().agg("age" -> "max", "salary" -> "avg")
 
```
 Parameters:
 aggExpr - (undocumented)
 aggExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - agg
```
public DataFrame agg(scala.collection.immutable.Map<String,String> exprs)
```
 (Scala-specific) Aggregates on the entire DataFrame without groups.
```
 // df.agg(...) is a shorthand for df.groupBy().agg(...)
 df.agg(Map("age" -> "max", "salary" -> "avg"))
 df.groupBy().agg(Map("age" -> "max", "salary" -> "avg"))
 
```
 Parameters:
 exprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - agg
```
public DataFrame agg(java.util.Map<String,String> exprs)
```
 (Java-specific) Aggregates on the entire DataFrame without groups.
```
 // df.agg(...) is a shorthand for df.groupBy().agg(...)
 df.agg(Map("age" -> "max", "salary" -> "avg"))
 df.groupBy().agg(Map("age" -> "max", "salary" -> "avg"))
 
```
 Parameters:
 exprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - agg
```
public DataFrame agg(Column expr,
 scala.collection.Seq<Column> exprs)
```
 Aggregates on the entire DataFrame without groups.
```
 // df.agg(...) is a shorthand for df.groupBy().agg(...)
 df.agg(max($"age"), avg($"salary"))
 df.groupBy().agg(max($"age"), avg($"salary"))
 
```
 Parameters:
 expr - (undocumented)
 exprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - limit
```
public DataFrame limit(int n)
```
 Returns a new DataFrame by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new DataFrame.
 
 Parameters:
 n - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - unionAll
```
public DataFrame unionAll(DataFrame other)
```
 Returns a new DataFrame containing union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.
 
 Parameters:
 other - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - intersect
```
public DataFrame intersect(DataFrame other)
```
 Returns a new DataFrame containing rows only in both this frame and another frame. This is equivalent to INTERSECT in SQL.
 
 Parameters:
 other - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - except
```
public DataFrame except(DataFrame other)
```
 Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.
 
 Parameters:
 other - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - sample
```
public DataFrame sample(boolean withReplacement,
 double fraction,
 long seed)
```
 Returns a new DataFrame by sampling a fraction of rows.
 
 Parameters:
 withReplacement - Sample with replacement or not.
 fraction - Fraction of rows to generate.
 seed - Seed for sampling.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - sample
```
public DataFrame sample(boolean withReplacement,
 double fraction)
```
 Returns a new DataFrame by sampling a fraction of rows, using a random seed.
 
 Parameters:
 withReplacement - Sample with replacement or not.
 fraction - Fraction of rows to generate.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - randomSplit
```
public DataFrame[] randomSplit(double[] weights,
 long seed)
```
 Randomly splits this DataFrame with the provided weights.
 
 Parameters:
 weights - weights for splits, will be normalized if they don't sum to 1.
 seed - Seed for sampling.
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - randomSplit
```
public DataFrame[] randomSplit(double[] weights)
```
 Randomly splits this DataFrame with the provided weights.
 
 Parameters:
 weights - weights for splits, will be normalized if they don't sum to 1.
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - explode
```
public <A extends scala.Product> DataFrame explode(scala.collection.Seq<Column> input,
 scala.Function1<Row,scala.collection.TraversableOnce<A>> f,
 scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
```
 (Scala-specific) Returns a new DataFrame where each row has been expanded to zero or more rows by the provided function. This is similar to a LATERAL VIEW in HiveQL. The columns of the input row are implicitly joined with each row that is output by the function.
 The following example uses this function to count the number of books which contain a given word:
```
 case class Book(title: String, words: String)
 val df: RDD[Book]

 case class Word(word: String)
 val allWords = df.explode('words) {
 case Row(words: String) => words.split(" ").map(Word(_))
 }

 val bookCountPerWord = allWords.groupBy("word").agg(countDistinct("title"))
 
```
 Parameters:
 input - (undocumented)
 f - (undocumented)
 evidence$2 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - explode
```
public <A,B> DataFrame explode(String inputColumn,
 String outputColumn,
 scala.Function1<A,scala.collection.TraversableOnce> f,
 scala.reflect.api.TypeTags.TypeTag evidence$3)
```
 (Scala-specific) Returns a new DataFrame where a single column has been expanded to zero or more rows by the provided function. This is similar to a LATERAL VIEW in HiveQL. All columns of the input row are implicitly joined with each value that is output by the function.
```
 df.explode("words", "word"){words: String => words.split(" ")}
 
```
 Parameters:
 inputColumn - (undocumented)
 outputColumn - (undocumented)
 f - (undocumented)
 evidence$3 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - withColumn
```
public DataFrame withColumn(String colName,
 Column col)
```
 Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
 
 Parameters:
 colName - (undocumented)
 col - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - withColumnRenamed
```
public DataFrame withColumnRenamed(String existingName,
 String newName)
```
 Returns a new DataFrame with a column renamed. This is a no-op if schema doesn't contain existingName.
 
 Parameters:
 existingName - (undocumented)
 newName - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - drop
```
public DataFrame drop(String colName)
```
 Returns a new DataFrame with a column dropped. This is a no-op if schema doesn't contain column name.
 
 Parameters:
 colName - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - drop
```
public DataFrame drop(Column col)
```
 Returns a new DataFrame with a column dropped. This version of drop accepts a Column rather than a name. This is a no-op if the DataFrame doesn't have a column with an equivalent expression.
 
 Parameters:
 col - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.1
 - dropDuplicates
```
public DataFrame dropDuplicates()
```
 Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for distinct.
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - dropDuplicates
```
public DataFrame dropDuplicates(scala.collection.Seq<String> colNames)
```
 (Scala-specific) Returns a new DataFrame with duplicate rows removed, considering only the subset of columns.
 
 Parameters:
 colNames - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - dropDuplicates
```
public DataFrame dropDuplicates(String[] colNames)
```
 Returns a new DataFrame with duplicate rows removed, considering only the subset of columns.
 
 Parameters:
 colNames - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - describe
```
public DataFrame describe(scala.collection.Seq<String> cols)
```
 Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns.
 This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.
```
 df.describe("age", "height").show()

 // output:
 // summary age height
 // count 10.0 10.0
 // mean 53.3 178.05
 // stddev 11.6 15.7
 // min 18.0 163.0
 // max 92.0 192.0
 
```
 Parameters:
 cols - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.1
 - head
```
public Row[] head(int n)
```
 Returns the first n rows.
 
 Parameters:
 n - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - head
```
public Row head()
```
 Returns the first row.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - first
```
public Row first()
```
 Returns the first row. Alias for head().
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - transform
```
public  DataFrame transform(scala.Function1<DataFrame,DataFrame> t)
```
 Concise syntax for chaining custom transformations.
```
 def featurize(ds: DataFrame) = ...

 df
 .transform(featurize)
 .transform(...)
 
```
 Parameters:
 t - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - map
```
public <R> RDD<R> map(scala.Function1<Row,R> f,
 scala.reflect.ClassTag<R> evidence$4)
```
 Returns a new RDD by applying a function to all rows of this DataFrame.
 
 Parameters:
 f - (undocumented)
 evidence$4 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - flatMap
```
public <R> RDD<R> flatMap(scala.Function1<Row,scala.collection.TraversableOnce<R>> f,
 scala.reflect.ClassTag<R> evidence$5)
```
 Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.
 
 Parameters:
 f - (undocumented)
 evidence$5 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - mapPartitions
```
public <R> RDD<R> mapPartitions(scala.Function1<scala.collection.Iterator<Row>,scala.collection.Iterator<R>> f,
 scala.reflect.ClassTag<R> evidence$6)
```
 Returns a new RDD by applying a function to each partition of this DataFrame.
 
 Parameters:
 f - (undocumented)
 evidence$6 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - foreach
```
public void foreach(scala.Function1<Row,scala.runtime.BoxedUnit> f)
```
 Applies a function f to all rows.
 
 Parameters:
 f - (undocumented)
 Since:
 
 1.3.0
 - foreachPartition
```
public void foreachPartition(scala.Function1<scala.collection.Iterator<Row>,scala.runtime.BoxedUnit> f)
```
 Applies a function f to each partition of this DataFrame.
 
 Parameters:
 f - (undocumented)
 Since:
 
 1.3.0
 - take
```
public Row[] take(int n)
```
 Returns the first n rows in the DataFrame.
 Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.
 
 Parameters:
 n - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - takeAsList
```
public java.util.List<Row> takeAsList(int n)
```
 Returns the first n rows in the DataFrame as a list.
 Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.
 
 Parameters:
 n - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - collect
```
public Row[] collect()
```
 Returns an array that contains all of Rows in this DataFrame.
 Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
 For Java API, use collectAsList.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - collectAsList
```
public java.util.List<Row> collectAsList()
```
 Returns a Java list that contains all of Rows in this DataFrame.
 Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - count
```
public long count()
```
 Returns the number of rows in the DataFrame.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - repartition
```
public DataFrame repartition(int numPartitions)
```
 Returns a new DataFrame that has exactly numPartitions partitions.
 
 Parameters:
 numPartitions - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - repartition
```
public DataFrame repartition(int numPartitions,
 scala.collection.Seq<Column> partitionExprs)
```
 Returns a new DataFrame partitioned by the given partitioning expressions into numPartitions. The resulting DataFrame is hash partitioned.
 This is the same operation as "DISTRIBUTE BY" in SQL (Hive QL).
 
 Parameters:
 numPartitions - (undocumented)
 partitionExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - repartition
```
public DataFrame repartition(scala.collection.Seq<Column> partitionExprs)
```
 Returns a new DataFrame partitioned by the given partitioning expressions preserving the existing number of partitions. The resulting DataFrame is hash partitioned.
 This is the same operation as "DISTRIBUTE BY" in SQL (Hive QL).
 
 Parameters:
 partitionExprs - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - coalesce
```
public DataFrame coalesce(int numPartitions)
```
 Returns a new DataFrame that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim 10 of the current partitions.
 
 Parameters:
 numPartitions - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - distinct
```
public DataFrame distinct()
```
 Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for dropDuplicates.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - persist
```
public DataFrame persist()
```
 Persist this DataFrame with the default storage level (MEMORY_AND_DISK).
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - cache
```
public DataFrame cache()
```
 Persist this DataFrame with the default storage level (MEMORY_AND_DISK).
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - persist
```
public DataFrame persist(StorageLevel newLevel)
```
 Persist this DataFrame with the given storage level.
 
 Parameters:
 newLevel - One of: MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - unpersist
```
public DataFrame unpersist(boolean blocking)
```
 Mark the DataFrame as non-persistent, and remove all blocks for it from memory and disk.
 
 Parameters:
 blocking - Whether to block until all blocks are deleted.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - unpersist
```
public DataFrame unpersist()
```
 Mark the DataFrame as non-persistent, and remove all blocks for it from memory and disk.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - rdd
```
public RDD<Row> rdd()
```
 Represents the content of the DataFrame as an RDD of Rows. Note that the RDD is memoized. Once called, it won't change even if you change any query planning related Spark SQL configurations (e.g. spark.sql.shuffle.partitions).
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - toJavaRDD
```
public JavaRDD<Row> toJavaRDD()
```
 Returns the content of the DataFrame as a JavaRDD of Rows.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - javaRDD
```
public JavaRDD<Row> javaRDD()
```
 Returns the content of the DataFrame as a JavaRDD of Rows.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - registerTempTable
```
public void registerTempTable(String tableName)
```
 Registers this DataFrame as a temporary table using the given name. The lifetime of this temporary table is tied to the SQLContext that was used to create this DataFrame.
 
 Parameters:
 tableName - (undocumented)
 Since:
 
 1.3.0
 - write
```
public DataFrameWriter write()
```
 :: Experimental :: Interface for saving the content of the DataFrame out into external storage.
 
 Returns:
 (undocumented)
 Since:
 
 1.4.0
 - toJSON
```
public RDD<String> toJSON()
```
 Returns the content of the DataFrame as a RDD of JSON strings.
 
 Returns:
 (undocumented)
 Since:
 
 1.3.0
 - inputFiles
```
public String[] inputFiles()
```
 Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
 
 Returns:
 (undocumented)
 - toSchemaRDD
```
public DataFrame toSchemaRDD()
```
 Deprecated. As of 1.3.0, replaced by toDF(). This will be removed in Spark 2.0.
 
 Returns:
 (undocumented)
 - createJDBCTable
```
public void createJDBCTable(String url,
 String table,
 boolean allowExisting)
```
 Deprecated. As of 1.340, replaced by write().jdbc(). This will be removed in Spark 2.0.
 
 Save this DataFrame to a JDBC database at url under the table name table. This will run a CREATE TABLE and a bunch of INSERT INTO statements. If you pass true for allowExisting, it will drop any table with the given name; if you pass false, it will throw if the table already exists.
 
 Parameters:
 url - (undocumented)
 table - (undocumented)
 allowExisting - (undocumented)
 - insertIntoJDBC
```
public void insertIntoJDBC(String url,
 String table,
 boolean overwrite)
```
 Deprecated. As of 1.4.0, replaced by write().jdbc(). This will be removed in Spark 2.0.
 
 Save this DataFrame to a JDBC database at url under the table name table. Assumes the table already exists and has a compatible schema. If you pass true for overwrite, it will TRUNCATE the table before performing the INSERTs.
 The table must already exist on the database. It must have a schema that is compatible with the schema of this RDD; inserting the rows of the RDD in order via the simple statement INSERT INTO table VALUES (?, ?, ..., ?) should not fail.
 
 Parameters:
 url - (undocumented)
 table - (undocumented)
 overwrite - (undocumented)
 - saveAsParquetFile
```
public void saveAsParquetFile(String path)
```
 Deprecated. As of 1.4.0, replaced by write().parquet(). This will be removed in Spark 2.0.
 
 Saves the contents of this DataFrame as a parquet file, preserving the schema. Files that are written out using this method can be read back in as a DataFrame using the parquetFile function in SQLContext.
 
 Parameters:
 path - (undocumented)
 - saveAsTable
```
public void saveAsTable(String tableName)
```
 Deprecated. As of 1.4.0, replaced by write().saveAsTable(tableName). This will be removed in Spark 2.0.
 
 Creates a table from the the contents of this DataFrame. It will use the default data source configured by spark.sql.sources.default. This will fail if the table already exists.
 Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
 When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
 
 Parameters:
 tableName - (undocumented)
 - saveAsTable
```
public void saveAsTable(String tableName,
 SaveMode mode)
```
 Deprecated. As of 1.4.0, replaced by write().mode(mode).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 Creates a table from the the contents of this DataFrame, using the default data source configured by spark.sql.sources.default and SaveMode.ErrorIfExists as the save mode.
 Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
 When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
 
 Parameters:
 tableName - (undocumented)
 mode - (undocumented)
 - saveAsTable
```
public void saveAsTable(String tableName,
 String source)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 Creates a table at the given path from the the contents of this DataFrame based on a given data source and a set of options, using SaveMode.ErrorIfExists as the save mode.
 Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
 When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
 
 Parameters:
 tableName - (undocumented)
 source - (undocumented)
 - saveAsTable
```
public void saveAsTable(String tableName,
 String source,
 SaveMode mode)
```
 Deprecated. As of 1.4.0, replaced by write().mode(mode).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 :: Experimental :: Creates a table at the given path from the the contents of this DataFrame based on a given data source, SaveMode specified by mode, and a set of options.
 Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
 When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
 
 Parameters:
 tableName - (undocumented)
 source - (undocumented)
 mode - (undocumented)
 - saveAsTable
```
public void saveAsTable(String tableName,
 String source,
 SaveMode mode,
 java.util.Map<String,String> options)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).mode(mode).options(options).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 Creates a table at the given path from the the contents of this DataFrame based on a given data source, SaveMode specified by mode, and a set of options.
 Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
 When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
 
 Parameters:
 tableName - (undocumented)
 source - (undocumented)
 mode - (undocumented)
 options - (undocumented)
 - saveAsTable
```
public void saveAsTable(String tableName,
 String source,
 SaveMode mode,
 scala.collection.immutable.Map<String,String> options)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).mode(mode).options(options).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 (Scala-specific) Creates a table from the the contents of this DataFrame based on a given data source, SaveMode specified by mode, and a set of options.
 Note that this currently only works with DataFrames that are created from a HiveContext as there is no notion of a persisted catalog in a standard SQL context. Instead you can write an RDD out to a parquet file, and then register that file as a table. This "table" can then be the target of an insertInto.
 When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
 
 Parameters:
 tableName - (undocumented)
 source - (undocumented)
 mode - (undocumented)
 options - (undocumented)
 - save
```
public void save(String path)
```
 Deprecated. As of 1.4.0, replaced by write().save(path). This will be removed in Spark 2.0.
 
 Saves the contents of this DataFrame to the given path, using the default data source configured by spark.sql.sources.default and SaveMode.ErrorIfExists as the save mode.
 
 Parameters:
 path - (undocumented)
 - save
```
public void save(String path,
 SaveMode mode)
```
 Deprecated. As of 1.4.0, replaced by write().mode(mode).save(path). This will be removed in Spark 2.0.
 
 Saves the contents of this DataFrame to the given path and SaveMode specified by mode, using the default data source configured by spark.sql.sources.default.
 
 Parameters:
 path - (undocumented)
 mode - (undocumented)
 - save
```
public void save(String path,
 String source)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).save(path). This will be removed in Spark 2.0.
 
 Saves the contents of this DataFrame to the given path based on the given data source, using SaveMode.ErrorIfExists as the save mode.
 
 Parameters:
 path - (undocumented)
 source - (undocumented)
 - save
```
public void save(String path,
 String source,
 SaveMode mode)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).mode(mode).save(path). This will be removed in Spark 2.0.
 
 Saves the contents of this DataFrame to the given path based on the given data source and SaveMode specified by mode.
 
 Parameters:
 path - (undocumented)
 source - (undocumented)
 mode - (undocumented)
 - save
```
public void save(String source,
 SaveMode mode,
 java.util.Map<String,String> options)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).mode(mode).options(options).save(path). This will be removed in Spark 2.0.
 
 Saves the contents of this DataFrame based on the given data source, SaveMode specified by mode, and a set of options.
 
 Parameters:
 source - (undocumented)
 mode - (undocumented)
 options - (undocumented)
 - save
```
public void save(String source,
 SaveMode mode,
 scala.collection.immutable.Map<String,String> options)
```
 Deprecated. As of 1.4.0, replaced by write().format(source).mode(mode).options(options).save(path). This will be removed in Spark 2.0.
 
 (Scala-specific) Saves the contents of this DataFrame based on the given data source, SaveMode specified by mode, and a set of options
 
 Parameters:
 source - (undocumented)
 mode - (undocumented)
 options - (undocumented)
 - insertInto
```
public void insertInto(String tableName,
 boolean overwrite)
```
 Deprecated. As of 1.4.0, replaced by write().mode(SaveMode.Append|SaveMode.Overwrite).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 Adds the rows from this RDD to the specified table, optionally overwriting the existing data.
 
 Parameters:
 tableName - (undocumented)
 overwrite - (undocumented)
 - insertInto
```
public void insertInto(String tableName)
```
 Deprecated. As of 1.4.0, replaced by write().mode(SaveMode.Append).saveAsTable(tableName). This will be removed in Spark 2.0.
 
 Adds the rows from this RDD to the specified table. Throws an exception if the table already exists.
 
 Parameters:
 tableName - (undocumented)

Class DataFrame

Constructor Summary

Method Summary

Methods inherited from class Object

Methods inherited from interface org.apache.spark.sql.execution.Queryable

Constructor Detail

DataFrame

Method Detail

toDF

sortWithinPartitions

sortWithinPartitions

sort

sort

orderBy

orderBy

select

select

selectExpr

groupBy

rollup

cube

groupBy

rollup

cube

agg

describe

repartition

repartition

sqlContext

queryExecution

toDF

as

toDF

schema

printSchema

explain

explain

dtypes

columns

isLocal

show

show

show

show

na

stat

join

join

join

join

join

join

sortWithinPartitions

sortWithinPartitions

sort

sort

orderBy

orderBy

apply

col

as

as

alias

alias

select

select

selectExpr

filter

filter

where

where

groupBy

rollup

cube

groupBy

rollup

cube

agg

agg

agg