Specifies the underlying output data source. Built-in options include "parquet", "json", etc.
1.4.0
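The following is a minimal sketch of choosing a source by name and then saving. It assumes Spark 1.6 (spark-core and spark-sql) on the classpath, run as a standalone script with a local master. The temporary paths and sample rows are made up for illustration. The sketch also checks that format("parquet").save(path) and the parquet(path) shorthand write the same data.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local Spark 1.6 context (assumption: standalone script, not inside spark-shell).
val sc = new SparkContext(new SparkConf().setAppName("format-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

// Illustrative two-row DataFrame.
val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")

// Fresh, not-yet-existing directories, so the default save mode (error if exists) succeeds.
val base = Files.createTempDirectory("format-demo")
val viaFormat = base.resolve("via-format").toString
val viaShorthand = base.resolve("via-shorthand").toString

// Name the data source explicitly, then save.
df.write.format("parquet").save(viaFormat)
// The format-specific shorthand writes the same data.
df.write.parquet(viaShorthand)

val countViaFormat = sqlContext.read.format("parquet").load(viaFormat).count()
val countViaShorthand = sqlContext.read.parquet(viaShorthand).count()
sc.stop()
```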
Inserts the content of the DataFrame into the specified table.
Saves the content of the DataFrame to an external database table via JDBC. If the
table already exists in the external database, the behavior of this function depends on the
save mode, specified by the mode
function (the default is to throw an exception).
Don't create too many partitions in parallel on a large cluster. Each partition writes through its own database connection, so too much parallelism can overwhelm, and potentially crash, the external database system.
url: JDBC database URL of the form jdbc:subprotocol:subname.
table: Name of the table in the external database.
connectionProperties: JDBC database connection arguments, a list of arbitrary string tag/value pairs. Normally at least a "user" and a "password" property should be included.
1.4.0
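The next sketch shows a JDBC write. It assumes Spark 1.6 and the H2 JDBC driver (com.h2database:h2) on the classpath. The in-memory H2 URL, the PEOPLE table name and the "sa" user are illustrative choices, not anything Spark requires. coalesce(2) caps the number of partitions before the write, and therefore the number of concurrent connections.

```scala
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("jdbc-demo").setMaster("local[4]"))
val sqlContext = new SQLContext(sc)

// Assumption: H2 is available; DB_CLOSE_DELAY=-1 keeps the in-memory database alive for the JVM's lifetime.
Class.forName("org.h2.Driver")
val url = "jdbc:h2:mem:writerdemo;DB_CLOSE_DELAY=-1"

// Connection arguments: usually at least "user" and "password".
val props = new Properties()
props.setProperty("user", "sa")
props.setProperty("password", "")

val df = sqlContext.createDataFrame((1 to 100).map(i => (i, s"name-$i"))).toDF("id", "name")

// Each partition writes over its own connection, so limit the partition count first.
df.coalesce(2).write.jdbc(url, "PEOPLE", props)

// Read the table back through the JDBC reader to confirm that the rows landed.
val rowsInDb = sqlContext.read.jdbc(url, "PEOPLE", props).count()
sc.stop()
```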
Saves the content of the DataFrame in JSON format at the specified path. This is equivalent to:
format("json").save(path)
1.4.0
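Below is a sketch of what json(path) produces. It assumes Spark 1.6 on the classpath and a local master, with made-up sample rows. The output is a directory of part files in which each row is written as one JSON object per line.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("json-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val outPath = Files.createTempDirectory("json-demo").resolve("people").toString

// Equivalent to df.write.format("json").save(outPath).
df.write.json(outPath)

// textFile skips files whose names start with "_" or ".", such as _SUCCESS and checksums.
val lines = sc.textFile(outPath).collect()
sc.stop()
```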
Specifies the behavior when data or table already exists. Options include:
- overwrite: overwrite the existing data.
- append: append the data.
- ignore: ignore the operation (i.e. no-op).
- error: default option, throw an exception at runtime.
1.4.0
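This sketch walks through the string-valued save modes against a single output path. It assumes Spark 1.6 on the classpath, a local master and a temporary directory chosen for illustration. The row counts in the comments follow from writing the same two-row DataFrame each time.

```scala
import java.nio.file.Files
import scala.util.Try
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("mode-string-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val path = Files.createTempDirectory("mode-string-demo").resolve("people").toString

df.write.mode("overwrite").parquet(path) // path absent: simply written (2 rows)
df.write.mode("append").parquet(path)    // adds another copy of the rows (4 rows)
df.write.mode("ignore").parquet(path)    // path exists, so this is a no-op (still 4 rows)

val rowCount = sqlContext.read.parquet(path).count()

// "error" (the default) refuses to write to an existing path and throws.
val errorFailed = Try(df.write.mode("error").parquet(path)).isFailure
sc.stop()
```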
Specifies the behavior when data or table already exists. Options include:
- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: default option, throw an exception at runtime.
1.4.0
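The same behavior can be selected through the SaveMode enum. The sketch below assumes Spark 1.6 on the classpath and a local master, with an illustrative temporary path. It shows ErrorIfExists failing on an existing path, Append doubling the rows, and Overwrite replacing them.

```scala
import java.nio.file.Files
import scala.util.Try
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

val sc = new SparkContext(new SparkConf().setAppName("savemode-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val path = Files.createTempDirectory("savemode-demo").resolve("people").toString

// Default mode is ErrorIfExists; the path is new, so this first write succeeds.
df.write.parquet(path)

// A second ErrorIfExists write hits the existing path and throws.
val errorIfExistsFailed = Try(df.write.mode(SaveMode.ErrorIfExists).parquet(path)).isFailure

df.write.mode(SaveMode.Append).parquet(path)
val afterAppend = sqlContext.read.parquet(path).count()

df.write.mode(SaveMode.Overwrite).parquet(path)
val afterOverwrite = sqlContext.read.parquet(path).count()
sc.stop()
```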
Adds an output option for the underlying data source.
1.4.0
Adds output options for the underlying data source.
1.4.0
(Scala-specific) Adds output options for the underlying data source.
1.4.0
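Here is a sketch of passing options as a Scala Map. It assumes Spark 1.6 on the classpath, a local master and an illustrative temporary path. It relies on save(path) being shorthand for setting the "path" option, so save() with no argument picks up the path from the options.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("options-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val outPath = Files.createTempDirectory("options-demo").resolve("people").toString

// options(...) adds several key/value pairs at once; option(key, value) adds a single one.
df.write
  .format("json")
  .options(Map("path" -> outPath))
  .save()

// The reader accepts the same "path" option.
val rowCount = sqlContext.read.format("json").option("path", outPath).load().count()
sc.stop()
```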
Saves the content of the DataFrame in ORC format at the specified path. This is equivalent to:
format("orc").save(path)
1.5.0
Currently, this method can only be used together with HiveContext.
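The sketch below writes ORC through a HiveContext. It assumes Spark 1.6 with the spark-hive module on the classpath and a local master. Creating a HiveContext this way also creates a local Derby metastore (metastore_db) in the working directory. The output path is illustrative.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("orc-demo").setMaster("local[2]"))
// ORC support lives in the Hive module, hence HiveContext instead of a plain SQLContext.
val hiveContext = new HiveContext(sc)

val df = hiveContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val outPath = Files.createTempDirectory("orc-demo").resolve("people").toString

// Equivalent to df.write.format("orc").save(outPath).
df.write.orc(outPath)

val rowCount = hiveContext.read.orc(outPath).count()
sc.stop()
```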
Saves the content of the DataFrame in Parquet format at the specified path. This is equivalent to:
format("parquet").save(path)
1.4.0
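A sketch of a Parquet round trip follows. It assumes Spark 1.6 on the classpath, a local master and illustrative data. Parquet stores the schema alongside the data, so the column names and types survive the round trip.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.IntegerType

val sc = new SparkContext(new SparkConf().setAppName("parquet-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val outPath = Files.createTempDirectory("parquet-demo").resolve("people").toString

// Equivalent to df.write.format("parquet").save(outPath).
df.write.parquet(outPath)

// The schema is recovered from the Parquet files themselves.
val readBack = sqlContext.read.parquet(outPath)
val fieldNames = readBack.schema.fieldNames.toSeq
val idIsInt = readBack.schema("id").dataType == IntegerType
val rowCount = readBack.count()
sc.stop()
```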
Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme.
This was initially applicable only to Parquet, but in 1.5+ it also covers JSON, text, ORC and Avro.
1.4.0
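To make the Hive-style layout concrete, the sketch below partitions by two columns and then lists the resulting directories. It assumes Spark 1.6 on the classpath, a local master and made-up year/month data. Each partition column becomes one directory level named column=value, and reading the root path turns those names back into columns.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("partitionby-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((2016, 1, "a"), (2016, 2, "b"), (2017, 1, "c")))
  .toDF("year", "month", "value")
val outPath = Files.createTempDirectory("partitionby-demo").resolve("events").toString

df.write.partitionBy("year", "month").parquet(outPath)

// Sorted names of the immediate subdirectories (files such as _SUCCESS are skipped).
def subdirs(dir: File): Seq[String] =
  dir.listFiles().filter(_.isDirectory).map(_.getName).sorted.toSeq

val yearDirs = subdirs(new File(outPath))
val monthDirs2016 = subdirs(new File(outPath, "year=2016"))

// Partition discovery recovers "year" and "month" as columns when reading the root path.
val readBack = sqlContext.read.parquet(outPath)
val readColumns = readBack.columns.toSeq
val rowCount = readBack.count()
sc.stop()
```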
Saves the content of the DataFrame as the specified table.
1.4.0
Saves the content of the DataFrame at the specified path.
1.4.0
Saves the content of the DataFrame as the specified table.
If the table already exists, the behavior of this function depends on the
save mode, specified by the mode function (the default is to throw an exception).
When mode is Overwrite, the schema of the DataFrame does not need to be
the same as that of the existing table.
When mode is Append, the schema of the DataFrame needs to be
the same as that of the existing table, and format or options will be ignored.
When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive built-in SerDe (i.e. ORC and Parquet), the table is persisted in a Hive-compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL-specific format.
1.4.0
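The following is a sketch of creating a table with saveAsTable and then appending to it with insertInto. It assumes Spark 1.6 with spark-hive on the classpath and a local master. In 1.6 a plain SQLContext only supports temporary tables, so a HiveContext is used, which creates a local Derby metastore in the working directory. The table name people_demo is hypothetical. The explicit "path" option keeps the data in a temporary directory instead of the default warehouse location.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("table-demo").setMaster("local[2]"))
val hiveContext = new HiveContext(sc)

val df = hiveContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val tablePath = Files.createTempDirectory("table-demo").resolve("people_demo").toString

// Overwrite makes the sketch rerunnable; the hypothetical table people_demo is (re)created at tablePath.
df.write.mode(SaveMode.Overwrite).option("path", tablePath).saveAsTable("people_demo")

// insertInto requires an existing table; with the default mode it appends rather than overwrites.
df.write.insertInto("people_demo")

// Refresh cached table metadata before reading the appended rows back.
hiveContext.refreshTable("people_demo")
val rowCount = hiveContext.table("people_demo").count()
sc.stop()
```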
Saves the content of the DataFrame in a text file at the specified path. The DataFrame must have only one column that is of string type. Each row becomes a new line in the output file. For example:
// Scala:
df.write.text("/path/to/output")
// Java:
df.write().text("/path/to/output")
1.6.0
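The sketch below shows the single-string-column requirement in practice. It assumes Spark 1.6 on the classpath, a local master and illustrative data. The name column is projected first, and each remaining row comes back as one line of text.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("text-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
val outPath = Files.createTempDirectory("text-demo").resolve("names").toString

// text() accepts exactly one string column, so project "name" first;
// writing df itself (an int column plus a string column) would be rejected.
df.select("name").write.text(outPath)

// Each row became one line; sort because part-file order is not guaranteed.
val lines = sc.textFile(outPath).collect().sorted.toSeq
sc.stop()
```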
:: Experimental :: Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc.). Use DataFrame.write to access this.
1.4.0
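Putting the pieces together, the writer is configured fluently: df.write returns a DataFrameWriter, each setter returns that same writer, and nothing is written until a save method is called. The sketch assumes Spark 1.6 on the classpath, a local master and made-up data.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrameWriter, SQLContext, SaveMode}

val sc = new SparkContext(new SparkConf().setAppName("writer-demo").setMaster("local[2]"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.createDataFrame(Seq((2016, "a"), (2017, "b"), (2017, "c"))).toDF("year", "value")
val outPath = Files.createTempDirectory("writer-demo").resolve("events").toString

// Only configuration so far: source, save mode and partitioning are recorded on the writer.
val writer: DataFrameWriter = df.write
  .format("json")
  .mode(SaveMode.Overwrite)
  .partitionBy("year")

// The actual write happens here.
writer.save(outPath)

val rowCount = sqlContext.read.json(outPath).count()
sc.stop()
```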