Interface used to write a DataFrame to external storage using the v2 API.
For most use cases with Databricks tables and Delta Lake, DataFrameWriterV2 provides more powerful and flexible options than the original DataFrameWriter:
- Better table property support
- More fine-grained control over partitioning
- Conditional overwrite capabilities
- Support for clustering
- Clearer semantics for create or replace operations
Supports Spark Connect
Syntax
Use DataFrame.writeTo(table) to access this interface.
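For orientation, here is a minimal sketch of the builder pattern (the table name demo_table and the Delta provider are illustrative choices, not requirements of the API):

```python
# Hypothetical example: DataFrame.writeTo returns a DataFrameWriterV2 builder.
df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

writer = df.writeTo("demo_table").using("delta")  # configure the writer
writer.createOrReplace()                          # terminal action performs the write
```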
Methods
| Method | Description |
|---|---|
| using(provider) | Specifies a provider for the underlying output data source. |
| option(key, value) | Add a write option. For example, to create an unmanaged (external) table at a specific location: df.writeTo("test").using("delta").option("path", "s3://test").createOrReplace(). |
| options(**options) | Add write options. |
| tableProperty(property, value) | Add a table property. For example, use tableProperty("location", "s3://test") to create an EXTERNAL (unmanaged) table. |
| partitionedBy(col, *cols) | Partition the output table created by create, createOrReplace, or replace using the given columns or transforms. |
| clusterBy(col, *cols) | Cluster the data by the given columns to optimize query performance. |
| create() | Create a new table from the contents of the data frame. |
| replace() | Replace an existing table with the contents of the data frame. |
| createOrReplace() | Create a new table or replace an existing table with the contents of the data frame. |
| append() | Append the contents of the data frame to the output table. |
| overwrite(condition) | Overwrite rows matching the given filter condition with the contents of the data frame in the output table. |
| overwritePartitions() | Overwrite all partitions for which the data frame contains at least one row with the contents of the data frame in the output table. |
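The option and tableProperty descriptions above control whether the created table is managed or external. A brief sketch of both patterns, assuming the Delta provider and placeholder table names and path:

```python
# Managed table: no path or location supplied, so the catalog manages the storage.
df.writeTo("demo.managed_table").using("delta").createOrReplace()

# External (unmanaged) table: data lives at an explicit location
# (placeholder path; see the tableProperty description above).
df.writeTo("demo.external_table") \
    .using("delta") \
    .tableProperty("location", "s3://my-bucket/external_table") \
    .createOrReplace()
```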
Examples
Creating a new table
```python
# Create a new table with DataFrame contents
df = spark.createDataFrame([{"name": "Alice", "age": 30}])
df.writeTo("my_table").create()

# Create with a specific provider
df.writeTo("my_table").using("parquet").create()
```
Partitioning data
```python
# Partition by a single column
df.writeTo("my_table") \
    .partitionedBy("year") \
    .create()

# Partition by multiple columns
df.writeTo("my_table") \
    .partitionedBy("year", "month") \
    .create()

# Partition using transform functions
from pyspark.sql.functions import years, months, days

df.writeTo("my_table") \
    .partitionedBy(years("date"), months("date")) \
    .create()
```
Setting table properties
```python
# Add table properties
df.writeTo("my_table") \
    .tableProperty("key1", "value1") \
    .tableProperty("key2", "value2") \
    .create()
```
Using options
```python
# Add write options
df.writeTo("my_table") \
    .option("compression", "snappy") \
    .option("maxRecordsPerFile", "10000") \
    .create()

# Add multiple options at once
df.writeTo("my_table") \
    .options(compression="snappy", maxRecordsPerFile="10000") \
    .create()
```
Clustering data
```python
# Cluster by columns for query optimization
df.writeTo("my_table") \
    .clusterBy("user_id", "timestamp") \
    .create()
```
Replace operations
```python
# Replace an existing table
df.writeTo("my_table") \
    .using("parquet") \
    .replace()

# Create or replace (succeeds whether or not the table already exists)
df.writeTo("my_table") \
    .using("parquet") \
    .createOrReplace()
```
Append operations
```python
# Append to an existing table
df.writeTo("my_table").append()
```
Overwrite operations
```python
from pyspark.sql.functions import col

# Overwrite specific rows based on a condition
df.writeTo("my_table") \
    .overwrite(col("date") == "2025-01-01")

# Overwrite entire partitions (dynamic partition overwrite)
df.writeTo("my_table") \
    .overwritePartitions()
```
Method chaining
```python
# Combine multiple configurations
df.writeTo("my_table") \
    .using("parquet") \
    .option("compression", "snappy") \
    .tableProperty("description", "User data table") \
    .partitionedBy("year", "month") \
    .clusterBy("user_id") \
    .createOrReplace()
```