Interface used to write a DataFrame to external storage using the v2 API.
For most use cases with Databricks tables and Delta Lake, DataFrameWriterV2 provides more powerful and flexible options than the original DataFrameWriter:
- Better table property support
- More fine-grained control over partitioning
- Conditional overwrite capabilities
- Support for clustering
- Clearer semantics for create or replace operations
Supports Spark Connect
Syntax
Use DataFrame.writeTo(table) to access this interface.
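For orientation, here is a minimal sketch of the builder pattern (the table name demo_table and the Delta provider are illustrative choices, not requirements of the API):

```python
# Hypothetical example: DataFrame.writeTo returns a DataFrameWriterV2 builder.
df = spark.createDataFrame([("a", 1), ("b", 2)], ["key", "value"])

writer = df.writeTo("demo_table").using("delta")  # configure the writer
writer.createOrReplace()                          # terminal action performs the write
```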
Methods
| Method | Description |
|---|---|
| using(provider) | Specifies a provider for the underlying output data source. |
| option(key, value) | Add a write option. For example, to create an unmanaged (external) table at a specific location: df.writeTo("test").using("delta").option("path", "s3://test").createOrReplace(). |
| options(**options) | Add write options. |
| tableProperty(property, value) | Add a table property. For example, use tableProperty("location", "s3://test") to create an EXTERNAL (unmanaged) table. |
| partitionedBy(col, *cols) | Partition the output table created by create, createOrReplace, or replace using the given columns or transforms. |
| clusterBy(col, *cols) | Cluster the data by the given columns to optimize query performance. |
| create() | Create a new table from the contents of the data frame. |
| replace() | Replace an existing table with the contents of the data frame. |
| createOrReplace() | Create a new table or replace an existing table with the contents of the data frame. |
| append() | Append the contents of the data frame to the output table. |
| overwrite(condition) | Overwrite rows matching the given filter condition with the contents of the data frame in the output table. |
| overwritePartitions() | Overwrite all partitions for which the data frame contains at least one row with the contents of the data frame in the output table. |
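The option and tableProperty descriptions above control whether the created table is managed or external. A brief sketch of both patterns, assuming the Delta provider and placeholder table names and path:

```python
# Managed table: no path or location supplied, so the catalog manages the storage.
df.writeTo("demo.managed_table").using("delta").createOrReplace()

# External (unmanaged) table: data lives at an explicit location
# (placeholder path; see the tableProperty description above).
df.writeTo("demo.external_table") \
    .using("delta") \
    .tableProperty("location", "s3://my-bucket/external_table") \
    .createOrReplace()
```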
Examples
Creating a new table
```python
# Create a new table with DataFrame contents
df = spark.createDataFrame([{"name": "Alice", "age": 30}])
df.writeTo("my_table").create()

# Create with a specific provider
df.writeTo("my_table").using("parquet").create()
```
Partitioning data
```python
# Partition by a single column
df.writeTo("my_table") \
    .partitionedBy("year") \
    .create()

# Partition by multiple columns
df.writeTo("my_table") \
    .partitionedBy("year", "month") \
    .create()

# Partition using transform functions
from pyspark.sql.functions import years, months, days

df.writeTo("my_table") \
    .partitionedBy(years("date"), months("date")) \
    .create()
```
Setting table properties
```python
# Add table properties
df.writeTo("my_table") \
    .tableProperty("key1", "value1") \
    .tableProperty("key2", "value2") \
    .create()
```
Using options
```python
# Add write options
df.writeTo("my_table") \
    .option("compression", "snappy") \
    .option("maxRecordsPerFile", "10000") \
    .create()

# Add multiple options at once
df.writeTo("my_table") \
    .options(compression="snappy", maxRecordsPerFile="10000") \
    .create()
```
Clustering data
```python
# Cluster by columns for query optimization
df.writeTo("my_table") \
    .clusterBy("user_id", "timestamp") \
    .create()
```
Replace operations
```python
# Replace an existing table
df.writeTo("my_table") \
    .using("parquet") \
    .replace()

# Create or replace (succeeds whether or not the table already exists)
df.writeTo("my_table") \
    .using("parquet") \
    .createOrReplace()
```
Append operations
```python
# Append to an existing table
df.writeTo("my_table").append()
```
Overwrite operations
```python
from pyspark.sql.functions import col

# Overwrite specific rows based on a condition
df.writeTo("my_table") \
    .overwrite(col("date") == "2025-01-01")

# Overwrite entire partitions (dynamic partition overwrite)
df.writeTo("my_table") \
    .overwritePartitions()
```
Method chaining
```python
# Combine multiple configurations
df.writeTo("my_table") \
    .using("parquet") \
    .option("compression", "snappy") \
    .tableProperty("description", "User data table") \
    .partitionedBy("year", "month") \
    .clusterBy("user_id") \
    .createOrReplace()
```