Returns the set difference of two binary representations of Datasketches ThetaSketch objects (the elements in the first sketch that are not in the second), computed with a Datasketches ANotB object.
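To illustrate what an A-not-B operation does on Theta sketches, here is a minimal pure-Python sketch. It is a simplified KMV-style model, not the Datasketches implementation: the names ToyThetaSketch, hash64 and a_not_b are hypothetical, SHA-256 stands in for the MurmurHash3 hashing Datasketches uses, and there is no binary serialization. The key idea is that both sketches are cut at the smaller of their two sampling thresholds (theta), and the result keeps the first sketch's retained hashes below that threshold that are absent from the second.

```python
import hashlib

MAX_HASH = 2**64  # hashes live in [0, 2**64); theta starts at the top of the range


def hash64(value):
    """Hypothetical helper: map a value to a 64-bit int (SHA-256 stands in for MurmurHash3)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyThetaSketch:
    """Hypothetical, simplified KMV-style theta sketch that keeps the k smallest hashes."""

    def __init__(self, k=4096):
        self.k = k
        self.theta = MAX_HASH  # sampling threshold; only hashes below it are retained
        self.hashes = set()

    def update(self, value):
        h = hash64(value)
        if h >= self.theta or h in self.hashes:
            return  # above the threshold, or a duplicate value
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Evict the largest retained hash and lower theta to it
            largest = max(self.hashes)
            self.hashes.remove(largest)
            self.theta = largest

    def estimate(self):
        # Retained count scaled by the sampled fraction theta / 2**64
        return len(self.hashes) * MAX_HASH / self.theta


def a_not_b(a, b):
    """Hypothetical A-not-B: hashes retained in a, below the common theta, and absent from b."""
    theta = min(a.theta, b.theta)
    result = ToyThetaSketch(k=a.k)
    result.theta = theta
    result.hashes = {h for h in a.hashes if h < theta and h not in b.hashes}
    return result


# Usage with the same values as the example on this page
sketch1 = ToyThetaSketch()
for v in [1, 2, 3, 4]:
    sketch1.update(v)
sketch2 = ToyThetaSketch()
for v in [4, 4, 5, 5]:
    sketch2.update(v)
print(a_not_b(sketch1, sketch2).estimate())  # → 3.0
```

With inputs this small neither sketch lowers its theta, so the toy result is exact; with many more distinct values than k, the retained hashes are a sample and the estimate becomes approximate.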
Syntax
from pyspark.sql import functions as sf
sf.theta_difference(col1, col2)
Parameters
| Parameter | Type | Description |
|---|---|---|
| col1 | pyspark.sql.Column or str | The first Theta sketch. |
| col2 | pyspark.sql.Column or str | The second Theta sketch. |
Returns
pyspark.sql.Column: The binary representation of the difference ThetaSketch.
Examples
Example 1: Get the difference of two Theta sketches
from pyspark.sql import functions as sf
df = spark.createDataFrame([(1,4),(2,4),(3,5),(4,5)], "struct<v1:int,v2:int>")
df = df.agg(
    sf.theta_sketch_agg("v1").alias("sketch1"),
    sf.theta_sketch_agg("v2").alias("sketch2")
)
df.select(sf.theta_sketch_estimate(sf.theta_difference(df.sketch1, "sketch2"))).show()
+---------------------------------------------------------+
|theta_sketch_estimate(theta_difference(sketch1, sketch2))|
+---------------------------------------------------------+
| 3|
+---------------------------------------------------------+
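The result of 3 can be cross-checked against the exact set difference in plain Python: the distinct values of v1 are {1, 2, 3, 4} and those of v2 are {4, 5}. With only four rows, both sketches are far below their capacity and should be in exact mode, which is why the estimate matches the exact count here; on large inputs theta_sketch_estimate returns an approximation. The snippet below is only an illustration of the expected semantics, not a call into Spark.

```python
# Values from the example DataFrame above
v1 = [1, 2, 3, 4]
v2 = [4, 4, 5, 5]

# Exact set difference: distinct v1 values that never occur in v2
exact = set(v1) - set(v2)
print(sorted(exact), len(exact))  # [1, 2, 3] 3

# The operation is not symmetric: swapping the arguments yields a different set
print(sorted(set(v2) - set(v1)))  # [5]
```

Because of this asymmetry, swapping the arguments (theta_difference(df.sketch2, "sketch1")) would be expected to estimate 1 rather than 3.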