Bemærk
Adgang til denne side kræver godkendelse. Du kan prøve at logge på eller ændre mapper.
Adgang til denne side kræver godkendelse. Du kan prøve at ændre mapper.
Returns the most frequent value in a group.
Syntax
from pyspark.sql import functions as sf
sf.mode(col, deterministic=False)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or column name |
Target column to compute on. |
deterministic |
bool, optional | If there are multiple equally-frequent results then return the lowest (defaults to false). |
Returns
pyspark.sql.Column: the most frequent value in a group.
Examples
from pyspark.sql import functions as sf
df = spark.createDataFrame([
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("Java", 2012, 20000), ("dotNET", 2012, 5000),
("dotNET", 2013, 48000), ("Java", 2013, 30000)],
schema=("course", "year", "earnings"))
df.groupby("course").agg(sf.mode("year")).sort("course").show()
+------+----------+
|course|mode(year)|
+------+----------+
| Java| 2012|
|dotNET| 2012|
+------+----------+
When multiple values have the same greatest frequency then either any of values is returned if deterministic is false or is not defined, or the lowest value is returned if deterministic is true.
from pyspark.sql import functions as sf
df = spark.createDataFrame([(-10,), (0,), (10,)], ["col"])
df.select(sf.mode("col", True)).show()
+---------------------------------------+
|mode() WITHIN GROUP (ORDER BY col DESC)|
+---------------------------------------+
| -10|
+---------------------------------------+