Bemærk
Adgang til denne side kræver godkendelse. Du kan prøve at logge på eller ændre mapper.
Adgang til denne side kræver godkendelse. Du kan prøve at ændre mapper.
Returns the first value in a group. The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned. The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.
Syntax
from pyspark.sql import functions as sf
sf.first(col, ignorenulls=False)
Parameters
| Parameter | Type | Description |
|---|---|---|
col |
pyspark.sql.Column or column name |
Column to fetch first value for. |
ignorenulls |
bool | If first value is null then look for first non-null value. False by default. |
Returns
pyspark.sql.Column: first value of the group.
Examples
from pyspark.sql import functions as sf
df = spark.createDataFrame([("Alice", 2), ("Bob", 5), ("Alice", None)], ("name", "age"))
df = df.orderBy(df.age)
df.groupby("name").agg(sf.first("age")).orderBy("name").show()
+-----+----------+
| name|first(age)|
+-----+----------+
|Alice| NULL|
| Bob| 5|
+-----+----------+
To ignore any null values, set ignorenulls to True:
df.groupby("name").agg(sf.first("age", ignorenulls=True)).orderBy("name").show()
+-----+----------+
| name|first(age)|
+-----+----------+
|Alice| 2|
| Bob| 5|
+-----+----------+