Grouping

GroupedData.agg(*exprs)

Computes aggregates and returns the result as a DataFrame.

GroupedData.apply(udf)

It is an alias of pyspark.sql.GroupedData.applyInPandas(), except that it takes a pyspark.sql.functions.pandas_udf() whereas pyspark.sql.GroupedData.applyInPandas() takes a Python native function.

GroupedData.applyInArrow(func, schema)

Maps each group of the current DataFrame using an Arrow udf and returns the result as a DataFrame.

GroupedData.applyInPandas(func, schema)

Maps each group of the current DataFrame using a pandas udf and returns the result as a DataFrame.

GroupedData.applyInPandasWithState(func, ...)

Applies the given function to each group of data, while maintaining a user-defined per-group state.

GroupedData.avg(*cols)

Computes average values for each numeric column for each group.

GroupedData.cogroup(other)

Cogroups this GroupedData with another GroupedData so that cogrouped operations, such as applyInPandas(), can be run on them.

GroupedData.count()

Counts the number of records for each group.

GroupedData.max(*cols)

Computes the max value for each numeric column for each group.

GroupedData.mean(*cols)

Computes average values for each numeric column for each group.

GroupedData.min(*cols)

Computes the min value for each numeric column for each group.

GroupedData.pivot(pivot_col[, values])

Pivots a column of the current DataFrame and performs the specified aggregation.

GroupedData.sum(*cols)

Computes the sum for each numeric column for each group.

PandasCogroupedOps.applyInArrow(func, schema)

Applies a function to each cogroup using Arrow and returns the result as a DataFrame.

PandasCogroupedOps.applyInPandas(func, schema)

Applies a function to each cogroup using pandas and returns the result as a DataFrame.