pyspark.pandas.DataFrame.prod

DataFrame.prod(axis=None, skipna=True, numeric_only=None, min_count=0)

Return the product of the values.

Note

Unlike pandas, pandas-on-Spark emulates the product with the exp(sum(log(...))) trick. Therefore, it only works for positive numbers.
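The identity behind this trick can be sketched in plain Python. This is only an illustration of exp(sum(log(...))), not the pandas-on-Spark implementation; the helper name `product_via_logs` is hypothetical and is not part of any library.

```python
import math


def product_via_logs(values):
    """Compute a product as exp(sum(log(x))).

    Hypothetical helper for illustration only. math.log raises
    ValueError for zero or negative inputs, which mirrors why the
    trick is limited to positive numbers.
    """
    return math.exp(sum(math.log(x) for x in values))


approx = product_via_logs([1, 2, 3, 4, 5])
# The result is a float that may differ from the exact product by a
# tiny rounding error, so round it before comparing with an integer.
print(round(approx))
```

Because the computation passes through floating-point logarithms, a sketch like this returns a float even for integer input, and very large products may lose precision.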

Parameters

axis: {index (0), columns (1)}: Axis for the function to be applied on.
skipna: bool, default True: Exclude NA/null values when computing the result.

Changed in version 3.4.0: Added support for including NA/null values.
numeric_only: bool, default None: Include only float, int, boolean columns. False is not supported. This parameter is mainly for pandas compatibility.
min_count: int, default 0: The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.
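How skipna and min_count interact can be sketched in plain Python. The helper `prod_with_min_count` below is hypothetical and only models the semantics described in the parameter list, assuming that None and NaN both count as NA; it is not how pandas-on-Spark computes the result.

```python
import math


def prod_with_min_count(values, skipna=True, min_count=0):
    """Model the skipna/min_count rules on a plain list (hypothetical helper)."""

    def is_na(v):
        # Assumption for this sketch: None and float NaN are treated as NA.
        return v is None or (isinstance(v, float) and math.isnan(v))

    valid = [v for v in values if not is_na(v)]

    # Fewer valid values than min_count: the result is NA.
    if len(valid) < min_count:
        return math.nan

    # With skipna=False, any NA value makes the result NA.
    if not skipna and len(valid) != len(values):
        return math.nan

    # math.prod of an empty list is 1, the multiplicative identity.
    return math.prod(valid)


print(prod_with_min_count([2, None, 3]))
print(prod_with_min_count([], min_count=1))
```

With the default min_count=0, an empty input yields the multiplicative identity; raising min_count to 1 turns that case into NA instead.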

Examples

On a DataFrame:

Non-numeric columns are not included in the result.

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({'A': [1, 2, 3, 4, 5],
...                     'B': [10, 20, 30, 40, 50],
...                     'C': ['a', 'b', 'c', 'd', 'e']})
>>> psdf
   A   B  C
0  1  10  a
1  2  20  b
2  3  30  c
3  4  40  d
4  5  50  e

>>> psdf.prod()
A         120
B    12000000
dtype: int64

If there are no numeric columns, an empty Series is returned.

>>> ps.DataFrame({"key": ['a', 'b', 'c'], "val": ['x', 'y', 'z']}).prod()  
Series([], dtype: float64)

On a Series:

>>> ps.Series([1, 2, 3, 4, 5]).prod()
120

By default, the product of an empty or all-NA Series is 1:

>>> ps.Series([]).prod()  
1.0

This can be controlled with the min_count parameter:

>>> ps.Series([]).prod(min_count=1)  
nan