Core Classes

SparkSession(sparkContext[, jsparkSession, ...])

The entry point to programming Spark with the Dataset and DataFrame API.

Catalog(sparkSession)

User-facing catalog API, accessible through SparkSession.catalog.

DataFrame(jdf, sql_ctx)

A distributed collection of data grouped into named columns.

Column(jc)

A column in a DataFrame.

Observation(*args, **kwargs)

Class to observe (named) metrics on a DataFrame.

Row(*args, **kwargs)

A row in a DataFrame.

GroupedData(jgd, df)

A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().

PandasCogroupedOps(gd1, gd2)

A logical grouping of two GroupedData, created by GroupedData.cogroup().

DataFrameNaFunctions(df)

Functionality for working with missing data in a DataFrame.

DataFrameStatFunctions(df)

Statistical functions for a DataFrame.

Window()

Utility functions for defining windows in DataFrames.

DataFrameReader(spark)

Interface used to load a DataFrame from external storage systems.

DataFrameWriter(df)

Interface used to write a DataFrame to external storage systems.

DataFrameWriterV2(df, table)

Interface used to write a DataFrame to external storage using the v2 API.

UDFRegistration(sparkSession)

Wrapper for user-defined function registration.

UDTFRegistration(sparkSession)

Wrapper for user-defined table function registration.

udf.UserDefinedFunction(func[, returnType, ...])

User-defined function in Python.

udtf.UserDefinedTableFunction(func, returnType)

User-defined table function in Python.

datasource.DataSource(options)

A base class for data sources.

datasource.DataSourceReader()

A base class for data source readers.

datasource.DataSourceStreamReader()

A base class for streaming data source readers.

datasource.DataSourceWriter()

A base class for data source writers.

datasource.DataSourceRegistration(sparkSession)

Wrapper for data source registration.

datasource.InputPartition(value)

A base class representing an input partition returned by the partitions() method of DataSourceReader.

datasource.WriterCommitMessage()

A commit message returned by DataSourceWriter.write() and sent back to the driver as an input parameter of the DataSourceWriter.commit() or DataSourceWriter.abort() method.

VariantVal(value, metadata)

A class to represent a Variant value in Python.