Input/Output#

Data Generator#

range(start[, end, step, num_partitions])

Create a DataFrame with a single column named id that holds a range of integers from start up to (but not including) end, in increments of step.
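As a minimal sketch of how `range` behaves, the helper below builds a small frame and returns its values. It assumes `pyspark` (with pandas and PyArrow) and a Java runtime are installed. The name `range_demo` is hypothetical and not part of the API.

```python
def range_demo():
    """Return the values produced by ps.range(0, 10, step=2) as a sorted list."""
    # Imported inside the function so that defining the helper does not
    # require a working Spark installation.
    import pyspark.pandas as ps

    # The end bound is exclusive; the result has one column named "id".
    psdf = ps.range(0, 10, step=2, num_partitions=1)

    # to_pandas() collects the data to the driver, which is fine for tiny frames.
    return sorted(psdf.to_pandas()["id"].tolist())
```

Calling `range_demo()` should return the even numbers from 0 through 8.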

Spark Metastore Table#

read_table(name[, index_col])

Read a Spark table and return a DataFrame.

DataFrame.to_table(name[, format, mode, ...])

Write the DataFrame into a Spark table.
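The pair `to_table` / `read_table` can be sketched as a round trip through the session catalog. This is an illustration under assumptions: the table name `demo_io_table`, the column names, and the helper `table_roundtrip` are all hypothetical, and a default local Spark session is assumed (which typically creates a `spark-warehouse` directory in the working directory).

```python
def table_roundtrip(table_name="demo_io_table"):
    """Write a small frame to a Spark table, read it back, then drop the table."""
    # Imported lazily so that defining the helper does not start Spark.
    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    psdf = ps.DataFrame({"sku": ["a1", "b2"], "qty": [3, 7]})

    # mode="overwrite" replaces the table if it already exists.
    psdf.to_table(table_name, format="parquet", mode="overwrite")

    # Collect before dropping the table, because reads are evaluated lazily.
    result = ps.read_table(table_name).to_pandas()

    # Clean up the hypothetical demo table.
    SparkSession.builder.getOrCreate().sql(f"DROP TABLE IF EXISTS {table_name}")
    return result.sort_values("sku").reset_index(drop=True)
```

The index is not stored unless `index_col` is passed to `to_table`, so the frame read back gets a fresh default index.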

Delta Lake#

read_delta(path[, version, timestamp, index_col])

Read a Delta Lake table from a file system path and return a DataFrame.

DataFrame.to_delta(path[, mode, ...])

Write the DataFrame out as a Delta Lake table.

Parquet#

read_parquet(path[, columns, index_col, ...])

Load a Parquet object from the file path, returning a DataFrame.

DataFrame.to_parquet(path[, mode, ...])

Write the DataFrame out as a Parquet file or directory.
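A Parquet round trip might look like the sketch below. It shows `index_col` being used on both sides so the index survives the write. The helper name `parquet_roundtrip`, the file name, the index column name `row_id`, and the sample data are hypothetical; a local Spark installation is assumed.

```python
import os


def parquet_roundtrip(target_dir):
    """Write a small frame as Parquet under target_dir and load it back."""
    # Imported lazily so that defining the helper does not start Spark.
    import pyspark.pandas as ps

    psdf = ps.DataFrame(
        {"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.5, 19.0, 31.2]}
    )
    path = os.path.join(target_dir, "weather.parquet")

    # index_col stores the index as an ordinary column named "row_id";
    # without it the index is dropped on write.
    psdf.to_parquet(path, mode="overwrite", index_col="row_id")

    # Restore the same column as the index when reading.
    loaded = ps.read_parquet(path, index_col="row_id")
    return loaded.to_pandas().sort_values("city")
```

Spark writes `path` as a directory of part files, which is why the summary above says "file or directory".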

ORC#

read_orc(path[, columns, index_col])

Load an ORC object from the file path, returning a DataFrame.

DataFrame.to_orc(path[, mode, ...])

Write a DataFrame to the ORC format.

Generic Spark I/O#

read_spark_io([path, format, schema, index_col])

Load a DataFrame from a Spark data source.

DataFrame.spark.to_spark_io([path, format, ...])

Write the DataFrame out to a Spark data source.
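The generic functions take the data source name through `format`, so the same call shape works for any source Spark supports. The sketch below uses the built-in `json` source. The helper name `spark_io_roundtrip` and the file name are hypothetical, and a local Spark installation is assumed.

```python
import os


def spark_io_roundtrip(target_dir):
    """Write ps.range(5) through the generic writer and read it back."""
    # Imported lazily so that defining the helper does not start Spark.
    import pyspark.pandas as ps

    path = os.path.join(target_dir, "ids.json")

    # The writer lives under the .spark accessor of the DataFrame.
    ps.range(5).spark.to_spark_io(path, format="json", mode="overwrite")

    # The reader is a top-level function and takes the same format name.
    loaded = ps.read_spark_io(path, format="json")
    return sorted(loaded.to_pandas()["id"].tolist())
```

Swapping `format="json"` for another source name such as `"orc"` or `"parquet"` should leave the rest of the helper unchanged.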

Flat File / CSV#

read_csv(path[, sep, header, names, ...])

Read a CSV (comma-separated values) file into a DataFrame or Series.

DataFrame.to_csv([path, sep, na_rep, ...])

Write object to a comma-separated values (csv) file.
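The following sketch writes a frame with a non-default separator and reads it back with the matching `sep`. The helper name `csv_roundtrip`, the directory name, and the sample data are hypothetical; a local Spark installation is assumed.

```python
import os


def csv_roundtrip(target_dir):
    """Write a small frame as semicolon-separated CSV and read it back."""
    # Imported lazily so that defining the helper does not start Spark.
    import pyspark.pandas as ps

    psdf = ps.DataFrame({"name": ["ada", "bob"], "score": [91, 78]})
    path = os.path.join(target_dir, "scores_csv")

    # With a path, Spark writes a directory of part files, each with a header row.
    psdf.to_csv(path, sep=";", header=True, mode="overwrite")

    # The reader must be given the same separator that was used for writing.
    loaded = ps.read_csv(path, sep=";")
    return loaded.to_pandas().sort_values("name").reset_index(drop=True)
```

When `to_csv` is called without a path it returns the CSV text as a string instead, which collects the data to the driver and is only suitable for small frames.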

Clipboard#

read_clipboard([sep])

Read text from clipboard and pass to read_csv.

DataFrame.to_clipboard([excel, sep])

Copy object to the system clipboard.

Excel#

read_excel(io[, sheet_name, header, names, ...])

Read an Excel file into a pandas-on-Spark DataFrame or Series.

DataFrame.to_excel(excel_writer[, ...])

Write object to an Excel sheet.

JSON#

read_json(path[, lines, index_col])

Read JSON data from the given file path into a DataFrame.

DataFrame.to_json([path, compression, ...])

Convert the object to a JSON string.
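A JSON round trip can be sketched the same way. The example assumes the line-delimited layout (one JSON object per line) that `lines=True` describes; the helper name `json_roundtrip`, the directory name, and the sample data are hypothetical, and a local Spark installation is assumed.

```python
import os


def json_roundtrip(target_dir):
    """Write a small frame as line-delimited JSON and read it back."""
    # Imported lazily so that defining the helper does not start Spark.
    import pyspark.pandas as ps

    psdf = ps.DataFrame({"id": [1, 2], "tag": ["a", "b"]})
    path = os.path.join(target_dir, "tags_json")

    # With a path, the records are written as files under that directory.
    psdf.to_json(path, mode="overwrite")

    # lines=True reads one JSON object per line.
    loaded = ps.read_json(path, lines=True)
    return loaded.to_pandas().sort_values("id").reset_index(drop=True)
```

As with `to_csv`, calling `to_json` without a path returns the JSON text as a string rather than writing files.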

HTML#

read_html(io[, match, flavor, header, ...])

Read HTML tables into a list of DataFrame objects.

DataFrame.to_html([buf, columns, col_space, ...])

Render a DataFrame as an HTML table.

SQL#

read_sql_table(table_name, con[, schema, ...])

Read SQL database table into a DataFrame.

read_sql_query(sql, con[, index_col])

Read SQL query into a DataFrame.

read_sql(sql, con[, index_col, columns])

Read SQL query or database table into a DataFrame.