pyspark.pandas.Series

class pyspark.pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

A pandas-on-Spark Series that logically corresponds to a pandas Series. It holds a Spark Column internally.

Variables
  • _internal – an internal immutable Frame to manage metadata.

  • _psdf – the parent pandas-on-Spark DataFrame.

Parameters
data : array-like, dict, or scalar value, pandas Series

Contains the data stored in the Series. If data is a dict, argument order is maintained for Python 3.6 and later. Note that if data is a pandas Series, other arguments should not be used.

index : array-like or Index (1d)

Values must be hashable and have the same length as data. Non-unique index values are allowed. Defaults to a RangeIndex (0, 1, 2, …, n - 1) for data of length n if not provided. If both a dict and an index sequence are used, the index overrides the keys found in the dict.

dtype : numpy.dtype or None

If None, dtype will be inferred.

copy : boolean, default False

Copy input data.

Methods

abs()

Return a Series/DataFrame with absolute numeric value of each element.

add(other)

Return addition of series and other, element-wise (binary operator +).

add_prefix(prefix)

Prefix labels with string prefix.

add_suffix(suffix)

Suffix labels with string suffix.

agg(func)

Aggregate using one or more operations over the specified axis.

aggregate(func)

Aggregate using one or more operations over the specified axis.

align(other[, join, axis, copy])

Align two objects on their axes with the specified join method.

all([axis])

Return whether all elements are True.

any([axis])

Return whether any element is True.

append(to_append[, ignore_index, …])

Concatenate two or more Series.

apply(func[, args])

Invoke function on values of Series.

argmax()

Return int position of the largest value in the Series.

argmin()

Return int position of the smallest value in the Series.

argsort()

Return the integer indices that would sort the Series values.

asof(where)

Return the last row(s) without any NaNs before where.

astype(dtype)

Cast a pandas-on-Spark object to the specified dtype.

at_time(time[, asof, axis])

Select values at a particular time of day (for example, 9:30 AM).

backfill([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.

between(left, right[, inclusive])

Return boolean Series equivalent to left <= series <= right.

between_time(start_time, end_time[, …])

Select values between particular times of the day (for example, 9:00-9:30 AM).

bfill([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.

bool()

Return the bool of a single element in the current object.

clip([lower, upper])

Trim values at input threshold(s).

combine_first(other)

Combine Series values, choosing the calling Series’s values first.

compare(other[, keep_shape, keep_equal])

Compare to another Series and show the differences.

copy([deep])

Make a copy of this object’s indices and data.

corr(other[, method])

Compute correlation with other Series, excluding missing values.

count([axis, numeric_only])

Count the non-NA values in the Series.

cummax([skipna])

Return cumulative maximum over a DataFrame or Series axis.

cummin([skipna])

Return cumulative minimum over a DataFrame or Series axis.

cumprod([skipna])

Return cumulative product over a DataFrame or Series axis.

cumsum([skipna])

Return cumulative sum over a DataFrame or Series axis.

describe([percentiles])

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

diff([periods])

Compute the first discrete difference of elements.

div(other)

Return floating division of series and other, element-wise (binary operator /).

divide(other)

Return floating division of series and other, element-wise (binary operator /).

divmod(other)

Return integer division and modulo of series and other, element-wise (binary operator divmod).

dot(other)

Compute the dot product between the Series and the columns of other.

drop([labels, index, level])

Return Series with specified index labels removed.

drop_duplicates([keep, inplace])

Return Series with duplicate values removed.

droplevel(level)

Return Series with requested index level(s) removed.

dropna([axis, inplace])

Return a new Series with missing values removed.

eq(other)

Compare if the current value is equal to the other.

equals(other)

Compare if the current value is equal to the other.

expanding([min_periods])

Provide expanding transformations.

explode()

Transform each element of a list-like to a row.

factorize([sort, na_sentinel])

Encode the object as an enumerated type or categorical variable.

ffill([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.

fillna([value, method, axis, inplace, limit])

Fill NA/NaN values.

filter([items, like, regex, axis])

Subset rows of the Series according to labels in the specified index.

first(offset)

Select first periods of time series data based on a date offset.

first_valid_index()

Return the index of the first non-NA/null value.

floordiv(other)

Return integer division of series and other, element-wise (binary operator //).

ge(other)

Compare if the current value is greater than or equal to the other.

get(key[, default])

Get item from object for the given key (for example, a DataFrame column or a Series index label). Return the default value if the key is not found.

get_dtype_counts()

Return counts of unique dtypes in this object.

groupby(by[, axis, as_index, dropna])

Group DataFrame or Series using one or more columns.

gt(other)

Compare if the current value is greater than the other.

head([n])

Return the first n rows.

hist([bins])

Draw a histogram of the Series values.

idxmax([skipna])

Return the row label of the maximum value.

idxmin([skipna])

Return the row label of the minimum value.

isin(values)

Check whether values are contained in Series or Index.

isna()

Detect missing values.

isnull()

Detect missing values.

item()

Return the first element of the underlying data as a Python scalar.

items()

This is an alias of iteritems.

iteritems()

Lazily iterate over (index, value) tuples.

keys()

Return alias for index.

kurt([axis, numeric_only])

Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).

kurtosis([axis, numeric_only])

Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).

last(offset)

Select final periods of time series data based on a date offset.

last_valid_index()

Return index for last non-NA/null value.

le(other)

Compare if the current value is less than or equal to the other.

lt(other)

Compare if the current value is less than the other.

mad()

Return the mean absolute deviation of values.

map(arg)

Map values of Series according to input correspondence.

mask(cond[, other])

Replace values where the condition is True.

max([axis, numeric_only])

Return the maximum of the values.

mean([axis, numeric_only])

Return the mean of the values.

median([axis, numeric_only, accuracy])

Return the median of the values for the requested axis.

min([axis, numeric_only])

Return the minimum of the values.

mod(other)

Return modulo of series and other, element-wise (binary operator %).

mode([dropna])

Return the mode(s) of the dataset.

mul(other)

Return multiplication of series and other, element-wise (binary operator *).

multiply(other)

Return multiplication of series and other, element-wise (binary operator *).

ne(other)

Compare if the current value is not equal to the other.

nlargest([n])

Return the largest n elements.

notna()

Detect existing (non-missing) values.

notnull()

Detect existing (non-missing) values.

nsmallest([n])

Return the smallest n elements.

nunique([dropna, approx, rsd])

Return number of unique elements in the object.

pad([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.

pct_change([periods])

Percentage change between the current and a prior element.

pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).

pop(item)

Return item and drop from series.

pow(other)

Return exponential power of series and other, element-wise (binary operator **).

prod([axis, numeric_only, min_count])

Return the product of the values.

product([axis, numeric_only, min_count])

Return the product of the values.

quantile([q, accuracy])

Return value at the given quantile.

radd(other)

Return reverse addition of series and other, element-wise (binary operator +).

rank([method, ascending])

Compute numerical data ranks (1 through n) along axis.

rdiv(other)

Return reverse floating division of series and other, element-wise (binary operator /).

rdivmod(other)

Return reverse integer division and modulo of series and other, element-wise (binary operator rdivmod).

reindex([index, fill_value])

Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.

reindex_like(other)

Return a Series with matching indices as other object.

rename([index])

Alter Series name.

rename_axis([mapper, index, inplace])

Set the name of the axis for the index or columns.

repeat(repeats)

Repeat elements of a Series.

replace([to_replace, value, regex])

Replace values given in to_replace with value.

reset_index([level, drop, name, inplace])

Generate a new DataFrame or Series with the index reset.

rfloordiv(other)

Return reverse integer division of series and other, element-wise (binary operator //).

rmod(other)

Return reverse modulo of series and other, element-wise (binary operator %).

rmul(other)

Return reverse multiplication of series and other, element-wise (binary operator *).

rolling(window[, min_periods])

Provide rolling transformations.

round([decimals])

Round each value in a Series to the given number of decimals.

rpow(other)

Return reverse exponential power of series and other, element-wise (binary operator **).

rsub(other)

Return reverse subtraction of series and other, element-wise (binary operator -).

rtruediv(other)

Return reverse floating division of series and other, element-wise (binary operator /).

sample([n, frac, replace, random_state])

Return a random sample of items from an axis of object.

sem([axis, ddof, numeric_only])

Return unbiased standard error of the mean over requested axis.

shift([periods, fill_value])

Shift Series/Index by desired number of periods.

skew([axis, numeric_only])

Return unbiased skew normalized by N-1.

sort_index([axis, level, ascending, …])

Sort object by labels (along an axis).

sort_values([ascending, inplace, na_position])

Sort by the values.

squeeze([axis])

Squeeze one-dimensional axis objects into scalars.

std([axis, ddof, numeric_only])

Return sample standard deviation.

sub(other)

Return subtraction of series and other, element-wise (binary operator -).

subtract(other)

Return subtraction of series and other, element-wise (binary operator -).

sum([axis, numeric_only, min_count])

Return the sum of the values.

swapaxes(i, j[, copy])

Interchange axes and swap values axes appropriately.

swaplevel([i, j, copy])

Swap levels i and j in a MultiIndex.

tail([n])

Return the last n rows.

take(indices)

Return the elements in the given positional indices along an axis.

to_clipboard([excel, sep])

Copy object to the system clipboard.

to_csv([path, sep, na_rep, columns, header, …])

Write object to a comma-separated values (csv) file.

to_dataframe([name])

Convert Series to DataFrame.

to_dict([into])

Convert Series to {label -> value} dict or dict-like object.

to_excel(excel_writer[, sheet_name, na_rep, …])

Write object to an Excel sheet.

to_frame([name])

Convert Series to DataFrame.

to_json([path, compression, num_files, …])

Convert the object to a JSON string.

to_latex([buf, columns, col_space, header, …])

Render an object to a LaTeX tabular environment table.

to_list()

Return a list of the values.

to_markdown([buf, mode])

Print Series or DataFrame in Markdown-friendly format.

to_numpy()

A NumPy ndarray representing the values in this DataFrame or Series.

to_pandas()

Return a pandas Series.

to_string([buf, na_rep, float_format, …])

Render a string representation of the Series.

tolist()

Return a list of the values.

transform(func[, axis])

Call func on self, producing an object of the same type with transformed values and the same axis length as the input.

transpose(*args, **kwargs)

Return the transpose, which is by definition self.

truediv(other)

Return floating division of series and other, element-wise (binary operator /).

truncate([before, after, axis, copy])

Truncate a Series or DataFrame before and after some index value.

unique()

Return unique values of Series object.

unstack([level])

Unstack, a.k.a. pivot, a Series with a MultiIndex to produce a DataFrame.

update(other)

Modify Series in place using non-NA values from passed Series.

value_counts([normalize, sort, ascending, …])

Return a Series containing counts of unique values.

var([axis, ddof, numeric_only])

Return unbiased variance.

where(cond[, other])

Replace values where the condition is False.

xs(key[, level])

Return cross-section from the Series.

Attributes

T

Return the transpose, which is by definition self.

at

Access a single value for a row/column label pair.

axes

Return a list of the row axis labels.

dtype

Return the dtype object of the underlying data.

dtypes

Return the dtype object of the underlying data.

empty

Return True if the current object is empty.

hasnans

Return True if it has any missing values.

iat

Access a single value for a row/column pair by integer position.

iloc

Purely integer-location based indexing for selection by position.

index

The index (axis labels) of the Series.

is_monotonic

Return boolean if values in the object are monotonically increasing.

is_monotonic_decreasing

Return boolean if values in the object are monotonically decreasing.

is_monotonic_increasing

Return boolean if values in the object are monotonically increasing.

is_unique

Return boolean if values in the object are unique.

loc

Access a group of rows and columns by label(s) or a boolean Series.

name

Return name of the Series.

ndim

Return an int representing the number of array dimensions.

shape

Return a tuple of the shape of the underlying data.

size

Return an int representing the number of elements in this object.

values

Return a NumPy representation of the DataFrame or the Series.