Supported pandas API

The following tables show which pandas APIs are implemented in pandas API on Spark and which are not. Some APIs do not support every parameter of their pandas counterpart, so the third column lists the missing parameters for each API.

‘Y’ in the second column means the API is implemented with all of its parameters.
‘N’ means it is not implemented yet.
‘P’ means it is partially implemented: some parameters are missing.
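
The legend above can be applied mechanically. The following is a minimal sketch, assuming each table row is tab-separated exactly as on this page (API name, status, optional list of missing parameters). The names `ApiSupport`, `parse_row`, and `unsupported_arguments` are hypothetical helpers written for illustration and are not part of pandas or PySpark.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ApiSupport(NamedTuple):
    """One row of a support table (hypothetical helper type)."""
    name: str                 # API name without backticks or "()"
    status: str               # 'Y', 'N', or 'P'
    missing: Tuple[str, ...]  # parameters listed in the third column


def parse_row(row: str) -> ApiSupport:
    """Parse one tab-separated row of a support table."""
    cells = [cell.strip() for cell in row.split("\t")]
    name = cells[0].strip("`")
    if name.endswith("()"):
        name = name[:-2]
    status = cells[1]
    missing_cell = cells[2] if len(cells) > 2 else ""
    # Long rows end with "and more. See ..."; keep only the names that
    # are explicitly listed before that note.
    listed = missing_cell.split(" and more.")[0]
    missing = tuple(
        part.strip(" `") for part in listed.split(",") if part.strip(" `")
    )
    return ApiSupport(name, status, missing)


def unsupported_arguments(entry: ApiSupport, used: Iterable[str]) -> List[str]:
    """Return the keyword names in `used` that the row lists as missing."""
    return [name for name in used if name in entry.missing]


# Example: a 'P' row from the CategoricalIndex table.
entry = parse_row("`argmax()`\tP\t`axis` , `skipna`")
print(entry.status, entry.missing)
print(unsupported_arguments(entry, ["skipna"]))
```

A check like this only reflects what the table lists. Rows that end in "and more" name only some of the missing parameters, so the linked API pages remain the authoritative reference.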

All APIs in the lists below compute data with distributed execution, except those that require local execution by design. For example, DataFrame.to_numpy() must collect the data to the driver.
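
Because locally executed APIs such as `to_numpy()` pull all rows to the driver, it can help to check the size first. The sketch below is one possible guard, not a PySpark feature: `to_local_array` and its `max_rows` default are hypothetical, and it assumes that `len()` on the frame returns its row count (which, on a pandas-on-Spark DataFrame, is itself a distributed count).

```python
def to_local_array(frame, max_rows=10_000):
    """Collect `frame` with to_numpy() only if it is small enough.

    `frame` is any object that supports len() and to_numpy(), such as a
    pandas or pandas-on-Spark DataFrame. The threshold is an arbitrary
    illustrative default, not a recommended value.
    """
    n_rows = len(frame)  # row count; computed before anything is collected
    if n_rows > max_rows:
        raise ValueError(
            f"refusing to collect {n_rows} rows to the driver "
            f"(limit is {max_rows})"
        )
    return frame.to_numpy()  # local execution: data moves to the driver


# Usage sketch (requires a working Spark installation, so it is left as
# a comment here):
#
#   import pyspark.pandas as ps
#   psdf = ps.DataFrame({"a": [1, 2, 3]})
#   arr = to_local_array(psdf, max_rows=100)
```

The function is duck-typed on purpose, so the same guard works on a plain pandas DataFrame while code is being migrated.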

If a pandas API or parameter you need is not implemented, you can create an Apache Spark JIRA to request it, or contribute the implementation yourself.

The API list is updated based on the pandas 2.0.0 pre-release.

CategoricalIndex API

API	Implemented	Missing parameters
`add_categories()`	Y
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`as_ordered()`	Y
`as_unordered()`	Y
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	P	`dtype` , `names`
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
is_dtype_equal	N
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	Y
memory_usage	N
`min()`	Y
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`remove_categories()`	Y
`remove_unused_categories()`	Y
`rename()`	Y
`rename_categories()`	Y
`reorder_categories()`	Y
`repeat()`	P	`axis`
searchsorted	N
`set_categories()`	Y
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
take_nd	N
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

DataFrame API

API	Implemented	Missing parameters
`abs()`	Y
`add()`	P	`axis` , `fill_value` , `level`
`add_prefix()`	Y
`add_suffix()`	Y
`agg()`	P	`axis`
`aggregate()`	P	`axis`
`align()`	P	`broadcast_axis` , `fill_axis` , `fill_value` , `level` , `limit` and more. See the pandas.DataFrame.align and pyspark.pandas.DataFrame.align for detail.
`all()`	P	`level`
`any()`	P	`level` , `skipna`
`append()`	Y
`apply()`	P	`raw` , `result_type`
`applymap()`	P	`na_action`
asfreq	N
asof	N
`assign()`	Y
`astype()`	P	`copy` , `errors`
`at_time()`	Y
`backfill()`	P	`downcast`
`between_time()`	P	`inclusive`
`bfill()`	P	`downcast`
`bool()`	Y
`boxplot()`	P	`ax` , `backend` , `by` , `column` , `figsize` and more. See the pandas.DataFrame.boxplot and pyspark.pandas.DataFrame.boxplot for detail.
`clip()`	P	`axis` , `inplace`
combine	N
`combine_first()`	Y
compare	N
convert_dtypes	N
`copy()`	Y
`corr()`	P	`numeric_only`
`corrwith()`	P	`numeric_only`
`count()`	P	`level`
`cov()`	P	`numeric_only`
`cummax()`	P	`axis`
`cummin()`	P	`axis`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
`describe()`	P	`datetime_is_numeric` , `exclude` , `include`
`diff()`	Y
`div()`	P	`axis` , `fill_value` , `level`
`divide()`	P	`axis` , `fill_value` , `level`
`dot()`	Y
`drop()`	P	`errors` , `inplace` , `level`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
`duplicated()`	Y
`eq()`	P	`axis` , `level`
`equals()`	Y
`eval()`	Y
`ewm()`	P	`adjust` , `axis` , `method` , `times`
`expanding()`	P	`axis` , `center` , `method`
`explode()`	Y
`ffill()`	P	`downcast`
`fillna()`	P	`downcast`
`filter()`	Y
`first()`	Y
`first_valid_index()`	Y
`floordiv()`	P	`axis` , `fill_value` , `level`
`ge()`	P	`axis` , `level`
`get()`	Y
`groupby()`	P	`group_keys` , `level` , `observed` , `sort` , `squeeze`
`gt()`	P	`axis` , `level`
`head()`	Y
`hist()`	P	`ax` , `backend` , `by` , `column` , `data` and more. See the pandas.DataFrame.hist and pyspark.pandas.DataFrame.hist for detail.
`idxmax()`	P	`numeric_only` , `skipna`
`idxmin()`	P	`numeric_only` , `skipna`
infer_objects	N
`info()`	P	`memory_usage` , `show_counts`
`insert()`	Y
`interpolate()`	P	`axis` , `downcast` , `inplace`
isetitem	N
`isin()`	Y
`isna()`	Y
`isnull()`	Y
`items()`	Y
`iteritems()`	Y
`iterrows()`	Y
`itertuples()`	Y
`join()`	P	`other` , `sort` , `validate`
`keys()`	Y
`kurt()`	P	`level`
`kurtosis()`	P	`level`
`last()`	Y
`last_valid_index()`	Y
`le()`	P	`axis` , `level`
lookup	N
`lt()`	P	`axis` , `level`
`mad()`	P	`level` , `skipna`
`mask()`	P	`axis` , `errors` , `inplace` , `level` , `try_cast`
`max()`	P	`level`
`mean()`	P	`level`
`median()`	P	`level`
`melt()`	P	`col_level` , `ignore_index`
memory_usage	N
`merge()`	P	`copy` , `indicator` , `sort` , `validate`
`min()`	P	`level`
`mod()`	P	`axis` , `fill_value` , `level`
`mode()`	Y
`mul()`	P	`axis` , `fill_value` , `level`
`multiply()`	P	`axis` , `fill_value` , `level`
`ne()`	P	`axis` , `level`
`nlargest()`	Y
`notna()`	Y
`notnull()`	Y
`nsmallest()`	Y
`nunique()`	Y
`pad()`	P	`downcast`
`pct_change()`	P	`fill_method` , `freq` , `limit`
`pipe()`	Y
`pivot()`	Y
`pivot_table()`	P	`dropna` , `margins` , `margins_name` , `observed` , `sort`
`pop()`	Y
`pow()`	P	`axis` , `fill_value` , `level`
`prod()`	P	`level`
`product()`	P	`level`
`quantile()`	P	`interpolation` , `method`
`query()`	Y
`radd()`	P	`axis` , `fill_value` , `level`
`rank()`	P	`axis` , `na_option` , `pct`
`rdiv()`	P	`axis` , `fill_value` , `level`
`reindex()`	P	`level` , `limit` , `method` , `tolerance`
`reindex_like()`	P	`limit` , `method` , `tolerance`
`rename()`	P	`copy`
`rename_axis()`	Y
reorder_levels	N
`replace()`	Y
`resample()`	P	`axis` , `base` , `convention` , `group_keys` , `kind` and more. See the pandas.DataFrame.resample and pyspark.pandas.DataFrame.resample for detail.
`reset_index()`	P	`allow_duplicates` , `names`
`rfloordiv()`	P	`axis` , `fill_value` , `level`
`rmod()`	P	`axis` , `fill_value` , `level`
`rmul()`	P	`axis` , `fill_value` , `level`
`rolling()`	P	`axis` , `center` , `closed` , `method` , `on` and more. See the pandas.DataFrame.rolling and pyspark.pandas.DataFrame.rolling for detail.
`round()`	Y
`rpow()`	P	`axis` , `fill_value` , `level`
`rsub()`	P	`axis` , `fill_value` , `level`
`rtruediv()`	P	`axis` , `fill_value` , `level`
`sample()`	P	`axis` , `weights`
`select_dtypes()`	Y
`sem()`	P	`level`
set_axis	N
set_flags	N
`set_index()`	P	`verify_integrity`
`shift()`	P	`axis` , `freq`
`skew()`	P	`level`
slice_shift	N
`sort_index()`	P	`key` , `sort_remaining`
`sort_values()`	P	`axis` , `key` , `kind`
`squeeze()`	Y
`stack()`	P	`dropna` , `level`
`std()`	P	`level`
`sub()`	P	`axis` , `fill_value` , `level`
`subtract()`	P	`axis` , `fill_value` , `level`
`sum()`	P	`level`
`swapaxes()`	P	`axis1` , `axis2`
`swaplevel()`	Y
`tail()`	Y
`take()`	P	`is_copy`
`to_clipboard()`	Y
`to_csv()`	P	`chunksize` , `compression` , `decimal` , `doublequote` , `encoding` and more. See the pandas.DataFrame.to_csv and pyspark.pandas.DataFrame.to_csv for detail.
`to_dict()`	Y
`to_excel()`	P	`storage_options`
to_feather	N
to_gbq	N
to_hdf	N
`to_html()`	P	`encoding`
`to_json()`	P	`date_format` , `date_unit` , `default_handler` , `double_precision` , `force_ascii` and more. See the pandas.DataFrame.to_json and pyspark.pandas.DataFrame.to_json for detail.
`to_latex()`	P	`caption` , `label` , `position`
`to_markdown()`	P	`index` , `storage_options`
`to_numpy()`	P	`copy` , `dtype` , `na_value`
`to_orc()`	P	`engine` , `engine_kwargs` , `index`
`to_parquet()`	P	`engine` , `index` , `storage_options`
to_period	N
to_pickle	N
`to_records()`	Y
to_sql	N
to_stata	N
`to_string()`	P	`encoding` , `max_colwidth` , `min_rows`
to_timestamp	N
to_xarray	N
to_xml	N
`transform()`	Y
`transpose()`	P	`copy`
`truediv()`	P	`axis` , `fill_value` , `level`
`truncate()`	Y
tshift	N
tz_convert	N
tz_localize	N
`unstack()`	P	`fill_value` , `level`
`update()`	P	`errors` , `filter_func`
value_counts	N
`var()`	P	`level` , `skipna`
`where()`	P	`errors` , `inplace` , `level` , `try_cast`
`xs()`	P	`drop_level`

DatetimeIndex API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`ceil()`	Y
`copy()`	P	`dtype` , `names`
`day_name()`	Y
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
`floor()`	Y
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`indexer_at_time()`	Y
`indexer_between_time()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
isocalendar	N
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
mean	N
memory_usage	N
`min()`	P	`axis` , `skipna`
`month_name()`	Y
`normalize()`	Y
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
`round()`	Y
searchsorted	N
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
snap	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
std	N
`strftime()`	Y
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
to_julian_date	N
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
to_period	N
to_perioddelta	N
to_pydatetime	N
`to_series()`	P	`index` , `keep_tz`
`tolist()`	Y
`transpose()`	Y
tz_convert	N
tz_localize	N
`union()`	Y
union_many	N
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

Float64Index API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	P	`dtype` , `names`
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
searchsorted	N
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

Index API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	P	`dtype` , `names`
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
searchsorted	N
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

Int64Index API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	P	`dtype` , `names`
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
searchsorted	N
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

MultiIndex API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
`copy()`	P	`codes` , `dtype` , `levels` , `name` , `names`
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equal_levels()`	Y
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_loc_level	N
get_locs	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_lexsorted	N
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
remove_unused_levels	N
`rename()`	P	`level` , `names`
reorder_levels	N
`repeat()`	P	`axis`
searchsorted	N
set_codes	N
set_levels	N
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
`swaplevel()`	Y
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	P	`allow_duplicates`
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
`to_series()`	P	`index`
`tolist()`	Y
`transpose()`	Y
truncate	N
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

Series API

API	Implemented	Missing parameters
`abs()`	Y
`add()`	P	`axis` , `fill_value` , `level`
`add_prefix()`	Y
`add_suffix()`	Y
`agg()`	P	`axis`
`aggregate()`	P	`axis`
`align()`	P	`broadcast_axis` , `fill_axis` , `fill_value` , `level` , `limit` and more. See the pandas.Series.align and pyspark.pandas.Series.align for detail.
`all()`	P	`bool_only` , `level`
`any()`	P	`bool_only` , `level` , `skipna`
`append()`	Y
`apply()`	P	`convert_dtype`
`argmax()`	Y
`argmin()`	Y
`argsort()`	P	`axis` , `kind` , `order`
asfreq	N
`asof()`	P	`subset`
`astype()`	P	`copy` , `errors`
`at_time()`	Y
`autocorr()`	Y
`backfill()`	P	`downcast`
`between()`	Y
`between_time()`	P	`inclusive`
`bfill()`	P	`downcast`
`bool()`	Y
`clip()`	P	`axis`
combine	N
`combine_first()`	Y
`compare()`	P	`align_axis` , `result_names`
convert_dtypes	N
`copy()`	Y
`corr()`	Y
`count()`	P	`level`
`cov()`	Y
`cummax()`	P	`axis`
`cummin()`	P	`axis`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
`describe()`	P	`datetime_is_numeric` , `exclude` , `include`
`diff()`	Y
`div()`	P	`axis` , `fill_value` , `level`
`divide()`	P	`axis` , `fill_value` , `level`
`divmod()`	P	`axis` , `fill_value` , `level`
`dot()`	Y
`drop()`	P	`axis` , `errors`
`drop_duplicates()`	Y
`droplevel()`	P	`axis`
`dropna()`	P	`how`
`duplicated()`	Y
`eq()`	P	`axis` , `fill_value` , `level`
`equals()`	Y
`ewm()`	P	`adjust` , `axis` , `method` , `times`
`expanding()`	P	`axis` , `center` , `method`
`explode()`	P	`ignore_index`
`factorize()`	P	`use_na_sentinel`
`ffill()`	P	`downcast`
`fillna()`	P	`downcast`
`filter()`	Y
`first()`	Y
`first_valid_index()`	Y
`floordiv()`	P	`axis` , `fill_value` , `level`
`ge()`	P	`axis` , `fill_value` , `level`
`get()`	Y
`groupby()`	P	`group_keys` , `level` , `observed` , `sort` , `squeeze`
`gt()`	P	`axis` , `fill_value` , `level`
`head()`	Y
`hist()`	P	`ax` , `backend` , `by` , `figsize` , `grid` and more. See the pandas.Series.hist and pyspark.pandas.Series.hist for detail.
`idxmax()`	P	`axis`
`idxmin()`	P	`axis`
infer_objects	N
info	N
`interpolate()`	P	`axis` , `downcast` , `inplace`
`isin()`	Y
`isna()`	Y
`isnull()`	Y
`item()`	Y
`items()`	Y
`iteritems()`	Y
`keys()`	Y
`kurt()`	P	`level`
`kurtosis()`	P	`level`
`last()`	Y
`last_valid_index()`	Y
`le()`	P	`axis` , `fill_value` , `level`
`lt()`	P	`axis` , `fill_value` , `level`
`mad()`	P	`axis` , `level` , `skipna`
`map()`	Y
`mask()`	P	`axis` , `errors` , `inplace` , `level` , `try_cast`
`max()`	P	`level`
`mean()`	P	`level`
`median()`	P	`level`
memory_usage	N
`min()`	P	`level`
`mod()`	P	`axis` , `fill_value` , `level`
`mode()`	Y
`mul()`	P	`axis` , `fill_value` , `level`
`multiply()`	P	`axis` , `fill_value` , `level`
`ne()`	P	`axis` , `fill_value` , `level`
`nlargest()`	P	`keep`
`notna()`	Y
`notnull()`	Y
`nsmallest()`	P	`keep`
`nunique()`	Y
`pad()`	P	`downcast`
`pct_change()`	P	`fill_method` , `freq` , `limit`
`pipe()`	Y
`pop()`	Y
`pow()`	P	`axis` , `fill_value` , `level`
`prod()`	P	`level`
`product()`	P	`level`
`quantile()`	P	`interpolation`
`radd()`	P	`axis` , `fill_value` , `level`
`rank()`	P	`axis` , `na_option` , `pct`
ravel	N
`rdiv()`	P	`axis` , `fill_value` , `level`
`rdivmod()`	P	`axis` , `fill_value` , `level`
`reindex()`	Y
`reindex_like()`	P	`copy` , `limit` , `method` , `tolerance`
`rename()`	P	`axis` , `copy` , `errors` , `inplace` , `level`
`rename_axis()`	Y
reorder_levels	N
`repeat()`	P	`axis`
`replace()`	P	`inplace` , `limit` , `method`
`resample()`	P	`axis` , `base` , `convention` , `group_keys` , `kind` and more. See the pandas.Series.resample and pyspark.pandas.Series.resample for detail.
`reset_index()`	P	`allow_duplicates`
`rfloordiv()`	P	`axis` , `fill_value` , `level`
`rmod()`	P	`axis` , `fill_value` , `level`
`rmul()`	P	`axis` , `fill_value` , `level`
`rolling()`	P	`axis` , `center` , `closed` , `method` , `on` and more. See the pandas.Series.rolling and pyspark.pandas.Series.rolling for detail.
`round()`	Y
`rpow()`	P	`axis` , `fill_value` , `level`
`rsub()`	P	`axis` , `fill_value` , `level`
`rtruediv()`	P	`axis` , `fill_value` , `level`
`sample()`	P	`axis` , `weights`
`searchsorted()`	P	`sorter`
`sem()`	P	`level`
set_axis	N
set_flags	N
`shift()`	P	`axis` , `freq`
`skew()`	P	`level`
slice_shift	N
`sort_index()`	P	`key` , `sort_remaining`
`sort_values()`	P	`axis` , `key` , `kind`
`squeeze()`	Y
`std()`	P	`level`
`sub()`	P	`axis` , `fill_value` , `level`
`subtract()`	P	`axis` , `fill_value` , `level`
`sum()`	P	`level`
`swapaxes()`	P	`axis1` , `axis2`
`swaplevel()`	Y
`tail()`	Y
`take()`	P	`axis` , `is_copy`
`to_clipboard()`	Y
`to_csv()`	P	`chunksize` , `compression` , `decimal` , `doublequote` , `encoding` and more. See the pandas.Series.to_csv and pyspark.pandas.Series.to_csv for detail.
`to_dict()`	Y
`to_excel()`	P	`storage_options`
`to_frame()`	Y
to_hdf	N
`to_json()`	P	`date_format` , `date_unit` , `default_handler` , `double_precision` , `force_ascii` and more. See the pandas.Series.to_json and pyspark.pandas.Series.to_json for detail.
`to_latex()`	P	`caption` , `label` , `position`
`to_list()`	Y
`to_markdown()`	P	`index` , `storage_options`
`to_numpy()`	P	`copy` , `dtype` , `na_value`
to_period	N
to_pickle	N
to_sql	N
`to_string()`	P	`min_rows`
to_timestamp	N
to_xarray	N
`tolist()`	Y
`transform()`	Y
`transpose()`	Y
`truediv()`	P	`axis` , `fill_value` , `level`
`truncate()`	Y
tshift	N
tz_convert	N
tz_localize	N
`unique()`	Y
`unstack()`	P	`fill_value`
`update()`	Y
`value_counts()`	Y
`var()`	P	`level` , `skipna`
view	N
`where()`	P	`axis` , `errors` , `inplace` , `level` , `try_cast`
`xs()`	P	`axis` , `drop_level`

TimedeltaIndex API

API	Implemented	Missing parameters
`all()`	Y
`any()`	Y
`append()`	Y
`argmax()`	P	`axis` , `skipna`
`argmin()`	P	`axis` , `skipna`
argsort	N
`asof()`	Y
asof_locs	N
`astype()`	P	`copy`
ceil	N
`copy()`	P	`dtype` , `names`
`delete()`	Y
`difference()`	Y
`drop()`	P	`errors`
`drop_duplicates()`	Y
`droplevel()`	Y
`dropna()`	Y
duplicated	N
`equals()`	Y
`factorize()`	P	`use_na_sentinel`
`fillna()`	P	`downcast`
floor	N
format	N
get_indexer	N
get_indexer_for	N
get_indexer_non_unique	N
`get_level_values()`	Y
get_loc	N
get_slice_bound	N
get_value	N
groupby	N
`holds_integer()`	Y
`identical()`	Y
`insert()`	Y
`intersection()`	P	`sort`
is_	N
`is_boolean()`	Y
`is_categorical()`	Y
`is_floating()`	Y
`is_integer()`	Y
`is_interval()`	Y
is_mixed	N
`is_numeric()`	Y
`is_object()`	Y
`is_type_compatible()`	Y
`isin()`	P	`level`
`isna()`	Y
`isnull()`	Y
`item()`	Y
join	N
`map()`	Y
`max()`	P	`axis` , `skipna`
mean	N
median	N
memory_usage	N
`min()`	P	`axis` , `skipna`
`notna()`	Y
`notnull()`	Y
`nunique()`	Y
putmask	N
ravel	N
reindex	N
`rename()`	Y
`repeat()`	P	`axis`
round	N
searchsorted	N
`set_names()`	Y
set_value	N
`shift()`	P	`freq`
slice_indexer	N
slice_locs	N
`sort()`	Y
`sort_values()`	P	`key` , `na_position`
sortlevel	N
std	N
sum	N
`symmetric_difference()`	Y
`take()`	P	`allow_fill` , `axis` , `fill_value`
to_flat_index	N
`to_frame()`	Y
`to_list()`	Y
to_native_types	N
`to_numpy()`	P	`na_value`
to_pytimedelta	N
`to_series()`	P	`index`
`tolist()`	Y
total_seconds	N
`transpose()`	Y
`union()`	Y
`unique()`	Y
`value_counts()`	Y
`view()`	Y
where	N

General Function API

API	Implemented	Missing parameters
array	N
bdate_range	N
`concat()`	P	`copy` , `keys` , `levels` , `names` , `verify_integrity`
crosstab	N
cut	N
`date_range()`	P	`inclusive`
eval	N
factorize	N
from_dummies	N
`get_dummies()`	Y
infer_freq	N
interval_range	N
`isna()`	Y
`isnull()`	Y
json_normalize	N
lreshape	N
`melt()`	P	`col_level` , `ignore_index`
`merge()`	P	`copy` , `indicator` , `left` , `sort` , `validate`
`merge_asof()`	Y
merge_ordered	N
`notna()`	Y
`notnull()`	Y
period_range	N
pivot	N
pivot_table	N
qcut	N
`read_clipboard()`	Y
`read_csv()`	P	`cache_dates` , `chunksize` , `compression` , `converters` , `date_parser` and more. See the pandas.read_csv and pyspark.pandas.read_csv for detail.
`read_excel()`	P	`decimal` , `na_filter` , `storage_options`
read_feather	N
read_fwf	N
read_gbq	N
read_hdf	N
`read_html()`	P	`extract_links`
`read_json()`	P	`chunksize` , `compression` , `convert_axes` , `convert_dates` , `date_unit` and more. See the pandas.read_json and pyspark.pandas.read_json for detail.
`read_orc()`	Y
`read_parquet()`	P	`engine` , `storage_options` , `use_nullable_dtypes`
read_pickle	N
read_sas	N
read_spss	N
`read_sql()`	P	`chunksize` , `coerce_float` , `params` , `parse_dates`
`read_sql_query()`	P	`chunksize` , `coerce_float` , `dtype` , `params` , `parse_dates`
`read_sql_table()`	P	`chunksize` , `coerce_float` , `parse_dates`
read_stata	N
`read_table()`	P	`cache_dates` , `chunksize` , `comment` , `compression` , `converters` and more. See the pandas.read_table and pyspark.pandas.read_table for detail.
read_xml	N
set_eng_float_format	N
show_versions	N
test	N
`timedelta_range()`	Y
`to_datetime()`	P	`cache` , `dayfirst` , `exact` , `utc` , `yearfirst`
`to_numeric()`	P	`downcast`
to_pickle	N
`to_timedelta()`	Y
unique	N
value_counts	N
wide_to_long	N

Expanding API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
validate	N
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

ExpandingGroupby API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
validate	N
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

Rolling API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
validate	N
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

RollingGroupby API

API	Implemented	Missing parameters
agg	N
aggregate	N
apply	N
corr	N
`count()`	P	`numeric_only`
cov	N
`kurt()`	P	`numeric_only`
`max()`	P	`engine` , `engine_kwargs` , `numeric_only`
`mean()`	P	`engine` , `engine_kwargs` , `numeric_only`
median	N
`min()`	P	`engine` , `engine_kwargs` , `numeric_only`
`quantile()`	P	`interpolation` , `numeric_only`
rank	N
sem	N
`skew()`	P	`numeric_only`
`std()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs` , `numeric_only`
validate	N
`var()`	P	`ddof` , `engine` , `engine_kwargs` , `numeric_only`

Window API

API	Implemented	Missing parameters
agg	N
aggregate	N
mean	N
std	N
sum	N
validate	N
var	N

DataFrameGroupBy API

API	Implemented	Missing parameters
`agg()`	P	`engine` , `engine_kwargs` , `func`
`aggregate()`	P	`engine` , `engine_kwargs` , `func`
`all()`	Y
`any()`	P	`skipna`
`apply()`	Y
`backfill()`	Y
`bfill()`	Y
boxplot	N
`count()`	Y
`cumcount()`	Y
`cummax()`	P	`axis` , `numeric_only`
`cummin()`	P	`axis` , `numeric_only`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
`describe()`	Y
`diff()`	P	`axis`
`ewm()`	Y
`expanding()`	Y
`ffill()`	Y
`filter()`	P	`dropna`
`first()`	Y
`get_group()`	P	`obj`
`head()`	Y
`idxmax()`	P	`axis` , `numeric_only`
`idxmin()`	P	`axis` , `numeric_only`
`last()`	Y
`max()`	P	`engine` , `engine_kwargs`
`mean()`	P	`engine` , `engine_kwargs`
`median()`	Y
`min()`	P	`engine` , `engine_kwargs`
ngroup	N
`nunique()`	Y
ohlc	N
`pad()`	Y
pct_change	N
pipe	N
`prod()`	Y
`quantile()`	P	`interpolation` , `numeric_only`
`rank()`	P	`axis` , `na_option` , `pct`
resample	N
`rolling()`	Y
sample	N
`sem()`	P	`numeric_only`
`shift()`	P	`axis` , `freq`
`size()`	Y
`std()`	P	`engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs`
`tail()`	Y
`transform()`	P	`engine` , `engine_kwargs`
value_counts	N
`var()`	P	`engine` , `engine_kwargs` , `numeric_only`

GroupBy API

API	Implemented	Missing parameters
`agg()`	P	`func`
`aggregate()`	P	`func`
`all()`	Y
`any()`	P	`skipna`
`apply()`	Y
`backfill()`	Y
`bfill()`	Y
`count()`	Y
`cumcount()`	Y
`cummax()`	P	`axis` , `numeric_only`
`cummin()`	P	`axis` , `numeric_only`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
describe	N
`diff()`	P	`axis`
`ewm()`	Y
`expanding()`	Y
`ffill()`	Y
`first()`	Y
`get_group()`	P	`obj`
`head()`	Y
`last()`	Y
`max()`	P	`engine` , `engine_kwargs`
`mean()`	P	`engine` , `engine_kwargs`
`median()`	Y
`min()`	P	`engine` , `engine_kwargs`
ngroup	N
ohlc	N
`pad()`	Y
pct_change	N
pipe	N
`prod()`	Y
`quantile()`	P	`interpolation` , `numeric_only`
`rank()`	P	`axis` , `na_option` , `pct`
resample	N
`rolling()`	Y
sample	N
`sem()`	P	`numeric_only`
`shift()`	P	`axis` , `freq`
`size()`	Y
`std()`	P	`engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs`
`tail()`	Y
`var()`	P	`engine` , `engine_kwargs` , `numeric_only`

SeriesGroupBy API

API	Implemented	Missing parameters
`agg()`	P	`engine` , `engine_kwargs` , `func`
`aggregate()`	P	`engine` , `engine_kwargs` , `func`
`all()`	Y
`any()`	P	`skipna`
`apply()`	Y
`backfill()`	Y
`bfill()`	Y
`count()`	Y
`cumcount()`	Y
`cummax()`	P	`axis` , `numeric_only`
`cummin()`	P	`axis` , `numeric_only`
`cumprod()`	P	`axis`
`cumsum()`	P	`axis`
describe	N
`diff()`	P	`axis`
`ewm()`	Y
`expanding()`	Y
`ffill()`	Y
`filter()`	P	`dropna`
`first()`	Y
`get_group()`	P	`obj`
`head()`	Y
`last()`	Y
`max()`	P	`engine` , `engine_kwargs`
`mean()`	P	`engine` , `engine_kwargs`
`median()`	Y
`min()`	P	`engine` , `engine_kwargs`
ngroup	N
`nlargest()`	P	`keep`
`nsmallest()`	P	`keep`
`nunique()`	Y
ohlc	N
`pad()`	Y
pct_change	N
pipe	N
`prod()`	Y
`quantile()`	P	`interpolation` , `numeric_only`
`rank()`	P	`axis` , `na_option` , `pct`
resample	N
`rolling()`	Y
sample	N
`sem()`	P	`numeric_only`
`shift()`	P	`axis` , `freq`
`size()`	Y
`std()`	P	`engine` , `engine_kwargs` , `numeric_only`
`sum()`	P	`engine` , `engine_kwargs`
`tail()`	Y
`transform()`	P	`engine` , `engine_kwargs`
`value_counts()`	P	`bins` , `normalize`
`var()`	P	`engine` , `engine_kwargs` , `numeric_only`
