DataFrame.duplicated(subset=None, keep='first')
Return boolean Series denoting duplicate rows, optionally only considering certain columns.
Parameters
subset : column label or sequence of labels, optional
    Only consider certain columns when identifying duplicates. By default, all columns are used.
keep : {'first', 'last', False}, default 'first'
    'first' : Mark duplicates as True except for the first occurrence.
    'last' : Mark duplicates as True except for the last occurrence.
    False : Mark all duplicates as True.
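The three `keep` modes and the `subset` option can be sketched in plain Python on a single machine. This is only an illustrative model of the documented behaviour, not how pandas-on-Spark computes the result on Spark; the helper name `duplicated_mask` and the representation of rows as a list of dicts are assumptions made for the sketch.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Sequence, Union

Row = Dict[str, Any]


def duplicated_mask(
    rows: List[Row],
    subset: Optional[Sequence[str]] = None,
    keep: Union[str, bool] = "first",
) -> List[bool]:
    """Hypothetical helper: return one bool per row, True for a duplicate."""
    if keep is not False and keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last', or False")
    # Compare only the chosen columns; by default use every column of the first row.
    if subset is None:
        subset = list(rows[0]) if rows else []
    keys = [tuple(row[col] for col in subset) for row in rows]

    if keep is False:
        # Every row whose key occurs more than once is marked.
        counts = Counter(keys)
        return [counts[key] > 1 for key in keys]

    # 'first' scans forward and 'last' scans backward, so the occurrence
    # reached first in the scan is the one left unmarked.
    if keep == "first":
        order = range(len(keys))
    else:
        order = range(len(keys) - 1, -1, -1)
    seen = set()
    mask = [False] * len(keys)
    for i in order:
        if keys[i] in seen:
            mask[i] = True
        else:
            seen.add(keys[i])
    return mask


# The same four rows as the DataFrame used in the Examples section.
rows = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 3, "b": 4, "c": 5},
]
print(duplicated_mask(rows))               # [False, True, True, False]
print(duplicated_mask(rows, keep="last"))  # [True, True, False, False]
print(duplicated_mask(rows, keep=False))   # [True, True, True, False]
```

In this sketch, `'last'` reuses the `'first'` logic by scanning in reverse, while `keep=False` has to count every key before any row can be marked.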
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'a': [1, 1, 1, 3], 'b': [1, 1, 1, 4], 'c': [1, 1, 1, 5]},
...                   columns=['a', 'b', 'c'])
>>> df
   a  b  c
0  1  1  1
1  1  1  1
2  1  1  1
3  3  4  5
By default, duplicates are marked as True except for the first occurrence.
>>> df.duplicated().sort_index()
0    False
1     True
2     True
3    False
dtype: bool
Mark duplicates as True except for the last occurrence.
>>> df.duplicated(keep='last').sort_index()
0     True
1     True
2    False
3    False
dtype: bool
Mark all duplicates as True.
>>> df.duplicated(keep=False).sort_index()
0     True
1     True
2     True
3    False
dtype: bool