While working with data, there can be situations where your dataframe has duplicate rows. Knowing how to remove such rows quickly can be quite handy. In this tutorial, we'll look at how to drop duplicates from a pandas dataframe through some examples.

The pandas dataframe drop_duplicates() function can be used to remove duplicate rows from a dataframe. It also gives you the flexibility to identify duplicates based on certain columns through the subset parameter. The following is its syntax: df.drop_duplicates()

It returns a dataframe with the duplicate rows removed. By default, it drops every duplicate except the first occurrence. You can change this behavior through the keep parameter, which takes 'first', 'last', or False (False drops every row that has a duplicate, keeping none of them). To modify the dataframe in-place, pass the argument inplace=True.

Let's look at some of the use-cases of the drop_duplicates() function through examples:

1. Drop duplicate rows based on all columns

By default, the drop_duplicates() function identifies duplicates by taking all the columns into consideration. It then drops the duplicate rows and keeps only their first occurrence.

# create a sample dataframe with duplicate rows

In the above example, you can see that the rows with index 1 and 2 have the same values for all three columns. On applying the drop_duplicates() function, the first of these rows is retained and the remaining duplicate rows are dropped. As a result, the returned dataframe does not have a continuous index. If you want the returned dataframe to have a continuous index, pass ignore_index=True to the drop_duplicates() function or reset the index of the returned dataframe.

2. Drop duplicate rows based on certain columns
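The calls described in this tutorial can be sketched as follows, ending with the subset case. This is only an illustration: the dataframe, its column names (Name, Team, Score) and its values are made up for the example and are not the tutorial's original data. Rows 1 and 2 are deliberately identical in all three columns.

```python
import pandas as pd

# Hypothetical sample data (not from the original tutorial):
# rows 1 and 2 hold the same values in all three columns.
df = pd.DataFrame({
    "Name": ["Sam", "Emma", "Emma", "Emma"],
    "Team": ["A", "B", "B", "B"],
    "Score": [10, 12, 12, 15],
})

# Default: compare all columns and keep the first occurrence.
# Row 2 is dropped, so the index has a gap.
all_cols = df.drop_duplicates()

# Same result, but with the index relabelled 0..n-1.
renumbered = df.drop_duplicates(ignore_index=True)

# keep='last' retains the last occurrence instead of the first.
last_kept = df.drop_duplicates(keep="last")

# keep=False drops every row that has a duplicate.
none_kept = df.drop_duplicates(keep=False)

# subset: only the listed columns are compared when looking for duplicates.
by_name_team = df.drop_duplicates(subset=["Name", "Team"])

print(all_cols)
print(by_name_team)
```

With this data, the default call keeps the rows at index 0, 1 and 3, while the subset call keeps only the rows at index 0 and 1, because rows 1 to 3 share the same Name and Team. Calling reset_index(drop=True) on the result is an alternative to ignore_index=True if you prefer to renumber afterwards.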