AMET-SOLID: P#19 Duplicates Handling

Sunday, 3 April 2022

DUPLICATE REMOVAL

In any data set, duplicates are a perennial problem in data cleaning. This article briefly covers several ways to remove duplicates from a Python list.

Method 1: (Traditional loop way)

# Create a list with duplicates
dlist = [10, 20, 30, 40, 50, 60, 10, 20, 30]
print(dlist)

# Remove duplicates: keep an element only the first time it is seen
dupFreeList = []
for element in dlist:
    if element not in dupFreeList:
        dupFreeList.append(element)

print(dupFreeList)  # [10, 20, 30, 40, 50, 60]
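
The loop above checks membership in a list, which scans the list on every pass and can get slow for long inputs. A minimal sketch of the same idea that tracks seen values in a set is given below; the helper name dedupe_keep_order is hypothetical and not from the original post, and the sketch assumes every element is hashable.

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences."""
    seen = set()      # values already encountered (assumes hashable items)
    result = []       # output list in original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


dlist = [10, 20, 30, 40, 50, 60, 10, 20, 30]
print(dedupe_keep_order(dlist))  # [10, 20, 30, 40, 50, 60]
```

Set lookups take roughly constant time on average, so this version stays close to linear in the length of the list.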

Method 2: (List comprehension way)


res = []
# The comprehension is used only for its side effect of appending to res
[res.append(x) for x in dlist if x not in res]

# printing list after removal
print("The list after removing duplicates : " + str(res))
# The list after removing duplicates : [10, 20, 30, 40, 50, 60]

Method 3:

You can convert the list to a set and then back to a list to remove duplicates. A set holds each value only once, but it does not keep the original order.



dlistset = set(dlist)
print(dlistset)                 # e.g. {40, 10, 50, 20, 60, 30}
dupFreeList = list(dlistset)
print(dupFreeList)              # e.g. [40, 10, 50, 20, 60, 30] # Order is not maintained
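
If the original order does not matter but a predictable order is wanted, one option is to sort the set. This is a small sketch, assuming the elements can be compared with each other (all numbers, or all strings); the name sorted_unique is only illustrative.

```python
dlist = [10, 20, 30, 40, 50, 60, 10, 20, 30]

# set() drops the duplicates, sorted() returns a list in ascending order
sorted_unique = sorted(set(dlist))
print(sorted_unique)  # [10, 20, 30, 40, 50, 60]
```

Note that this gives sorted order, not first-seen order; the two happen to match here only because the sample list is already ascending.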

Method 4:


from collections import OrderedDict

dupFreeList = list(OrderedDict.fromkeys(dlist))

print(dupFreeList)  # [10, 20, 30, 40, 50, 60]  # order is maintained

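Here, we import the OrderedDict class from the collections module. OrderedDict.fromkeys(dlist) builds a dictionary whose keys are the list elements in first-seen order (repeated keys collapse into one), and list(...) turns those keys back into a list.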

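Method 5: list(dict.fromkeys(dlist)) usage

Since Python 3.7, a plain dict also preserves insertion order, so the same result is available without importing OrderedDict.
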
dlist = ["10", "20", "30", "40", "20", "30"]  # strings
dflist = list(dict.fromkeys(dlist))
print(dlist, dflist)
# ['10', '20', '30', '40', '20', '30'] ['10', '20', '30', '40']

dlist = [10, 20, 30, 40, 50, 10, 20]  # integers
dflist = list(dict.fromkeys(dlist))
print(dlist, dflist)
# [10, 20, 30, 40, 50, 10, 20] [10, 20, 30, 40, 50]
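
dict.fromkeys needs hashable keys, so it raises a TypeError when the list contains other lists. A possible workaround, sketched here with made-up sample data (rows and unique_rows are illustrative names), is to convert each inner list to a tuple for the dictionary step and back to a list afterwards.

```python
rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# dict.fromkeys(rows) would fail: lists are unhashable.
# Tuples are hashable, so use them as temporary keys.
unique_rows = [list(t) for t in dict.fromkeys(tuple(r) for r in rows)]
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]
```

This assumes the inner lists themselves contain only hashable values such as numbers or strings.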

Happy Open Learning at AMET ODL!
