Uniquifying lists with sets and dictionaries
Phase 2 January 29th, 2008
We are going to use our previous example to compare sets and dictionaries as ways to create unique lists. We’ve already seen that with a set it is very simple to turn a list with repeated items into a list of unique items. The only hassle is creating the set and then converting it back into a list.
As last time (with one small addition):
import sys

cluster1 = open(sys.argv[1]).readlines()
cluster2 = open(sys.argv[2]).readlines()
allgenes = cluster1 + cluster2
uniqueset = set(allgenes)  # built-in set; the old sets.Set module is gone in Python 3
finalist = list(uniqueset)
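To try the set approach without input files, it can be wrapped in a function that takes the two lists of lines directly. This is a minimal sketch, and the function name `merge_unique_set` is hypothetical. It also strips whitespace, on the assumption that each line holds one gene name: `readlines()` keeps the trailing newline, so a last line without one (`"geneC"` vs `"geneC\n"`) would otherwise count as a different gene. As with the original script, the set gives no guarantee about the order of the result.

```python
def merge_unique_set(cluster1, cluster2):
    """Return the unique gene names from two lists of lines (order not preserved)."""
    # readlines() keeps the trailing "\n"; strip it so "geneA\n" and "geneA" match
    cleaned = (line.strip() for line in cluster1 + cluster2)
    # skip blank lines, then let the set comprehension discard duplicates
    return list({name for name in cleaned if name})


genes = merge_unique_set(["geneA\n", "geneB\n"], ["geneB\n", "geneC"])
print(sorted(genes))  # ['geneA', 'geneB', 'geneC']
```

Sorting is used here only to make the printed result predictable, since set order is arbitrary.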
We can achieve the same result with a dictionary. To keep the code clear, we write a small function that takes a list and returns the dictionary's keys. Remember that Python dictionaries map keys to values and that keys cannot be repeated, so the keys form a collection of unique entries. Our function looks like this:
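def make_unique_list(mylist):
    seen = {}  # avoid naming it "dict", which would shadow the built-in type
    for word in mylist:
        seen[word] = 1  # the value is arbitrary; only the key matters
    return list(seen.keys())  # in Python 3, keys() returns a view, so convert it to a list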
In this function we create an empty dictionary and, looping over every item of the list, assign an arbitrary value to the key for that item. Because keys cannot be repeated, assigning to a key that is already present simply overwrites its value instead of adding a second entry. Finally we return only the dictionary keys, which form our unique list.
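A tiny trace makes the overwriting behaviour visible: the dictionary grows when a new key appears and stays the same size when a key is seen again. The gene names are made up for illustration.

```python
# Trace the dictionary approach on a short list with one repeated item
seen = {}
for word in ["geneA", "geneB", "geneA"]:
    seen[word] = 1  # a repeated key overwrites its value; no new entry is added
    print(word, len(seen))
# geneA 1
# geneB 2
# geneA 2
```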
Our small script would be:
import sys

cluster1 = open(sys.argv[1]).readlines()
cluster2 = open(sys.argv[2]).readlines()
allgenes = cluster1 + cluster2
allgenes = make_unique_list(allgenes)  # make_unique_list() must be defined in the same script
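For reference, here is one way to put the function and the script together into a single file that runs from the command line. It is a sketch, not the original script: the helper `unique_genes_from_files` is a hypothetical name, and `with` blocks are used so the files are closed after reading.

```python
import sys


def make_unique_list(mylist):
    """Return the unique items of mylist using dictionary keys."""
    seen = {}
    for word in mylist:
        seen[word] = 1  # re-assigning an existing key does not add a new entry
    return list(seen)


def unique_genes_from_files(path1, path2):
    """Read two cluster files and return the unique lines they contain."""
    # 'with' closes both files once they have been read
    with open(path1) as f1, open(path2) as f2:
        allgenes = f1.readlines() + f2.readlines()
    return make_unique_list(allgenes)


if __name__ == "__main__" and len(sys.argv) == 3:
    for gene in unique_genes_from_files(sys.argv[1], sys.argv[2]):
        print(gene.rstrip("\n"))  # lines still carry their newline from readlines()
```

Saved under a name such as `uniquify.py` (again hypothetical), it would be run as `python uniquify.py cluster1.txt cluster2.txt`.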
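Both methods are effective and usually fast: adding an item to a set and assigning a dictionary key are both hash-based operations that take constant time on average, so each approach runs in roughly linear time in the length of the list. I will post some comparisons and benchmarks, just for fun.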
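For anyone who wants to run such a comparison themselves, a minimal sketch with the standard `timeit` module follows. The three function names and the synthetic gene list are assumptions made for this example, and no timings are quoted because they depend on the machine and the Python version.

```python
import timeit


def unique_with_set(items):
    return list(set(items))


def unique_with_dict_loop(items):
    seen = {}
    for item in items:
        seen[item] = 1
    return list(seen)


def unique_with_fromkeys(items):
    return list(dict.fromkeys(items))


def compare(items, number=100):
    """Time each approach on the same input and return {function name: seconds}."""
    timings = {}
    for func in (unique_with_set, unique_with_dict_loop, unique_with_fromkeys):
        # timeit calls the lambda `number` times and returns the total elapsed seconds
        timings[func.__name__] = timeit.timeit(lambda: func(items), number=number)
    return timings


if __name__ == "__main__":
    # Synthetic data: 100,000 names drawn from 1,000 distinct genes
    sample = ["gene%d" % (i % 1000) for i in range(100000)]
    for name, seconds in sorted(compare(sample, number=10).items()):
        print("%-22s %.4f s" % (name, seconds))
```

Running all three on the same list keeps the comparison fair; varying the ratio of duplicates to distinct items is an easy way to see whether the ranking changes.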
In the comments, Nathan posted another approach using dictionaries:
list(dict.fromkeys(mylist).keys())
In a single line, dict.fromkeys() builds a dictionary whose keys are the elements of the list (each with the default value None), and we return all of its keys. Very Pythonic.
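One property worth knowing, although it was not guaranteed when this post was written: since Python 3.7, dictionaries keep insertion order, so the `dict.fromkeys()` approach also keeps each item at the position of its first occurrence, whereas a set makes no ordering promise. A small check, with made-up gene names:

```python
genes = ["geneC", "geneA", "geneC", "geneB", "geneA"]

# dict.fromkeys() maps every gene to None; duplicate keys collapse into one entry
as_dict = dict.fromkeys(genes)

# list() over a dict yields its keys in insertion (first-seen) order on Python 3.7+
unique_in_order = list(as_dict)
print(unique_in_order)  # ['geneC', 'geneA', 'geneB']
```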
January 30th, 2008 at 2:45 am
But if you are using a version of Python that has sets, there is no need to use the dict solution as this kind of thing is what sets were introduced for and saves having to think about dictionary values.
- Paddy.
January 30th, 2008 at 10:09 am
Exactly. The post was written just to compare both. Sometimes you find yourself without the right tools, and knowing an alternative can be handy. Some people claim that sets are slower than dicts, while others claim the opposite.
March 13th, 2008 at 3:02 pm
There’s a much shorter version of the dictionary solution (I didn’t make it up):
dict.fromkeys(mylist).keys()
March 13th, 2008 at 3:23 pm
Nice. Thanks a lot, I will add the entry as another option.
Cheers
March 14th, 2008 at 2:37 pm
[...] ago we saw how to use sets and uniquify lists. This time we will see another example of the use of [...]