“Manipulating” Python lists II
Section 1 February 12th, 2007
As mentioned, in this entry we will look at some other features of Python lists. We will start with an example similar to the one in the book and then use our DNA file. So let’s assume we have this simple list
nucleotides = [ 'A', 'C', 'G', 'T']
If we print it directly we would get something like this
['A', 'C', 'G', 'T']
which is fine for now, as we are not yet worried about the output (we will deal with that further below). Let’s remove the last nucleotide. To accomplish that, we use pop with no index
nucleotides.pop()
which gives me this when printed
['A', 'C', 'G']
Remember that lists are mutable, so pop changes the list in place. The removed item is returned by pop, and it is lost unless we store it in a variable. We can also remove any other item in the list, let’s say ‘C’. First, we reassign the original list items and then remove the second item
nucleotides = [ 'A', 'C', 'G', 'T']
nucleotides.pop(1)
The list when printed will return this
['A', 'G', 'T']
pop accepts any valid index of the list. An index equal to or larger than the length of the list will raise an IndexError. For future reference, remember that whenever an item is removed (or inserted), the indexes of the items after it change, and so does the length. It may seem obvious, but mistakes are common.
Shifting from our ‘destructive’ mode, we can also add elements to the list. Adding to the end of the list is trivial, by using append
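As a minimal sketch of how pop behaves, the snippet below keeps the removed items in variables and shows what happens with an index outside the list. The variable names (last, second, tail, message) are only illustrative and are not from the book.

```python
nucleotides = ['A', 'C', 'G', 'T']

# pop() returns the item it removes, so it can be kept in a variable.
last = nucleotides.pop()       # removes the last item, 'T'
second = nucleotides.pop(1)    # removes the item at index 1, 'C'
print(last, second, nucleotides)

# Negative indexes count from the end of the list.
tail = nucleotides.pop(-1)     # removes 'G'

# An index outside the list raises IndexError instead of returning a value.
message = None
try:
    nucleotides.pop(5)
except IndexError as error:
    message = str(error)
    print("Error:", message)
```

After these calls only 'A' is left in the list, which is a good reminder of how quickly the indexes move when items are removed.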
nucleotides = [ 'A', 'C', 'G', 'T']
nucleotides.append('A')
that gives this when printed
['A', 'C', 'G', 'T', 'A']
Adding to any position is also very straightforward with insert, like this
nucleotides = [ 'A', 'C', 'G', 'T']
nucleotides.insert(0, 'A')
where insert takes two arguments: first is the index of the element before which to insert and second the element to be inserted. So our line above will insert an ‘A’ just before the ‘A’ at position zero. We can try this
nucleotides = [ 'A', 'C', 'G', 'T']
nucleotides.insert(0, 'A1')
nucleotides.insert(2, 'C1')
nucleotides.insert(4, 'G1')
nucleotides.insert(6, 'T1')
that will result in
['A1', 'A', 'C1', 'C', 'G1', 'G', 'T1', 'T']
Notice that we add every new item at an even position, because every insertion changes the list’s length and shifts the indexes of the items that follow.
And lastly, we will take care of the output. Of course, if we are creating a script that requires a nicer output, printing a list is not the best way. We could create a loop and merge all entries in the list, but that would take a couple of lines and we ought to have an easier way (otherwise we could be using C++ instead). There is a way, by using the method join. This method will join all the elements in a list into a single string, with a selected delimiter.
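To see the shifting step by step, here is a small sketch that performs the same four insertions in a loop and prints the list after each one. The loop and the names step and base are my own illustration, not something from the book.

```python
nucleotides = ['A', 'C', 'G', 'T']

# Iterate over a copy, because the list itself grows while we insert.
# Each insertion pushes the later items one place to the right,
# so the target indexes are 0, 2, 4 and 6.
for step, base in enumerate(list(nucleotides)):
    nucleotides.insert(2 * step, base + '1')
    print(step, nucleotides)
```

Each printed line is one item longer than the previous one, and every original nucleotide ends up right after its new partner.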
nucleotides = [ 'A', 'C', 'G', 'T']
print("".join(nucleotides))
will generate this output
ACGT
join is a string method. The string it is called on is the delimiter, which could be anything (in our case it is an empty string). The line of code tells Python to take the empty string and use it to join the list of strings that we call nucleotides.
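A short sketch of join with different delimiters may help here. It also shows that join only works on strings, so other types have to be converted first; the positions list is a made-up example for that purpose.

```python
nucleotides = ['A', 'C', 'G', 'T']

# The string before .join is placed between the items.
sequence = "".join(nucleotides)   # no delimiter at all
dashed = "-".join(nucleotides)    # a dash between each item
print(sequence)
print(dashed)

# join raises TypeError on non-string items, so convert them with str first.
positions = [1, 2, 3, 4]          # hypothetical data, for illustration only
numbered = ", ".join(str(n) for n in positions)
print(numbered)
```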
With this we finish the first section of the site and we are moving to chapter 5 in the book.
May 9th, 2007 at 3:25 am
Hello
You are The Best!!!
Bye
September 13th, 2007 at 3:53 pm
Assume that we have the list,
p = ['B','A','A','A','A','A','B','C','D']
we don’t know the order of items in this list. However, we do know ‘B’ appears 2 times and ‘A’ appears 5 times. How do we remove the second ‘B’?
September 13th, 2007 at 11:44 pm
Hi
It is quite late, so my answer might not be the most correct one. Try something along the lines of
p = …
bfound = False
i = 0
while i < len(p):
    if p[i] == 'B' and bfound:
        # this is the second 'B': remove it and stop
        p.pop(i)
        break
    elif p[i] == 'B':
        # first 'B' seen, remember it (assignment, not comparison)
        bfound = True
    i += 1
I will try something else tomorrow. But this should give you a hint.
September 13th, 2007 at 11:44 pm
Please use correct indentation. It seems that it has eaten my spaces.
September 13th, 2007 at 11:47 pm
If keeping the order is not important, you can sort the list and then search for the second item that you want to remove.
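If the order does matter, one possible alternative is the list method index, which accepts an optional start position. This is only a sketch and it assumes that ‘B’ really appears at least twice; otherwise index raises a ValueError. The names first and second are just for illustration.

```python
p = ['B', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'D']

first = p.index('B')               # position of the first 'B'
second = p.index('B', first + 1)   # search again, starting just after it
p.pop(second)                      # remove the second 'B' only
print(p)
```

The remaining items keep their original order, and no sorting is needed.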