<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Alternative methods to split a FASTA file &#8211; updated (again)</title>
	<atom:link href="http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/feed/" rel="self" type="application/rss+xml" />
	<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/</link>
	<description>a step-by-step guide to create Python applications in bioinformatics</description>
	<lastBuildDate>Mon, 22 Feb 2010 18:22:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-8347</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Mon, 04 Feb 2008 18:40:30 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-8347</guid>
		<description>My Perl is a little rusty (the site is about Python!!) but my best bet would be to read line by line, checking the first character. If it is a &#039;&gt;&#039;, you process the lines until you find a new &#039;&gt;&#039;.

A &#039;while&#039; loop inside an if clause with a couple of flags should do the trick.

HTH

Paulo</description>
		<content:encoded><![CDATA[<p>My Perl is a little rusty (the site is about Python!!) but my best bet would be to read line by line, checking the first character. If it is a &#8216;>&#8217;, you process the lines until you find a new &#8216;>&#8217;.</p>
<p>A &#8216;while&#8217; loop inside an if clause with a couple of flags should do the trick.</p>
<p>HTH</p>
<p>Paulo</p>
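<p>A minimal sketch of the loop described above, written in Python rather than Perl since that is what this site covers. The function name group_by_header and the sample lines are made up for illustration; it only groups the lines under each header and leaves the regex parsing of each line to the caller.</p>
<pre>
```python
def group_by_header(lines):
    """Yield (header, body_lines) pairs from an iterable of text lines."""
    header = None
    body = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, body
            header = line[1:].strip()
            body = []
        elif header is not None and line.strip():
            # Non-empty lines belong to the most recent header.
            body.append(line)
    # The last record has no following header, so emit it here.
    if header is not None:
        yield header, body


# Hypothetical sample input, shortened from the format in the question.
sample = [
    ">fasta1\n",
    "AAGC\n",
    "1 80\n",
    ">fasta2\n",
    "AACC\n",
]
records = list(group_by_header(sample))
for name, body in records:
    print(name, body)
```
</pre>
<p>Each body line can then be matched against whatever regular expression fits that line type.</p>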
]]></content:encoded>
	</item>
	<item>
		<title>By: Dereje</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-8338</link>
		<dc:creator>Dereje</dc:creator>
		<pubDate>Mon, 04 Feb 2008 15:29:06 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-8338</guid>
		<description>I have one question, how can I process the following format in perl:
&gt;fasta1
AAGCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAGAT
1                                  80
16.30 (((((()))))&gt;&gt;&gt;.......())))))))) 0.999
GGGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGG
2                                   81
16.30 (((((()))))&gt;&gt;&gt;.......())))))))) 0.999
.
.
&gt;fasta2
1                                   80
AACCCGGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCC
14.40 (((((()))))&gt;&gt;&gt;.......())))))))) 0.89
AACCCGGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCC
140 (((((()))))&gt;&gt;&gt;.......())))))))) 0.89
.
.
I want to read the FASTA header and parse the values from each line, e.g. &quot;14.40 (((((()))))&gt;&gt;&gt;.......())))))))) 0.89&quot;,
using a Perl regex. I know I need two while loops (one to read a line and one to check for the FASTA header), but I am unable to get the result I need. Each FASTA record has multiple lines under it and the number varies, but the format is the one described above.


Here is the result I need:
Fasta1
$value1= AAGCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAGAT
$value2 = 1
$value3 = 80
$value4 = 16.30 
$value5 = (((((()))))&gt;&gt;&gt;.......()))))))))
$value6 = 0.999
Fasta1
.....
fast1
----
I know how to handle regular expressions and how to pass the above values to a database. I only need the loop to iterate.

Thank you,
Dereje</description>
		<content:encoded><![CDATA[<p>I have one question, how can I process the following format in perl:<br />
&gt;fasta1<br />
AAGCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAGAT<br />
1                                  80<br />
16.30 (((((()))))&gt;&gt;&gt;.......())))))))) 0.999<br />
GGGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGG<br />
2                                   81<br />
16.30 (((((()))))&gt;&gt;&gt;.......())))))))) 0.999<br />
.<br />
.<br />
&gt;fasta2<br />
1                                   80<br />
AACCCGGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCC<br />
14.40 (((((()))))&gt;&gt;&gt;.......())))))))) 0.89<br />
AACCCGGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCC<br />
140 (((((()))))&gt;&gt;&gt;.......())))))))) 0.89<br />
.<br />
.<br />
I want to read the FASTA header and parse the values from each line, e.g. &#8220;14.40 (((((()))))&gt;&gt;&gt;.......())))))))) 0.89&#8221;,<br />
using a Perl regex. I know I need two while loops (one to read a line and one to check for the FASTA header), but I am unable to get the result I need. Each FASTA record has multiple lines under it and the number varies, but the format is the one described above.</p>
<p>Here is the result I need:<br />
Fasta1<br />
$value1= AAGCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAGAT<br />
$value2 = 1<br />
$value3 = 80<br />
$value4 = 16.30<br />
$value5 = (((((()))))&gt;&gt;&gt;.......()))))))))<br />
$value6 = 0.999<br />
Fasta1<br />
.....<br />
fast1<br />
----<br />
I know how to handle regular expressions and how to pass the above values to a database. I only need the loop to iterate.</p>
<p>Thank you,<br />
Dereje</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Beginning Python for Bioinformatics &#187; Blog Archive &#187; Splitting a FASTA file using awk (no sed required), or do we care about csplit?</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4776</link>
		<dc:creator>Beginning Python for Bioinformatics &#187; Blog Archive &#187; Splitting a FASTA file using awk (no sed required), or do we care about csplit?</dc:creator>
		<pubDate>Mon, 29 Oct 2007 16:37:13 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4776</guid>
		<description>[...] saw that &#8220;top notch bioinformaticians&#8221; use csplit to split FASTA files, so I decided to post as many as possible alternatives to split these files. As csplit, awk is something found with more frequency in Linux machines than [...]</description>
		<content:encoded><![CDATA[<p>[...] saw that &#8220;top notch bioinformaticians&#8221; use csplit to split FASTA files, so I decided to post as many as possible alternatives to split these files. As csplit, awk is something found with more frequency in Linux machines than [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4658</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Sat, 27 Oct 2007 14:54:19 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4658</guid>
		<description>Hi Joe

Don&#039;t worry about commenting and please continue doing so. And don&#039;t worry about using my time or annoying the readers. It is very important to receive comments, suggestions, ideas and criticism. 

Please continue posting. Or send me an email when needed, it is a pleasure to interact.

I am posting your code to the topic this afternoon.

Paulo</description>
		<content:encoded><![CDATA[<p>Hi Joe</p>
<p>Don&#8217;t worry about commenting and please continue doing so. And don&#8217;t worry about using my time or annoying the readers. It is very important to receive comments, suggestions, ideas and criticism. </p>
<p>Please continue posting. Or send me an email when needed, it is a pleasure to interact.</p>
<p>I am posting your code to the topic this afternoon.</p>
<p>Paulo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph T Oettinger MD, FACOG</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4619</link>
		<dc:creator>Joseph T Oettinger MD, FACOG</dc:creator>
		<pubDate>Fri, 26 Oct 2007 23:33:35 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4619</guid>
		<description>Dear Paulo,

Now I see what you mean! Somehow my indents don&#039;t show on the comments posts! They should show on the email I sent, but I&#039;m using a lot of your time, and this must be rather boring to any readers of the comments. So I&#039;m going to post less and read more for a while. 

Thanks for the advice on not keeping all the data in memory.

Joe</description>
		<content:encoded><![CDATA[<p>Dear Paulo,</p>
<p>Now I see what you mean! Somehow my indents don&#8217;t show on the comments posts! They should show on the email I sent, but I&#8217;m using a lot of your time, and this must be rather boring to any readers of the comments. So I&#8217;m going to post less and read more for a while. </p>
<p>Thanks for the advice on not keeping all the data in memory.</p>
<p>Joe</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cariaso</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4616</link>
		<dc:creator>cariaso</dc:creator>
		<pubDate>Fri, 26 Oct 2007 22:33:49 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4616</guid>
		<description>Dr. Oettinger MD

With the whitespace issues it&#039;s hard for me to be sure, but it appears that your solution does add some useful checks. However, I&#039;d like to caution you on one aspect. It appears you are trying to keep all of the sequences in memory before you finally print them. That is fine for small files, but FASTA files are often huge (especially the ones you&#039;d want to split). If so, your code will hit a pretty hard wall once the input file is nearly as big as RAM.</description>
		<content:encoded><![CDATA[<p>Dr. Oettinger MD</p>
<p>With the whitespace issues it&#8217;s hard for me to be sure, but it appears that your solution does add some useful checks. However, I&#8217;d like to caution you on one aspect. It appears you are trying to keep all of the sequences in memory before you finally print them. That is fine for small files, but FASTA files are often huge (especially the ones you&#8217;d want to split). If so, your code will hit a pretty hard wall once the input file is nearly as big as RAM.</p>
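<p>A minimal sketch of a split that avoids this, assuming each record can go to a file named after the first word of its header. The function name split_fasta, the .fasta suffix and the filename clean-up rule are illustrative assumptions, not something from the post. Only one line is held in memory at a time.</p>
<pre>
```python
import os
import re
import tempfile


def split_fasta(in_path, out_dir):
    """Write each record of in_path to its own file in out_dir, line by line."""
    written = []
    out = None
    with open(in_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A header line closes the previous output file, if any.
                if out is not None:
                    out.close()
                # Assumed naming rule: first header word, made filename-safe.
                words = line[1:].split()
                name = re.sub(r"[^A-Za-z0-9_.-]", "_", words[0]) if words else "unnamed"
                path = os.path.join(out_dir, name + ".fasta")
                out = open(path, "w")
                written.append(path)
            # Lines before the first header are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return written


# Small self-check on a temporary two-record file.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "input.fa")
with open(source, "w") as fh:
    fh.write(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
paths = split_fasta(source, work_dir)
```
</pre>
<p>Two records with the same first header word would overwrite each other here, so a real script would need to handle duplicate names.</p>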
]]></content:encoded>
	</item>
	<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4612</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Fri, 26 Oct 2007 20:30:42 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4612</guid>
		<description>Hi Joe

I sent an email to your netscape address (the one I have registered). Check your spam box; otherwise, my email is 

nuin at genedrift dot org

Cheers
Paulo</description>
		<content:encoded><![CDATA[<p>Hi Joe</p>
<p>I sent an email to your netscape address (the one I have registered). Check your spam box; otherwise, my email is </p>
<p>nuin at genedrift dot org</p>
<p>Cheers<br />
Paulo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph T Oettinger MD, FACOG</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4611</link>
		<dc:creator>Joseph T Oettinger MD, FACOG</dc:creator>
		<pubDate>Fri, 26 Oct 2007 20:20:15 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4611</guid>
		<description>Dear Paulo,

I&#039;m embarrassed to say that I can&#039;t find the email address on your blog to send the source to. If you don&#039;t mind sending same to me or telling me where to find it, I&#039;ll be glad to send the source. Thanks for your patience.

Joe</description>
		<content:encoded><![CDATA[<p>Dear Paulo,</p>
<p>I&#8217;m embarrassed to say that I can&#8217;t find the email address on your blog to send the source to. If you don&#8217;t mind sending same to me or telling me where to find it, I&#8217;ll be glad to send the source. Thanks for your patience.</p>
<p>Joe</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-2/#comment-4608</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Fri, 26 Oct 2007 17:20:23 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4608</guid>
		<description>Hi Joe 

Would you mind sending me the source in an email? The comments are not formatted for Python.

Thanks
Paulo</description>
		<content:encoded><![CDATA[<p>Hi Joe </p>
<p>Would you mind sending me the source in an email? The comments are not formatted for Python.</p>
<p>Thanks<br />
Paulo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joseph T Oettinger MD, FACOG</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4604</link>
		<dc:creator>Joseph T Oettinger MD, FACOG</dc:creator>
		<pubDate>Fri, 26 Oct 2007 16:32:20 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4604</guid>
		<description>Dear Paulo,

It&#039;s been fascinating reading your blog. And very instructive. I&#039;m a beginner in Python and bioinformatics, but have been able to follow it (really, that&#039;s not a boast but a compliment).

The way to split a FASTA file with Python that Michael Cariaso sent in is short, efficient and understandable (maybe that&#039;s what Pythonic means). I&#039;m sending a little modification (actually two small additions) to it. I&#039;m pretty sure that they&#039;d be obvious additions to you and to him, but, if correct, they might be useful and instructive to others who are &quot;beginning Python for bioinformatics&quot; like me. It seems to work with the fasta.seq file you supplied in your blog, and some permutations and multiplications of it that I invented.

At any rate, if you have the time to examine and correct it, I&#039;d be grateful.

Joe

# f6smod.py 10-26-07 jto
# purpose: modify program by Michael Cariaso, given in Paulo Nuin&#039;s blog:
# http://python.genedrift.org
# archive: kept by joeoettinger@netscape.net

import sys

fileobj = open(&#039;fasta.seq&#039;)
ignore = fileobj.read(1)
# The above line removes the first char from the file object.
# It could also be used to check that the first char is a &#039;&gt;&#039;

if ignore != &#039;&gt;&#039;:
  print(&quot;The first character in the supposed FASTA file is: &quot; + ignore)
  print(&quot;not &#039;&gt;&#039;, so sys.exit() is being invoked.&quot;)
  sys.exit(1)


text = fileobj.read()
records = text.split(&#039;&gt;&#039;)

# Here, rather than use the for loop to just print out the sequences, it
# is used to store them in a list. After that they can be printed out, or
# stored in separate files, or be further split into header line and sequence
# (using the newline at the end of the header line).
seqlist = []
listcount = 0
 
# store each header-sequence in a list
for i in records:
  i = &#039;&gt;&#039; + i
  seqlist.append(i)
  listcount += 1
# Just to show it&#039;s working right, print the list
for seq in seqlist:
  print(seq)
# Split into header line and sequence, and make the sequence a single string.
for seq in seqlist:
  splitCR = seq.split(&#039;\n&#039;)
  print(&quot;header: &quot; + splitCR[0])
  sequence = &#039;&#039;.join(splitCR[1:])
  print(&quot;sequence: &quot;)
  print(sequence)</description>
		<content:encoded><![CDATA[<p>Dear Paulo,</p>
<p>It&#8217;s been fascinating reading your blog. And very instructive. I&#8217;m a beginner in Python and bioinformatics, but have been able to follow it (really, that&#8217;s not a boast but a compliment).</p>
<p>The way to split a FASTA file with Python that Michael Cariaso sent in is short, efficient and understandable (maybe that&#8217;s what Pythonic means). I&#8217;m sending a little modification (actually two small additions) to it. I&#8217;m pretty sure that they&#8217;d be obvious additions to you and to him, but, if correct, they might be useful and instructive to others who are &#8220;beginning Python for bioinformatics&#8221; like me. It seems to work with the fasta.seq file you supplied in your blog, and some permutations and multiplications of it that I invented.</p>
<p>At any rate, if you have the time to examine and correct it, I&#8217;d be grateful.</p>
<p>Joe</p>
<p># f6smod.py 10-26-07 jto<br />
# purpose: modify program by Michael Cariaso, given in Paulo Nuin&#8217;s blog:<br />
# <a href="http://python.genedrift.org" rel="nofollow">http://python.genedrift.org</a><br />
# archive: kept by <a href="mailto:joeoettinger@netscape.net">joeoettinger@netscape.net</a></p>
<p>import sys</p>
<p>fileobj = open('fasta.seq')<br />
ignore = fileobj.read(1)<br />
# The above line removes the first char from the file object.<br />
# It could also be used to check that the first char is a '&gt;'</p>
<p>if ignore != '&gt;':<br />
  print("The first character in the supposed FASTA file is: " + ignore)<br />
  print("not '&gt;', so sys.exit() is being invoked.")<br />
  sys.exit(1)</p>
<p>text = fileobj.read()<br />
records = text.split('&gt;')</p>
<p># Here, rather than use the for loop to just print out the sequences, it<br />
# is used to store them in a list. After that they can be printed out, or<br />
# stored in separate files, or be further split into header line and sequence<br />
# (using the newline at the end of the header line).<br />
seqlist = []<br />
listcount = 0</p>
<p># store each header-sequence in a list<br />
for i in records:<br />
  i = '&gt;' + i<br />
  seqlist.append(i)<br />
  listcount += 1<br />
# Just to show it's working right, print the list<br />
for seq in seqlist:<br />
  print(seq)<br />
# Split into header line and sequence, and make the sequence a single string.<br />
for seq in seqlist:<br />
  splitCR = seq.split('\n')<br />
  print("header: " + splitCR[0])<br />
  sequence = ''.join(splitCR[1:])<br />
  print("sequence: ")<br />
  print(sequence)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4322</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Fri, 19 Oct 2007 23:38:11 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4322</guid>
		<description>Hi Cariaso 

I will add your code to the post (referencing it of course). I will work on a sed/awk solution this weekend.

Thanks a lot for your input.</description>
		<content:encoded><![CDATA[<p>Hi Cariaso </p>
<p>I will add your code to the post (referencing it of course). I will work on a sed/awk solution this weekend.</p>
<p>Thanks a lot for your input.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cariaso</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4321</link>
		<dc:creator>cariaso</dc:creator>
		<pubDate>Fri, 19 Oct 2007 22:13:53 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4321</guid>
		<description>for a legible copy of the above visit

http://mike.pbwiki.com/pythonfasta</description>
		<content:encoded><![CDATA[<p>for a legible copy of the above visit</p>
<p><a href="http://mike.pbwiki.com/pythonfasta" rel="nofollow">http://mike.pbwiki.com/pythonfasta</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cariaso</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4320</link>
		<dc:creator>cariaso</dc:creator>
		<pubDate>Fri, 19 Oct 2007 22:03:40 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4320</guid>
		<description>
The perl above works better as

# This magic variable makes perl read lines
# that end with &gt;
# instead of a newline \n
$/=&quot;&gt;&quot;;

while (&lt;&gt;) {         # foreach line in the input files
    if (/^(\w+)/) {       # grab the first word of text
        open(F,&quot;&gt;$1&quot;) &#124;&#124;  # open a file named that word
             warn &quot;$1 write failed:$!\n&quot;;
        chomp;            # strip off the &gt; at the end
        print F &quot;&gt;&quot;, $_;  # print &gt;text to the file
    }
}

It as well as the python scripts above suffer from an extra &#039;&gt;&#039; at the beginning of the output.


An easy fix is

fileobj = open(&quot;myfile.fasta&quot;)
ignore  = fileobj.read(1)
text    = fileobj.read()
records = text.split(&quot;&gt;&quot;)
for i in records:
    print(&#039;&gt;&#039; + i)


But I consider this a better approach

def eachfasta(fileobj):
        sofar = fileobj.readline()
        for line in fileobj:
                if &#039;&gt;&#039; == line[0]:
                        yield sofar
                        sofar = line
                else:
                        sofar += line

        yield sofar


fileobj = open(&quot;myfile.fasta&quot;)

for i in eachfasta(fileobj):
    print(&#039;**[%s]**&#039; % i)


</description>
		<content:encoded><![CDATA[<p>The perl above works better as</p>
<p># This magic variable makes perl read lines<br />
# that end with &gt;<br />
# instead of a newline \n<br />
$/="&gt;";</p>
<p>while (&lt;&gt;) {         # foreach line in the input files<br />
    if (/^(\w+)/) {       # grab the first word of text<br />
        open(F,"&gt;$1") ||  # open a file named that word<br />
             warn "$1 write failed:$!\n";<br />
        chomp;            # strip off the &gt; at the end<br />
        print F "&gt;", $_;  # print &gt;text to the file<br />
    }<br />
}</p>
<p>It as well as the python scripts above suffer from an extra &#8216;&gt;&#8217; at the beginning of the output.</p>
<p>An easy fix is</p>
<p>fileobj = open("myfile.fasta")<br />
ignore  = fileobj.read(1)<br />
text    = fileobj.read()<br />
records = text.split("&gt;")<br />
for i in records:<br />
    print('&gt;' + i)</p>
<p>But I consider this a better approach</p>
<p>def eachfasta(fileobj):<br />
        sofar = fileobj.readline()<br />
        for line in fileobj:<br />
                if '&gt;' == line[0]:<br />
                        yield sofar<br />
                        sofar = line<br />
                else:<br />
                        sofar += line</p>
<p>        yield sofar</p>
<p>fileobj = open("myfile.fasta")</p>
<p>for i in eachfasta(fileobj):<br />
    print('**[%s]**' % i)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Splitting a FASTA file</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4275</link>
		<dc:creator>Splitting a FASTA file</dc:creator>
		<pubDate>Thu, 18 Oct 2007 01:08:23 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4275</guid>
		<description>[...] have posted in my other site different ways to split a FASTA file. Take a look. The Python method is [...]</description>
		<content:encoded><![CDATA[<p>[...] have posted in my other site different ways to split a FASTA file. Take a look. The Python method is [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4215</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Tue, 16 Oct 2007 20:50:50 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4215</guid>
		<description>Hi Cariaso

Thanks for the Perl explanation. I am researching a way to do it with sed and awk, and I will post it if no one else does first.

I also expect that Andrew would know the Pythonic way! I will probably write him and ask.

Cheers
Paulo</description>
		<content:encoded><![CDATA[<p>Hi Cariaso</p>
<p>Thanks for the Perl explanation. I am researching a way to do it with sed and awk, and I will post it if no one else does first.</p>
<p>I also expect that Andrew would know the Pythonic way! I will probably write him and ask.</p>
<p>Cheers<br />
Paulo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: cariaso</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4085</link>
		<dc:creator>cariaso</dc:creator>
		<pubDate>Sat, 13 Oct 2007 20:48:35 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4085</guid>
		<description>I think your process is excellent, please continue. I hope to see the same core Python theme continue. Others seem to want a bioinformatics course with Python as the language.


the old perl motto was 
&lt;a href=&#039;http://en.wikipedia.org/wiki/There_is_more_than_one_way_to_do_it&#039; rel=&quot;nofollow&quot;&gt;Tim Toady&lt;/a&gt;. In that spirit, I was unaware of csplit and am still unlikely to use it. I probably would have kicked this one off in Perl and, assuming someone will find it useful, I will try to explain the Perl above. I expect there is also a sed/awk solution, and I&#039;d be curious to hear those if anyone can enlighten me.

Incidentally I use biopython daily, but mainly for the ncbi eutils interfaces. I certainly don&#039;t know the biopythonic way to split fasta. I expect Andrew Dalke might.

# This magic variable makes perl read &#039;lines&#039; 
# that end with &#039;&gt;&#039; 
# instead of a newline &#039;\n&#039;
$/=&quot;&gt;&quot;;

while (&lt;&gt;) { # foreach &#039;line&#039; in the input files

  if (/^\s*(\S+)/) {    # grab the first &#039;word&#039; of text
      open(F,&quot;&gt;$1&quot;) &#124;&#124;  # open a file named that &#039;word&#039;
          warn &quot;$1 write failed:$!\n&quot;;
      chomp;            # strip off the &quot;&gt;&quot; at the end
      print F &quot;&gt;&quot;, $_;  # print &#039;&gt;text&#039; to the file
      }

}</description>
		<content:encoded><![CDATA[<p>I think your process is excellent, please continue. I hope to see the same core Python theme continue. Others seem to want a bioinformatics course with Python as the language.</p>
<p>the old perl motto was<br />
<a href='http://en.wikipedia.org/wiki/There_is_more_than_one_way_to_do_it' rel="nofollow">Tim Toady</a>. In that spirit, I was unaware of csplit and am still unlikely to use it. I probably would have kicked this one off in Perl and, assuming someone will find it useful, I will try to explain the Perl above. I expect there is also a sed/awk solution, and I&#8217;d be curious to hear those if anyone can enlighten me.</p>
<p>Incidentally I use biopython daily, but mainly for the ncbi eutils interfaces. I certainly don&#8217;t know the biopythonic way to split fasta. I expect Andrew Dalke might.</p>
<p># This magic variable makes perl read 'lines'<br />
# that end with '&gt;'<br />
# instead of a newline '\n'<br />
$/="&gt;";</p>
<p>while (&lt;&gt;) { # foreach 'line' in the input files</p>
<p>  if (/^\s*(\S+)/) {    # grab the first 'word' of text<br />
      open(F,"&gt;$1") ||  # open a file named that 'word'<br />
          warn "$1 write failed:$!\n";<br />
      chomp;            # strip off the "&gt;" at the end<br />
      print F "&gt;", $_;  # print '&gt;text' to the file<br />
      }</p>
<p>}</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: nuin</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-4018</link>
		<dc:creator>nuin</dc:creator>
		<pubDate>Fri, 12 Oct 2007 12:11:54 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-4018</guid>
		<description>Hi Mike

I intend to include some basic BioPython usage in the future. I am not an experienced user of BioPython, but I am currently playing with it and hope to put something here soon.

The idea of the site was to create a basic guide of Python and Bioinformatics, especially for people that don&#039;t know that much Python and/or a lot of biology. Most of the scripts presented here are very simple and short; the scripts can be used as a primer for more advanced programming. And I hope to do that with BioPython too.

I would like to include some basic tutorials of other Python packages too (bioinfo or not) and even some basic steps to build interfaces with wxPython. 

If there is a specific topic you would like to see covered, please let me know. And thanks for your support.

Cheers
Paulo</description>
		<content:encoded><![CDATA[<p>Hi Mike</p>
<p>I intend to include some basic BioPython usage in the future. I am not an experienced user of BioPython, but I am currently playing with it and hope to put something here soon.</p>
<p>The idea of the site was to create a basic guide of Python and Bioinformatics, especially for people that don&#8217;t know that much Python and/or a lot of biology. Most of the scripts presented here are very simple and short; the scripts can be used as a primer for more advanced programming. And I hope to do that with BioPython too.</p>
<p>I would like to include some basic tutorials of other Python packages too (bioinfo or not) and even some basic steps to build interfaces with wxPython. </p>
<p>If there is a specific topic you would like to see covered, please let me know. And thanks for your support.</p>
<p>Cheers<br />
Paulo</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-3997</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Fri, 12 Oct 2007 04:41:39 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-3997</guid>
		<description>I think this is a marvellous resource but (and perhaps I missed this), why are you using python instead of biopython? Is that to come later?
I would like to see a tutorial that takes biopython through in very simple steps. I find the biopython &#039;Tutorial and Cookbook&#039; assumes a strong knowledge of python. So you then spend an inordinate amount of time learning how to do things the hard way, only to find later that biopython modules have been built specifically to make it easy to handle DNA etc.
Cheers, Mike</description>
		<content:encoded><![CDATA[<p>I think this is a marvellous resource but (and perhaps I missed this), why are you using python instead of biopython? Is that to come later?<br />
I would like to see a tutorial that takes biopython through in very simple steps. I find the biopython &#8216;Tutorial and Cookbook&#8217; assumes a strong knowledge of python. So you then spend an inordinate amount of time learning how to do things the hard way, only to find later that biopython modules have been built specifically to make it easy to handle DNA etc.<br />
Cheers, Mike</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bioinformatics &#187; Blog Archives &#187; Alternative methods to split a FASTA file</title>
		<link>http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/comment-page-1/#comment-3947</link>
		<dc:creator>Bioinformatics &#187; Blog Archives &#187; Alternative methods to split a FASTA file</dc:creator>
		<pubDate>Thu, 11 Oct 2007 00:58:39 +0000</pubDate>
		<guid isPermaLink="false">http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/#comment-3947</guid>
		<description>[...] Alternative methods to split a FASTA file As Daniel didn&#8217;t enlighten us on how to use csplit, I am posting several ways on how to split a multiple sequence FASTA file. [...]</description>
		<content:encoded><![CDATA[<p>[...] Alternative methods to split a FASTA file As Daniel didn&#8217;t enlighten us on how to use csplit, I am posting several ways on how to split a multiple sequence FASTA file. [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

