Re: splitfile option help

Scott Cain
Hi Sanjay,

I cc'ed this to the schema mailing list, since that's where Chado and Chado-related utilities get discussed.

Judging from the error message, I wonder whether something in the GFF file is causing the "undefined value" error. Can you run "head -n 500 file > temp.gff" and send temp.gff to the list?
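One way to look for the kind of problem suspected here is to scan the first few hundred lines for rows that are not well-formed GFF3 feature lines. This is a minimal Python sketch, not part of the GMOD tools. The function name find_malformed_gff3_lines is hypothetical, and it checks only two basic GFF3 rules: a feature line has nine tab-separated columns, and start and end are integers with start not greater than end.

```python
def find_malformed_gff3_lines(lines, limit=500):
    """Report suspicious GFF3 feature lines among the first `limit` lines.

    Returns a list of (line_number, reason) tuples. Blank lines, comments and
    directives (lines starting with '#') are skipped, and scanning stops at the
    ##FASTA directive because everything after it is sequence data.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        if lineno > limit:
            break
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is FASTA, not features
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment or directive
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                (lineno, f"expected 9 tab-separated columns, found {len(cols)}")
            )
            continue
        start, end = cols[3], cols[4]
        if not (start.isdigit() and end.isdigit()):
            problems.append((lineno, "start/end are not integers"))
        elif int(start) > int(end):
            problems.append((lineno, "start is greater than end"))
    return problems
```

An open file handle can be passed directly, e.g. find_malformed_gff3_lines(open("temp.gff")), since iterating over a file yields its lines. A file whose columns are separated by spaces instead of tabs would show up here as "found 1" column on every feature line.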

Additionally, I suggest that you not sort unless you are sure you need to; the preprocessor uses a very dumb sorting algorithm, so sorting long files can take a very long time.

Finally, the preprocessor deposits the FASTA sequences in a separate file, but they can be loaded with the GFF bulk loader after the GFF has been loaded.
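In GFF3, embedded sequences come after a ##FASTA directive at the end of the file, which is why they can be separated from the features and loaded afterwards. The sketch below illustrates that split in Python. It is not the preprocessor's actual code; the function name split_gff3_and_fasta is hypothetical, and it assumes every line after ##FASTA is FASTA and that a record's ID is the first word of its ">" header.

```python
def split_gff3_and_fasta(lines):
    """Split GFF3 lines into annotation lines and a {seq_id: sequence} dict.

    Lines before the ##FASTA directive are returned unchanged (minus the
    newline); lines after it are parsed as FASTA, using the first word of
    each '>' header as the record ID.
    """
    gff_lines = []
    pieces = {}  # seq_id -> list of sequence lines
    in_fasta = False
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                in_fasta = True
            else:
                gff_lines.append(line)
            continue
        if line.startswith(">"):
            words = line[1:].split()
            # A header with no ID is ignored along with its sequence lines.
            current_id = words[0] if words else None
            if current_id is not None:
                pieces.setdefault(current_id, [])
        elif current_id is not None and line.strip():
            pieces[current_id].append(line.strip())
    sequences = {seq_id: "".join(parts) for seq_id, parts in pieces.items()}
    return gff_lines, sequences
```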


On Thu, Nov 1, 2012 at 7:58 PM, Sanjay Chellapilla <[hidden email]> wrote:
Hi Scott,

I'm trying to split a 1.4 GB GFF3 file containing embedded FASTA sequences using the following command line, but I'm getting an error. I'd like the GFF3 to be sorted and chunked by the sequence ID in column 1, with each sorted chunk containing the corresponding FASTA sequence in its footer, so that the chunks can be loaded into Chado one by one.

--splitfile 1 --outfile chunk.gff3 --gfffile ./file.gff3

Can't call method "print" on an undefined value at /usr/local/bin/ line 201, <GFFIN> line 2.

Is there a way to use this script to sort and chunk the GFF3 by the seqid in column 1?
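For per-seqid chunking like this, grouping lines by column 1 in a single pass avoids sorting the whole file. This is a minimal Python sketch under stated assumptions, not an option of the GMOD scripts: the function chunk_by_seqid and its width parameter are hypothetical, the feature lines and sequences are assumed to be already separated (a list of lines and a dict keyed by seqid), feature lines are assumed well-formed, and directives such as ##sequence-region are not carried into the chunks.

```python
from collections import defaultdict


def chunk_by_seqid(feature_lines, sequences, width=60):
    """Build one GFF3 text chunk per seqid (column 1).

    Each chunk holds that seqid's feature lines, sorted by start coordinate
    (Python's sort is stable, so ties keep their input order), followed by a
    ##FASTA section with the matching sequence wrapped at `width` characters.
    Chunks are returned in order of each seqid's first appearance.
    """
    groups = defaultdict(list)
    for line in feature_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # comments and directives are not copied into chunks
        cols = line.split("\t")
        groups[cols[0]].append(cols)

    chunks = {}
    for seqid, rows in groups.items():
        rows.sort(key=lambda cols: int(cols[3]))  # order by start within seqid
        out = ["##gff-version 3"]
        out.extend("\t".join(cols) for cols in rows)
        if seqid in sequences:
            seq = sequences[seqid]
            out.append("##FASTA")
            out.append(f">{seqid}")
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        chunks[seqid] = "\n".join(out) + "\n"
    return chunks
```

Each returned string could then be written to its own file (for example chunk_chr1.gff3, a hypothetical naming scheme) and loaded separately. The design choice here is a hash-based grouping, which is linear in the number of lines; only the features within one seqid are sorted.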


Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator                                     216-392-3087
Ontario Institute for Cancer Research

Gmod-schema mailing list
[hidden email]