bp_seqfeature_load.pl too slow


JINGJING JIN-2

Dear all,

When I use bp_seqfeature_load.pl to load data into my new MySQL database, it
runs a little too slowly:

bp_seqfeature_load.pl -c -f -a DBI::mysql -d oilpalm oilpalm_all.fa oilpalm_all.gff3

Can anyone suggest how to speed it up?

Or is there something wrong with my command?

Thanks!

Jingjing






--
View this message in context: http://generic-model-organism-system-database.450254.n5.nabble.com/bp-seqfeature-load-pl-too-slow-tp5711076.html
Sent from the gmod-gbrowse mailing list archive at Nabble.com.

_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: bp_seqfeature_load.pl too slow

Scott Cain
Hi Jingjing,

Are you sure your GFF file is sorted the way it needs to be to use fast loading?  Parents need to be before any children and any lines of GFF that share the same ID must be together in the file.  When the loader detects that these conditions aren't met, it will automatically degrade to "slow" mode.
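
The two ordering conditions above can be checked before starting a long load. What follows is a minimal Python sketch, not the loader's own logic: the function names are hypothetical, "together" is interpreted as "on adjacent feature lines", and only the ID and Parent attributes in column 9 are examined.

```python
from typing import Dict, Iterable, List


def parse_attributes(column9: str) -> Dict[str, List[str]]:
    """Parse the GFF3 attributes column into key -> list of values."""
    attrs: Dict[str, List[str]] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip().split(",")
    return attrs


def check_fast_load_order(lines: Iterable[str]) -> List[str]:
    """Return a description of every line that breaks the two ordering rules.

    Assumption: lines sharing an ID must be adjacent feature lines.
    """
    problems: List[str] = []
    seen_ids = set()     # every ID encountered so far
    previous_id = None   # ID of the most recent feature line
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break        # sequence data follows; no more features
        if not line or line.startswith("#"):
            continue     # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue     # not a feature line
        attrs = parse_attributes(columns[8])
        feature_id = attrs.get("ID", [None])[0]
        # Rule 1: every parent must already have been seen.
        for parent in attrs.get("Parent", []):
            if parent not in seen_ids:
                problems.append(
                    f"line {number}: Parent={parent} appears before its parent feature"
                )
        # Rule 2: lines sharing an ID must be next to each other.
        if feature_id is not None:
            if feature_id in seen_ids and feature_id != previous_id:
                problems.append(
                    f"line {number}: ID={feature_id} is split across non-adjacent lines"
                )
            seen_ids.add(feature_id)
        previous_id = feature_id
    return problems
```

To run it on a real file, pass an open file handle, for example check_fast_load_order(open("oilpalm_all.gff3")), and print the returned list. An empty list means neither rule was violated under the assumptions stated above.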

By the way, what do you consider too slow?

Scott







--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: bp_seqfeature_load.pl too slow

JINGJING JIN-2
Dear Scott,

Thanks for your kind reply.

Yes, I have ordered the features as fast loading requires: parents come
before any children, and all GFF lines that share the same ID are together
in the file.

However, perhaps because I have so many features, the load slows down over
time. At first it is fast, about 0.04s/1000 features.

Later, the speed is around 8.7s/1000 features. It takes nearly 10 hours to
load all these features.
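
To put the two rates in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the 11856303 input lines reported in the error message further down can stand in for the number of features, which is only an approximation, and the function name is hypothetical.

```python
def estimated_hours(n_features: int, seconds_per_thousand: float) -> float:
    """Hours needed to load n_features at a constant rate given in s/1000 features."""
    return n_features / 1000 * seconds_per_thousand / 3600


# Assumption: the GFF3 line count from the error message approximates the feature count.
N_FEATURES = 11_856_303

fast_hours = estimated_hours(N_FEATURES, 0.04)
slow_hours = estimated_hours(N_FEATURES, 8.7)

print(f"at 0.04 s/1000 features: {fast_hours * 60:.1f} minutes")
print(f"at 8.7 s/1000 features: {slow_hours:.1f} hours")
```

Under that assumption the whole file would take about 8 minutes at the initial rate and about 29 hours at the later rate, so a total of nearly 10 hours is consistent with the rate degrading gradually during the run.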

jingjing@jingjing-desktop:~/db/gbrowse2/databases/oilpalm$ bp_seqfeature_load.pl -c -f -a DBI::mysql -d oilpalm oilpalm_all.fa oilpalm_all_pre.gff3

loading oilpalm_all.fa...
Building object tree... 0.00s                                                                              
Loading bulk data into database.../tmp/feature.7469
/tmp/name.7469
/tmp/attribute.7469
/tmp/parent2child.7469
 0.00s
load time: 349.22s
loading oilpalm_all_pre.gff3...
Building object tree...311.35s152.95s                                                                                                          
Loading bulk data into database.../tmp/feature.7469
DBD::mysql::db do failed: MySQL server has gone away at /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 571, <GEN5> line 11856303.

-------------------- EXCEPTION --------------------
MSG: MySQL server has gone away
STACK Bio::DB::SeqFeature::Store::DBI::mysql::_finish_bulk_update /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/DBI/mysql.pm:571
STACK Bio::DB::SeqFeature::Store::finish_bulk_update /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store.pm:1505
STACK Bio::DB::SeqFeature::Store::GFF3Loader::finish_load /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/GFF3Loader.pm:349
STACK Bio::DB::SeqFeature::Store::Loader::load_fh /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/Loader.pm:354
STACK Bio::DB::SeqFeature::Store::Loader::load /usr/local/share/perl/5.10.1/Bio/DB/SeqFeature/Store/Loader.pm:243
STACK toplevel /usr/local/bin/bp_seqfeature_load.pl:252

However, line 11856303 should be the last line of my GFF file.

Is this error caused by something wrong with my file?

jingjing





Re: bp_seqfeature_load.pl too slow

Lincoln Stein
Something is the matter with your MySQL installation. Your MySQL server is crashing during the bulk load phase. You should have a look in the MySQL error log to see what exactly is happening.

Are you running on top of an NFS-mounted directory by any chance?

Lincoln





--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>


Re: bp_seqfeature_load.pl too slow

JINGJING JIN-2
Dear Lincoln,

My MySQL error log looks like this:

130408 14:21:08 [Note] Plugin 'FEDERATED' is disabled.
130408 14:21:08  InnoDB: Initializing buffer pool, size = 8.0M
130408 14:21:08  InnoDB: Completed initialization of buffer pool
130408 14:21:08  InnoDB: Started; log sequence number 0 44243
130408 14:21:08 [Note] Event Scheduler: Loaded 0 events
130408 14:21:08 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.67-0ubuntu0.10.04.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)
130408 14:32:25 [Note] /usr/sbin/mysqld: Normal shutdown

130408 14:32:26 [Note] Event Scheduler: Purging the queue. 0 events
130408 14:32:26  InnoDB: Starting shutdown...
130408 14:32:29  InnoDB: Shutdown completed; log sequence number 0 44243
130408 14:32:29 [Note] /usr/sbin/mysqld: Shutdown complete

130408 14:33:14 [Note] Plugin 'FEDERATED' is disabled.
130408 14:33:15  InnoDB: Initializing buffer pool, size = 8.0M
130408 14:33:15  InnoDB: Completed initialization of buffer pool
130408 14:33:15  InnoDB: Started; log sequence number 0 44243
130408 14:33:15 [Note] Event Scheduler: Loaded 0 events
130408 14:33:15 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.67-0ubuntu0.10.04.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)

Is it because I have too many features?






Re: bp_seqfeature_load.pl too slow

Lincoln Stein
Something is telling mysql to shut down. Do you have some type of cron job that terminates inactive daemons?
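
One way to compare the shutdown times against the time of the load is to pull only the start and stop events out of the error log. This is a minimal Python sketch, assuming the timestamp format (YYMMDD HH:MM:SS) and the three message strings that appear in the log quoted in this thread; the function name server_events is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Timestamp format seen in the quoted log: YYMMDD HH:MM:SS, then the message.
LINE_RE = re.compile(r"^(\d{6}\s+\d{1,2}:\d{2}:\d{2})\s+(.*)$")

# Message fragments taken from the quoted log. The "mysqld: " prefix keeps
# the InnoDB "Shutdown completed" line from matching "Shutdown complete".
EVENTS = {
    "mysqld: ready for connections": "server started",
    "mysqld: Normal shutdown": "shutdown requested",
    "mysqld: Shutdown complete": "shutdown complete",
}


def server_events(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, event label) pairs for server start and stop lines."""
    events: List[Tuple[str, str]] = []
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines such as "Version: ..." carry no timestamp
        timestamp, message = match.groups()
        for needle, label in EVENTS.items():
            if needle in message:
                events.append((timestamp, label))
                break
    return events
```

Calling server_events on the open log file gives a short timeline. A "shutdown requested" entry that falls inside the bulk load window would be consistent with the server being told to stop rather than crashing.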

Lincoln



Re: bp_seqfeature_load.pl too slow

JINGJING JIN-2
In fact, I ran only this job last night, and no other jobs were running on
this computer.

I am also a little confused about why MySQL shut down. Is it because the
job ran for too long?

jingjing


