Database disk image is malformed error

Tim Fallon, Jun 16, 2017
Hi there,

I’ve been running MAKER in two stages using MPI to annotate a de novo insect genome. By two stages, I mean that in stage 1 I have many independent folders / MAKER runs (e.g. individual reference insect proteomes passed as FASTA with protein2genome=1), and then in stage 2, in a separate folder, I concatenate all of the evidence from stage 1 (using gff3_merge -o) and pass it in through the GFF parameters.

Stage 2 has been crashing. It takes a very long time to set up the SQLite DB from the GFF input (~24 hours with 39 MPI CPUs), and once everything is loaded it runs for a couple of seconds and then crashes with errors like this:

"DBD::SQLite::db selectcol_arrayref failed: database disk image is malformed at /lab/solexa_weng/testtube/maker_3.00_beta/bin/../lib/GFFDB.pm line 525."
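Because the error comes from DBD::SQLite, the .db file is an ordinary SQLite database, and whether it is actually corrupt can be checked outside MAKER with SQLite's built-in PRAGMA quick_check / PRAGMA integrity_check. The following is a minimal Python sketch, assuming you supply the path to the MAKER-generated .db file yourself; `check_sqlite_db` is a hypothetical helper, not part of MAKER. On a 95 GB file even quick_check may take a long time, and it should not be run while MAKER is still writing to the database.

```python
import sqlite3
import sys
from pathlib import Path
from typing import List


def check_sqlite_db(path: str, quick: bool = True) -> List[str]:
    """Run SQLite's built-in consistency check on the database at `path`.

    Returns ['ok'] for a healthy file; otherwise the problem rows reported
    by SQLite, or the error message if the file cannot be read at all.
    """
    # quick_check skips some index-content checks, so it is much faster
    # than integrity_check on very large files.
    pragma = "quick_check" if quick else "integrity_check"
    # Open read-only via a file: URI so the check never writes to the file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute(f"PRAGMA {pragma}")]
    except sqlite3.DatabaseError as exc:
        # Badly damaged files can fail before the pragma returns any rows.
        return [str(exc)]
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_db.py /path/to/maker.db
    print("\n".join(check_sqlite_db(sys.argv[1])))
```

If this reports anything other than "ok", the database was damaged while it was being built, which points toward the loading step rather than the GFF content.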

I am passing a lot of evidence to stage 2, probably more than people typically do (the GFFs total 44 GB, and the resulting *.db file is 95 GB).

Have you seen this error before? I can think of three possible causes:
1) SQLite size / concurrency limits: the .db ends up malformed because of MPI or because too much evidence is passed. Solution -> load the GFFs without MPI, or load less evidence.
2) The GFFs are malformed (although they pass validation with GenomeTools (gt)). Solution -> remove the malformed GFF evidence, though I haven’t been able to track down any malformed GFFs.
3) Identifiers that are unique within each individual GFF file become non-unique once the files are merged. Solution -> manually rename the IDs in the input GFF files so they are unique.
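Possibility 3 can be tested before anything is loaded into MAKER by scanning the stage 1 GFF3 files for ID values that occur in more than one file. A minimal Python sketch, assuming plain tab-delimited GFF3 (optionally ending in a ##FASTA section); `find_cross_file_duplicate_ids` is a hypothetical helper, not part of MAKER or gff3_merge. Repeated IDs within a single file are deliberately not reported, because GFF3 allows one feature (e.g. a CDS) to span several lines that share the same ID.

```python
from collections import defaultdict
from urllib.parse import unquote


def find_cross_file_duplicate_ids(paths):
    """Return {ID: set_of_files} for IDs that appear in more than one file.

    Only column 9 ``ID=`` attributes are considered. Comment lines and the
    ``##FASTA`` section at the end of MAKER-style GFF3 are skipped.
    """
    seen = defaultdict(set)
    for path in paths:
        with open(path) as fh:
            for line in fh:
                if line.startswith("##FASTA"):
                    break  # embedded sequence follows; no more features
                if line.startswith("#") or not line.strip():
                    continue
                cols = line.rstrip("\n").split("\t")
                if len(cols) < 9:
                    continue  # not a feature line
                for attr in cols[8].split(";"):
                    key, _, value = attr.partition("=")
                    if key.strip() == "ID" and value:
                        # GFF3 attribute values may be percent-encoded.
                        seen[unquote(value)].add(path)
    # Keep only IDs whose set of source files has more than one member.
    return {i: files for i, files in seen.items() if len(files) > 1}
```

The sketch keeps every ID in memory, so on ~44 GB of input it may need a lot of RAM; a non-empty result means the IDs would collide once the files are concatenated.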

Thoughts?

All the best,
-Tim

Timothy R. Fallon
PhD candidate
Laboratory of Jing-Ke Weng
Department of Biology
MIT



_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Database disk image is malformed error

Carson Holt, Jun 23, 2017
Don’t use the GFF3 as input to the second stage. Use the original work directory, and just modify the parameters in the control file. MAKER will reuse old results and only delete those that need to be rerun. Using the GFF3 as input is just a way to reuse MAKER data when the work directory is no longer available, and in most cases you will only pass in the genes (and not the evidence in the GFF3).

Thanks,
Carson




Re: Database disk image is malformed error

Tim Fallon
Hi Carson,

Thanks for the tip! It turned out that I needed to use the “-l” parameter for gff3_merge to automatically rename the IDs when merging, and also to pass the appropriate evidence in the merged GFF through the “Re-annotation Using MAKER Derived GFF3” parameters. I had been using the more general parameters further down (protein_gff, est_gff, etc.). It seems to be working now, though I am still getting the hang of how to fix misbehaving gene models.
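For reference, the “Re-annotation Using MAKER Derived GFF3” block lives in maker_opts.ctl (the control file generated by `maker -CTL`). The fragment below is a sketch of how it might be filled in to reuse merged protein2genome alignments; the parameter names follow MAKER's generated control file and should be checked against your own copy, while the file name and the flag values are illustrative assumptions. In this setup the general evidence parameters such as protein_gff and est_gff are left empty.

```
#-----Re-annotation Using MAKER Derived GFF3
maker_gff=stage1_merged.all.gff #hypothetical name for the gff3_merge -l -o output
est_pass=0 #ESTs in maker_gff: 1 = yes, 0 = no
altest_pass=0 #alternate organism ESTs in maker_gff
protein_pass=1 #reuse the protein alignments in maker_gff (assumed here)
rm_pass=0 #repeats in maker_gff
model_pass=0 #gene models in maker_gff
pred_pass=0 #ab initio predictions in maker_gff
other_pass=0 #pass through anything else in maker_gff
```

Per Carson's note above, passing only the genes (model_pass=1) is the more common use of maker_gff; turning on the evidence flags is what makes it a substitute for the general evidence GFF parameters.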

All the best,
-Tim


