Re: [Gmod-gbrowse] FW: bp_genbank2gff3- Unflattening error

Re: [Gmod-gbrowse] FW: bp_genbank2gff3- Unflattening error

Don Gilbert-2-3
Pushkala,

Scott seems right: your GenBank entry may be biologically correct, but it is
computationally in error because it says one mRNA extends beyond its
enclosing gene boundaries:

mRNA: complement(join(9047672...9065992))  /gene="CCL14"
gene: complement(9047672..9050719)         /gene="CCL14"
                          ^^^^^^^ shorter than mRNA

versus this gene span, which encloses the above mRNA:
gene: complement(9047672..9065992)     /gene="CCL14-CCL15"
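
The containment check described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not what BioPerl does internally: the helper names `span` and `encloses` are hypothetical, and the location strings are reduced to their outermost coordinates, which ignores strand and the individual join segments.

```python
import re
from typing import Tuple


def span(location: str) -> Tuple[int, int]:
    """Return the (lowest, highest) coordinate found in a GenBank location string."""
    coords = [int(n) for n in re.findall(r"\d+", location)]
    return min(coords), max(coords)


def encloses(gene_loc: str, mrna_loc: str) -> bool:
    """True if the gene span fully contains the mRNA span."""
    g_start, g_end = span(gene_loc)
    m_start, m_end = span(mrna_loc)
    return g_start <= m_start and m_end <= g_end


# Location strings copied from the entry discussed above.
mrna = "complement(join(9047672...9065992))"
ccl14 = "complement(9047672..9050719)"
readthrough = "complement(9047672..9065992)"

print(encloses(ccl14, mrna))        # False: the CCL14 gene is shorter than the mRNA
print(encloses(readthrough, mrna))  # True: the CCL14-CCL15 gene covers the mRNA
```
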

- Don
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- [hidden email]--http://marmot.bio.indiana.edu/

_______________________________________________
Gmod-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-devel

Re: [Gmod-gbrowse] FW: bp_genbank2gff3- Unflattening error

Jayaraman, Pushkala
Ouch..
So it's bad data.. :(

How did you guys handle it? Simply delete that segment and let the
script continue? Because it finds such errors and then fails to process
the entire contig..


As an extension I get this error too:
PROBLEM:
NT_024524 Unflattening error:
Details:
------------- EXCEPTION -------------
MSG: 1 there is a conflict with exons; there was an explicitly stated
exon with location 22748456..22748502, yet I cannot generate this exon
from the supplied mRNA locations

1 There are some inferred exons that are not in the explicit exon list;
they are the exons at locations:
10982777..10983033
9516278..9517506
1225346..1225429
33491613..33491816
58797942..58798087
7323184..7323367
21253638..21253755
59172140..59172196
54309290..54310329
8988942..8989171
26569087..26569218
6479986..6480032
32266760..32267377
...

STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq /usr/local/perl5.8.9/lib/site_perl/5.8.9/Bio/SeqFeature/Tools/Unflattener.pm:1631
STACK (eval) /usr/local/perl5.8.9/bin/bp_genbank2gff3.pl:915
STACK main::unflatten_seq /usr/local/perl5.8.9/bin/bp_genbank2gff3.pl:914
STACK toplevel /usr/local/perl5.8.9/bin/bp_genbank2gff3.pl:411
-------------------------------------
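
To make the two complaints in the exception easier to read, here is a rough Python sketch of the kind of comparison the message describes: exons stated explicitly versus exons implied by the mRNA segments. It is an assumption about the idea, not the Unflattener's actual code. The function name `compare_exons` is hypothetical, and every coordinate except the explicit exon 22748456..22748502 from the message is made up for illustration.

```python
def compare_exons(mrna_segments, explicit_exons):
    """Compare exons implied by mRNA segments with explicitly listed exons.

    Both arguments are lists of (start, end) tuples. Exact matching only;
    this sketch does not try to handle partial overlaps.
    """
    inferred = set(mrna_segments)
    explicit = set(explicit_exons)
    return {
        # Explicit exons that no mRNA segment reproduces (first complaint).
        "explicit_not_inferable": sorted(explicit - inferred),
        # Exons implied by the mRNA but missing from the exon list (second complaint).
        "inferred_not_explicit": sorted(inferred - explicit),
    }


# Hypothetical mRNA segments; only the first explicit exon comes from the error above.
mrna_segments = [(22748400, 22748502), (22749000, 22749100)]
explicit_exons = [(22748456, 22748502), (22749000, 22749100)]

report = compare_exons(mrna_segments, explicit_exons)
print(report["explicit_not_inferable"])
print(report["inferred_not_explicit"])
```
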


Has anyone else come across this?
Pushkala Jayaraman
Programmer/Analyst
Rat Genome Database
Human and Molecular Genetics Center
Medical College of Wisconsin
Email: [hidden email]
Work: 414-955-2229
www.rgd.mcw.edu


-----Original Message-----
From: Don Gilbert [mailto:[hidden email]]
Sent: Thursday, October 07, 2010 4:09 PM
To: Jayaraman, Pushkala; [hidden email]
Cc: [hidden email]; [hidden email]
Subject: Re: [Gmod-gbrowse] [GMOD-devel] FW: bp_genbank2gff3-
Unflattening error


bp_genbank2gff3- Unflattening error - solution/quickfix

Jayaraman, Pushkala
In reply to this post by Don Gilbert-2-3
Hello,
I think I have found a quick fix to the problem without changing the
code.
I'm guessing Don had already addressed this in an earlier post.


The Unflattener.pm module reports an error where it finds strange data
or finds tags it has not yet been taught to parse correctly.
I noticed that when it reports errors for a certain .gbk file, it also
ends up messing up the entire file format. That is, when it finds a
dicistronic gene (a gene with a read-through mRNA that spans more than
the gene), it reports an error with a SEVERITY value. The gff file
that it creates will have ID=XXX.t01,XXX.t02;Parent=XXX,XXX; etc. for
an mRNA feature. It also seems to list all exon features as mRNA
and give them all the same IDs. Even the CDS features seem to get more
than 2 IDs, of which the second and third are repeated.

There is an option in the bp_genbank2gff3.pl script that allows users to
set the error_threshold. If you set the error_threshold relatively high,
i.e. >2, then it ensures that Unflattener.pm doesn't report any
errors and writes out the converted gbk to gff3 file as is.

This seems to be a more common case in the human .gbk files. So a
quick fix is to set the option -e 3 so that the gff3 files can be
correctly parsed:

bp_genbank2gff3.pl -e 3 ***.gbk

Just wanted to let you guys know... didn't want anyone else to break
their head over this and wonder why their gff3 files are turning out all
weird..
I wasn't able to post this on the BioPerl forum as my mail is still
awaiting moderator approval..

Thanks,
Pushkala Jayaraman



Re: bp_genbank2gff3- Unflattening error - solution/quickfix

Chris Mungall
[removed gbrowse from cc]

this is a dicistronic gene. ideally the unflattener would detect this  
and make the cassette gene extend over.
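
One possible reading of "extend over" is that the gene span would be widened until it covers every mRNA attached to it. A minimal Python sketch of that reading follows; the function name `extend_gene` is hypothetical and this is not how the Unflattener is implemented.

```python
def extend_gene(gene, mrnas):
    """Widen a (start, end) gene span so it covers all (start, end) mRNA spans."""
    start = min([gene[0]] + [m[0] for m in mrnas])
    end = max([gene[1]] + [m[1] for m in mrnas])
    return (start, end)


# Coordinates taken from the CCL14 example earlier in the thread.
ccl14_gene = (9047672, 9050719)
readthrough_mrna = (9047672, 9065992)

print(extend_gene(ccl14_gene, [readthrough_mrna]))  # (9047672, 9065992)
```
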

this is from hs17? Which version?

When I look here
http://www.ncbi.nlm.nih.gov/nuccore/NT_010799.15?from=9047686&to=9066094&report=genbank&strand=true

the genes all contain the mRNA - however, there are other oddities  
such as CCL14 and CCL15 being co-located.

On Oct 14, 2010, at 3:27 PM, Jayaraman, Pushkala wrote:

> Hello,
> I think I have found a quickfix to the problem without changing the
> code..

Re: bp_genbank2gff3- Unflattening error - solution/quickfix

Jayaraman, Pushkala
This is from build 36.2 and build 36.3.
It does detect that it is a dicistronic gene, but then it messes up the
entire gff3 file by replacing all "exon" tags with "mRNA" tags and
assigning multiple IDs (sometimes more than one of these features has
the same set of IDs), like this:

NT_010799	GenBank	gene	344463	344957	.	+	.	ID=LOC654170;Dbxref=GeneID:654170;Note=Derived by automated computational analysis using gene prediction method: GNOMON. Supporting evidence includes similarity to: 2 Proteins;gene=LOC654170
NT_010799	GenBank	mRNA	344463	344957	.	+	.	ID=LOC654170.t01,LOC654170.t02;Parent=LOC654170,LOC654170;Dbxref=GI:169211091,GeneID:654170;Note=Derived by automated computational analysis using gene prediction method: GNOMON. Supporting evidence includes similarity to: 2 Proteins;gene=LOC654170;product=similar to hCG1643342;transcript_id=XM_001723743.1
NT_010799	GenBank	mRNA	344463	344957	.	+	.	ID=LOC654170.t01,LOC654170.t02;Parent=LOC654170,LOC654170;Dbxref=GI:169211091,GeneID:654170;Note=Derived by automated computational analysis using gene prediction method: GNOMON. Supporting evidence includes similarity to: 2 Proteins;gene=LOC654170;product=similar to hCG1643342;transcript_id=XM_001723743.1
NT_010799	GenBank	CDS	344463	344957	.	+	.	ID=LOC654170.p01,LOC654170.p02,LOC654170.p02;Parent=LOC654170.t01,LOC654170.t02,LOC654170.t02;Dbxref=GI:169211092,GeneID:654170;codon_start=1;gene=LOC654170;product=similar to hCG1643342;protein_id=XP_001723795.1
NT_010799	GenBank	CDS	344463	344957	.	+	.	ID=LOC654170.p01,LOC654170.p02,LOC654170.p02;Parent=LOC654170.t01,LOC654170.t02,LOC654170.t02;Dbxref=GI:169211092,GeneID:654170;codon_start=1;gene=LOC654170;product=similar to hCG1643342;protein_id=XP_001723795.1
NT_010799	GenBank	CDS	344463	344957	.	+	.	ID=LOC654170.p01,LOC654170.p02,LOC654170.p02;Parent=LOC654170.t01,LOC654170.t02,LOC654170.t02;Dbxref=GI:169211092,GeneID:654170;codon_start=1;gene=LOC654170;product=similar to hCG1643342;protein_id=XP_001723795.1
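
For anyone wanting to scan a converted file for the symptoms shown in the output above, a small Python sketch is given here. It flags rows whose ID attribute holds more than one value and rows that repeat an earlier row verbatim. The function names `parse_attributes` and `check_gff3` are hypothetical, the sample rows are shortened versions of the ones above, and the checks are a rough screen rather than a full GFF3 validator.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column into {key: [values]}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value.split(",")
    return attrs


def check_gff3(lines):
    """Return (line_number, message) tuples for suspicious feature rows."""
    problems = []
    seen = Counter()
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = row.split("\t")
        if len(fields) != 9:
            problems.append((number, "expected 9 tab-separated columns"))
            continue
        ids = parse_attributes(fields[8]).get("ID", [])
        if len(ids) > 1:
            problems.append((number, "multiple ID values: " + ",".join(ids)))
        seen[row] += 1
        if seen[row] == 2:
            problems.append((number, "duplicate of an earlier row"))
    return problems


# Shortened rows based on the output above (attributes trimmed for brevity).
sample = [
    "NT_010799\tGenBank\tgene\t344463\t344957\t.\t+\t.\tID=LOC654170;gene=LOC654170",
    "NT_010799\tGenBank\tmRNA\t344463\t344957\t.\t+\t.\tID=LOC654170.t01,LOC654170.t02;Parent=LOC654170,LOC654170",
    "NT_010799\tGenBank\tmRNA\t344463\t344957\t.\t+\t.\tID=LOC654170.t01,LOC654170.t02;Parent=LOC654170,LOC654170",
]

for number, message in check_gff3(sample):
    print(number, message)
```

A repeated row is only a hint that something went wrong, so treat the second check as a prompt to look closer rather than as proof of a broken file.
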

 
Is this the expected behavior?
Pushkala Jayaraman


-----Original Message-----
From: Chris Mungall [mailto:[hidden email]]
Sent: Thursday, October 14, 2010 5:45 PM
To: Jayaraman, Pushkala
Cc: Don Gilbert; Scott Cain; gmod-devel list
Subject: Re: [GMOD-devel] bp_genbank2gff3- Unflattening error -
solution/quickfix
