Duplicate IDs in GFF3 when loading Gene->mRNA->exon/CDS

Todd Harris-2
Hi all -

I'm running the core GFF3 parser on the following simple GFF3 example (attached below).

It fails to load with the following exception:

--- Nested Exception ---
/usr/local/wormbase/intermine/intermine_0_98/imbuild/source.xml:253: java.lang.IllegalArgumentException: Duplicated IDs in GFF file: [CDS:F14B4.3]

I believe the GFF conforms to the spec (duplicate IDs are allowed for features that span multiple locations on a sequence).

Hints?

Thanks,

Todd

Test GFF
I       WormBase        gene    9280954 9286569 .       -       .       ID=Gene:WBGene00008781;Name=WBGene00008781
I       WormBase        mRNA    9280954 9286569 .       -       .       ID=Transcript:F14B4.3;Parent=Gene:WBGene00008781
I       WormBase        exon    9280954 9281597 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9281879 9282076 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9282128 9282236 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9282686 9283477 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9283664 9283944 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9284000 9284125 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9284355 9284781 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9285184 9286019 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9286182 9286325 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        exon    9286381 9286569 .       -       .       Parent=Transcript:F14B4.3
I       WormBase        CDS     9281301 9281597 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9281879 9282076 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9282128 9282236 .       -       1       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9282686 9283477 .       -       1       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9283664 9283944 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9284000 9284125 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9284355 9284781 .       -       1       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9285184 9286019 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9286182 9286325 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
I       WormBase        CDS     9286381 9286554 .       -       0       ID=CDS:F14B4.3;Parent=Transcript:F14B4.3
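
For anyone who wants to confirm which IDs repeat without going through InterMine, here is a minimal sketch. It assumes tab-separated GFF3 with the attributes in column 9; the class and method names (GffIdCounter, idOf, countIds) are made up for illustration and are not part of the InterMine parser.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GffIdCounter {

    /** Returns the value of the ID attribute in column 9, or null if the row has none. */
    static String idOf(String gffLine) {
        String[] cols = gffLine.split("\t");
        if (cols.length < 9) {
            return null; // not a full 9-column feature row
        }
        for (String attr : cols[8].split(";")) {
            if (attr.startsWith("ID=")) {
                return attr.substring(3);
            }
        }
        return null;
    }

    /** Counts how many rows carry each ID; comment lines and rows without an ID are skipped. */
    static Map<String, Integer> countIds(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.startsWith("#")) {
                continue;
            }
            String id = idOf(line);
            if (id != null) {
                counts.merge(id, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few rows from the test file above, with explicit tabs.
        List<String> lines = Arrays.asList(
            "I\tWormBase\tmRNA\t9280954\t9286569\t.\t-\t.\tID=Transcript:F14B4.3;Parent=Gene:WBGene00008781",
            "I\tWormBase\texon\t9280954\t9281597\t.\t-\t.\tParent=Transcript:F14B4.3",
            "I\tWormBase\tCDS\t9281301\t9281597\t.\t-\t0\tID=CDS:F14B4.3;Parent=Transcript:F14B4.3",
            "I\tWormBase\tCDS\t9281879\t9282076\t.\t-\t0\tID=CDS:F14B4.3;Parent=Transcript:F14B4.3");
        countIds(lines).forEach((id, n) -> {
            if (n > 1) {
                System.out.println(id + " appears on " + n + " rows");
            }
        });
    }
}
```

On the four rows in main, only CDS:F14B4.3 is reported, which matches the ID named in the exception.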
_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev

Re: Duplicate IDs in GFF3 when loading Gene->mRNA->exon/CDS

Richard Smith
Hi Todd,
Andrew pointed this out a couple of months ago; we need to update the
parser to cope with discontinuous locations.  The change will be in 0.99
and we can try to port it to 0.98.  If it's possible to fiddle the ids
or ignore CDSs temporarily, that would be good.

We haven't updated the parser since the GFF3 spec was 'clarified'
concerning duplicate ids.
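
As a rough idea of what fiddling the ids could look like as a pre-processing step: the sketch below gives each row of a repeated ID its own numeric suffix and leaves unique IDs alone, so existing Parent references to genes and transcripts keep working. The suffix scheme (".1", ".2", ...) and the name GffIdFiddler are assumptions for illustration only, and the input is assumed to be tab-separated. It throws away the fact that the rows form one discontinuous feature, and it would break Parent links if a repeated ID had children, so treat it as a stopgap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GffIdFiddler {

    /** Index of the "ID=..." entry within the split attributes, or -1 if absent. */
    static int idIndex(String[] attrs) {
        for (int i = 0; i < attrs.length; i++) {
            if (attrs[i].startsWith("ID=")) {
                return i;
            }
        }
        return -1;
    }

    /** Rewrites only IDs that occur on more than one row, appending ".1", ".2", ... */
    static List<String> makeIdsUnique(List<String> lines) {
        // Pass 1: count rows per ID so unique IDs can be left untouched.
        Map<String, Integer> total = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t", -1);
            if (line.startsWith("#") || cols.length < 9) {
                continue;
            }
            String[] attrs = cols[8].split(";");
            int i = idIndex(attrs);
            if (i >= 0) {
                total.merge(attrs[i].substring(3), 1, Integer::sum);
            }
        }
        // Pass 2: suffix each occurrence of a repeated ID.
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.split("\t", -1);
            if (line.startsWith("#") || cols.length < 9) {
                out.add(line); // comments, directives, blank lines pass through
                continue;
            }
            String[] attrs = cols[8].split(";");
            int i = idIndex(attrs);
            if (i >= 0) {
                String id = attrs[i].substring(3);
                if (total.get(id) > 1) {
                    attrs[i] = "ID=" + id + "." + seen.merge(id, 1, Integer::sum);
                    cols[8] = String.join(";", attrs);
                }
            }
            out.add(String.join("\t", cols));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "I\tWormBase\tmRNA\t9280954\t9286569\t.\t-\t.\tID=Transcript:F14B4.3;Parent=Gene:WBGene00008781",
            "I\tWormBase\tCDS\t9281301\t9281597\t.\t-\t0\tID=CDS:F14B4.3;Parent=Transcript:F14B4.3",
            "I\tWormBase\tCDS\t9281879\t9282076\t.\t-\t0\tID=CDS:F14B4.3;Parent=Transcript:F14B4.3");
        makeIdsUnique(lines).forEach(System.out::println);
    }
}
```

The sketch does not check whether a generated ID collides with one already in the file, so that is worth verifying on real data.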

Cheers,
Richard.



On 19/01/2012 00:03, Todd Harris wrote:

> [...]


Re: Duplicate IDs in GFF3 when loading Gene->mRNA->exon/CDS

Vallejos, Andrew
Here is how I have "fixed" this in RatMine...use at own risk.

-Andrew

Index: bio/core/main/src/org/intermine/bio/dataconversion/GFF3Converter.java
===================================================================
--- bio/core/main/src/org/intermine/bio/dataconversion/GFF3Converter.java	(revision 29199)
+++ bio/core/main/src/org/intermine/bio/dataconversion/GFF3Converter.java	(working copy)
@@ -122,6 +122,8 @@
                     processedIds.add(record.getId());
                 }
             }
+                       //AKV override
+                       duplicates = false;
             if (!duplicates) {
                 process
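
Note that the override switches the duplicate check off for every ID, including genuine clashes. A narrower variant, sketched below as standalone code rather than as a patch to GFF3Converter, would let a repeated ID through only when the rows look like parts of one discontinuous feature. The criteria used here (same seqid, type, strand and Parent) are an assumption based on the multi-location rule in the spec, and RepeatCheck, GffRow and accept are hypothetical names, not InterMine API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class RepeatCheck {

    /** The few fields compared here; a stand-in, not InterMine's record class. */
    static final class GffRow {
        final String seqid;
        final String type;
        final String strand;
        final String id;      // may be null (e.g. exon rows without an ID)
        final String parent;  // may be null

        GffRow(String seqid, String type, String strand, String id, String parent) {
            this.seqid = seqid;
            this.type = type;
            this.strand = strand;
            this.id = id;
            this.parent = parent;
        }
    }

    // First row seen for each ID; later rows with that ID are compared against it.
    private final Map<String, GffRow> firstSeen = new HashMap<>();

    /** Returns true if the row may be processed, false if its ID clashes with an unrelated row. */
    boolean accept(GffRow row) {
        if (row.id == null) {
            return true; // rows without an ID cannot clash
        }
        GffRow first = firstSeen.putIfAbsent(row.id, row);
        if (first == null) {
            return true; // first occurrence of this ID
        }
        // Repeated ID: accept only if it plausibly continues the same feature.
        return first.seqid.equals(row.seqid)
            && first.type.equals(row.type)
            && first.strand.equals(row.strand)
            && Objects.equals(first.parent, row.parent);
    }

    public static void main(String[] args) {
        RepeatCheck check = new RepeatCheck();
        GffRow cds1 = new GffRow("I", "CDS", "-", "CDS:F14B4.3", "Transcript:F14B4.3");
        GffRow cds2 = new GffRow("I", "CDS", "-", "CDS:F14B4.3", "Transcript:F14B4.3");
        GffRow clash = new GffRow("I", "exon", "-", "CDS:F14B4.3", "Transcript:F14B4.3");
        System.out.println(check.accept(cds1));
        System.out.println(check.accept(cds2));
        System.out.println(check.accept(clash));
    }
}
```

Use at own risk as well: this only decides which rows get through, it does not merge the locations into a single feature.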

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Richard Smith
Sent: Thursday, January 19, 2012 7:59 AM
To: [hidden email]
Subject: Re: [InterMine Dev] Duplicate IDs in GFF3 when loading Gene->mRNA->exon/CDS

[...]