Issue loading GFF track data to Apollo 2.0.7


Kevin Pepper

Hi,

 

I am trying to import a GFF3 file into Web Apollo using flatfile-to-json.pl. Here is an example of the features in the file:

 

Tbg972_01       chado   contig  1       177788  .       +       .       ID=gamb1273d11_p2kAQ
###
Tbg972_01       chado   gene    3502    4323    .       -       .       ID=Tbg972.1.10
Tbg972_01       chado   mRNA    3502    4323    .       -       .       ID=Tbg972.1.10:mRNA;Parent=Tbg972.1.10
Tbg972_01       chado   CDS     3502    4323    .       -       0       ID=Tbg972.1.10:exon:1;Parent=Tbg972.1.10:mRNA
###
Tbg972_01       chado   polypeptide     3502    4323    .       -       .       ID=Tbg972.1.10:pep;Derives_from=Tbg972.1.10:mRNA;comment=GPI-Anchor Signal predicted for Tbg972.1.10 by DGPI v2.04 with cleavage site probability 0.702 near 253;orthologous_to=Tcongolense:TcIL3000_1_110 link%3DTcIL3000_1_110.1:pep type%3Dorthologous_to%2C Tbruceibrucei927:Tb927.1.530 link%3DTb927.1.530.2:pep type%3Dorthologous_to%2C Tcruzi:TcCLB.511417.30 link%3DTcCLB.511417.30:pep type%3Dorthologous_to%2C Tcruzi:TcCLB.505997.200 link%3DTcCLB.505997.200:pep type%3Dorthologous_to%3B cluster_name%3DTrypanosome:ORTHOMCL7618%3B program%3DOrthoMCL%3B rank%3D0,Tcruzi:TcCLB.511417.30 link%3DTcCLB.511417.30:pep type%3Dorthologous_to%2C Tcongolense:TcIL3000_1_110 link%3DTcIL3000_1_110.1:pep type%3Dorthologous_to%2C Tbruceibrucei927:Tb927.1.530 link%3DTb927.1.530.2:pep type%3Dorthologous_to%2C Tcruzi:TcCLB.505997.200 link%3DTcCLB.505997.200:pep type%3Dorthologous_to%3B cluster_name%3DTrypanosoma:ORTHOMCL6863%3B program%3DOrthoMCL%3B rank%3D0;product=term%3Dhypothetical protein%2C conserved%3B;translation=mrsrymrkgnesvtvavssftredafflscmrslgigvapylrcrsfpqdgfplklfstlaetpenrivaavpcttvwtvddvhdddveglmpplqscqeacssphvskhfdllylslyfaiqacrttstsswshwqrqlsppatfksdvedaaatflqilsensivppmelmlnmcrytqthscrltkdrmklnlkegpvlavmplvdlmisqnvaegnvalrrcdarqlrslmrsnplkcakqhlsvcdddaaywlletvvavkeyiplnl

 

I get the following errors when running the script and no tracks are loaded:

 

GFF3 parse error: some features reference other features that do not exist in the file (or in the same '###' scope).  A list of them:
 ID                 |           Cannot Find
----------------------------------------------------------------------
Tbg972.1.10:pep     | Parent=Tbg972.1.10:mRNA

 

If I remove the ### between the CDS and polypeptide lines, the file loads without errors, but the polypeptide track (which carries all the annotation) does not show.

 

Is there a way to enable flatfile-to-json.pl to cope with this feature structure?

 

Here’s how I’ve been running it…

flatfile-to-json.pl --tracklabel gene --key gene --gff /apollodata/data.gff3 --type gene --out /apollodata/tracks/
flatfile-to-json.pl --tracklabel mRNA --key mRNA --gff /apollodata/data.gff3 --type mRNA --out /apollodata/tracks/
flatfile-to-json.pl --tracklabel CDS --key CDS --gff /apollodata/data.gff3 --type CDS --out /apollodata/tracks/
flatfile-to-json.pl --tracklabel polypeptide --key polypeptide --gff /apollodata/data.gff3 --type polypeptide --out /apollodata/tracks/
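
The four commands differ only in the feature type, so they can be generated in a loop. A minimal sketch, assuming bash and the same paths and options as the commands above; rather than running anything, it prints one command per type (shell-quoted with %q) so the list can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print one flatfile-to-json.pl command per feature type, using the
# same options and paths as the four commands above.
set -euo pipefail

GFF=/apollodata/data.gff3
OUT=/apollodata/tracks/

print_commands() {
  local t
  for t in gene mRNA CDS polypeptide; do
    # %q shell-quotes each argument, so the printed line can be re-run as-is.
    printf '%q ' flatfile-to-json.pl --tracklabel "$t" --key "$t" \
      --gff "$GFF" --type "$t" --out "$OUT"
    printf '\n'
  done
}

print_commands
```

After checking the output, it can be piped to a shell, e.g. `bash print-track-cmds.sh | bash` (the script name is hypothetical).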

 

Also, do you have to run it for each track or is there a way to process the whole file in one go?

 

Any help much appreciated.

 

Thanks,

 

Kevin Pepper

 

-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.



This list is for the Apollo Annotation Editing Tool. Info at http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with which you subscribed to the list, send a message to [hidden email] | 2. In the subject line of your email type: unsubscribe apollo | 3. Leave the message body blank.


Re: Issue loading GFF track data to Apollo 2.0.7

nathandunn

This is more of a JBrowse question, so cross-posting it here as well.

Also, if you follow the spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md), I think the ### belongs after the last line:

###
Tbg972_01 chado contig 1 177788 . + . ID=gamb1273d11_p2kAQ
Tbg972_01 chado gene 3502 4323 . - . ID=Tbg972.1.10
Tbg972_01 chado mRNA 3502 4323 . - . ID=Tbg972.1.10:mRNA;Parent=Tbg972.1.10
Tbg972_01 chado CDS 3502 4323 . - 0 ID=Tbg972.1.10:exon:1;Parent=Tbg972.1.10:mRNA
Tbg972_01 chado polypeptide 3502 4323 . - . ID=Tbg972.1.10:pep;Derives_from=Tbg972.1.10:mRNA;comment=GPI-Anchor Signal predicted for Tbg972.1.10 by DGPI v2.04 with cleavage site probability 0.702 near 253;orthologous_to=Tcongolense:TcIL3000_1_110 link%3DTcIL3000_1_110.1:pep type%3Dorthologous_to%2C Tbruceibrucei927:Tb927.1.530 link%3DTb927.1.530.2:pep type%3Dorthologous_to%2C Tcruzi:TcCLB.511417.30 link%3DTcCLB.511417.30:pep type%3Dorthologous_to%2C Tcruzi:TcCLB.505997.200 link%3DTcCLB.505997.200:pep type%3Dorthologous_to%3B cluster_name%3DTrypanosome:ORTHOMCL7618%3B program%3DOrthoMCL%3B rank%3D0,Tcruzi:TcCLB.511417.30 link%3DTcCLB.511417.30:pep type%3Dorthologous_to%2C Tcongolense:TcIL3000_1_110 link%3DTcIL3000_1_110.1:pep type%3Dorthologous_to%2C Tbruceibrucei927:Tb927.1.530 link%3DTb927.1.530.2:pep type%3Dorthologous_to%2C Tcruzi:TcCLB.505997.200 link%3DTcCLB.505997.200:pep type%3Dorthologous_to%3B cluster_name%3DTrypanosoma:ORTHOMCL6863%3B program%3DOrthoMCL%3B rank%3D0;product=term%3Dhypothetical protein%2C conserved%3B;translation=mrsrymrkgnesvtvavssftredafflscmrslgigvapylrcrsfpqdgfplklfstlaetpenrivaavpcttvwtvddvhdddveglmpplqscqeacssphvskhfdllylslyfaiqacrttstsswshwqrqlsppatfksdvedaaatflqilsensivppmelmlnmcrytqthscrltkdrmklnlkegpvlavmplvdlmisqnvaegnvalrrcdarqlrslmrsnplkcakqhlsvcdddaaywlletvvavkeyiplnl
###

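This seems to fix it. I think that having the ### above the polypeptide line closes ID resolution for everything before it, so the polypeptide's reference to Tbg972.1.10:mRNA can no longer be resolved, and that is probably why it's breaking.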
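
The scoping rule can be checked before loading. A minimal sketch, assuming a tab-separated GFF3 file and that the loader treats both Parent and Derives_from as references: the polypeptide line has no Parent attribute, so the reference reported as Parent= in the error above appears to be its Derives_from. `find_unresolved_refs` is a hypothetical helper, not part of JBrowse, and it only approximates the loader's own check (for example, it does not URL-unescape attribute values).

```shell
#!/usr/bin/env bash
# Sketch: list Parent/Derives_from references that cannot be resolved inside
# their own '###' scope. Assumes tab-separated GFF3; values are compared
# as-is (no URL-unescaping).
set -euo pipefail

find_unresolved_refs() {
  awk -F'\t' '
    # Print every reference of the closing scope whose target ID was not
    # defined in that scope, then forget the IDs of that scope.
    function close_scope(   i) {
      for (i = 1; i <= nrefs; i++)
        if (!(rval[i] in ids))
          print from[i] "\t" rkey[i] "=" rval[i]
      delete ids
      nrefs = 0
    }
    { sub(/\r$/, "") }                # tolerate CRLF line endings
    $0 == "###" { close_scope(); next }
    /^#/ || NF < 9 { next }           # skip comments, directives, blank lines
    {
      id = ""; k = 0
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        eq = index(attrs[i], "=")
        if (eq == 0) continue
        key = substr(attrs[i], 1, eq - 1)
        val = substr(attrs[i], eq + 1)
        if (key == "ID") {
          id = val; ids[val] = 1
        } else if (key == "Parent" || key == "Derives_from") {
          m = split(val, targets, ",")
          for (j = 1; j <= m; j++) { k++; pkey[k] = key; pval[k] = targets[j] }
        }
      }
      # Queue the references of this line, tagged with its own ID.
      for (j = 1; j <= k; j++) {
        nrefs++
        from[nrefs] = (id == "" ? "(no ID)" : id)
        rkey[nrefs] = pkey[j]; rval[nrefs] = pval[j]
      }
    }
    END { close_scope() }
  ' "$@"
}

# Run only when file arguments are given, e.g. find-unresolved-refs.sh data.gff3
if [ "$#" -gt 0 ]; then
  find_unresolved_refs "$@"
fi
```

On the structure in the original post, this should report the polypeptide's Derives_from=Tbg972.1.10:mRNA, and nothing once the intermediate ### lines are gone.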

There may also be an error in the tool that produced the GFF3.

If you don’t want to worry about fixing it, you can also just remove the ### lines altogether:

grep -v '^###$' test.gff3 > clean-test.gff3
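
If you would rather keep a scope marker, as in the rearranged example above, a similar one-off fix is to drop every ### and append a single one after the last line. A minimal sketch, assuming the file ends with feature lines (no ##FASTA section); `relocate_scope_markers` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: copy a GFF3 file, dropping every '###' line and appending a single
# '###' after the last line. Assumes the file ends with feature lines
# (no ##FASTA section).
set -euo pipefail

relocate_scope_markers() {
  local src=$1 dst=$2
  {
    # grep exits 1 when it selects no lines; treat only that case as success.
    grep -v '^###$' "$src" || [ $? -eq 1 ]
    echo '###'
  } > "$dst"
}

# Run only when called with input and output paths.
if [ "$#" -eq 2 ]; then
  relocate_scope_markers "$1" "$2"
fi
```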



Nathan

On Nov 30, 2017, at 2:02 AM, Kevin Pepper <[hidden email]> wrote:
