InterProScan loading into JBrowse

InterProScan loading into JBrowse

mictadlo
Hi,
Our annotation is based on BRAKER2. Additionally, I ran InterProScan, which produced GFF3, TSV and XML output. I tried to load the GFF3 file using:

> perl /apollo/jbrowse/bin/flatfile-to-json.pl --gff /apollo/braker-soft_utr-interproscan.gff3 --compress --trackType HTMLFeatures --trackLabel "interproscan" --out /apollo/

It appears that none of the InterProScan results were loaded. I suspect this is because InterProScan's GFF3 file does not contain any chromosome names:

##gff-version 3
##feature-ontology http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.269
##interproscan-version 5.36-75.0
##sequence-region g109343.t1 1 1358
g109343.t1      .       polypeptide     1       1358    .       +       .       ID=g109343.t1;md5=3e908dc966fefe367e64dc9d98b0d3ab
g109343.t1      ProSiteProfiles protein_match   628     724     18.261  +       .       date=20-07-2019;Target=g109343.t1 628 724;Ontology_term="GO:0015074";ID=match$1_628_724;signature_desc=Integrase catalytic domain profile.;Name=PS50994;status=T;Dbxref="InterPro:IPR001584"
g109343.t1      SUPERFAMILY     protein_match   586     624     4.19E-5 +       .       date=20-07-2019;Target=g109343.t1 586 624;Ontology_term="GO:0003676","GO:0008270";ID=match$2_586_624;Name=SSF57756;status=T;Dbxref="InterPro:IPR036875"
g109343.t1      SUPERFAMILY     protein_match   622     725     4.93E-29        +       .       date=20-07-2019;Target=g109343.t1 622 725;ID=match$3_622_725;Name=SSF53098;status=T;Dbxref="InterPro:IPR012337"
g109343.t1      ProSiteProfiles protein_match   278     294     9.636   +       .       date=20-07-2019;Target=g109343.t1 278 294;Ontology_term="GO:0003676","GO:0008270";ID=match$4_278_294;signature_desc=Zinc finger CCHC-type profile.;Name=PS50158;status=T;Dbxref="InterPro:IPR001878"
g109343.t1      SMART   protein_match   600     616     0.36    +       .       date=20-07-2019;Target=g109343.t1 600 616;Ontology_term="GO:0003676","GO:0008270";ID=match$5_600_616;Name=SM00343;status=T;Dbxref="InterPro:IPR001878"
g109343.t1      SMART   protein_match   278     294     6.3E-4  +       .       date=20-07-2019;Target=g109343.t1 278 294;Ontology_term="GO:0003676","GO:0008270";ID=match$5_278_294;Name=SM00343;status=T;Dbxref="InterPro:IPR001878"
g109343.t1      Pfam    protein_match   82      216     1.2E-24 +       .       date=20-07-2019;Target=g109343.t1 82 216;ID=match$6_82_216;signature_desc=gag-polypeptide of LTR copia-type;Name=PF14223;status=T
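One quick way to confirm this diagnosis is to list the sequence IDs in column 1 of the GFF3 and compare them with the reference sequence names that JBrowse knows about; features whose seqid matches no reference sequence have nowhere to be drawn. The Python sketch below is a minimal check under stated assumptions: the file is tab-delimited (as GFF3 requires), and the reference names `chr1`/`chr2` are hypothetical stand-ins for the real assembly's names.

```python
from collections import Counter


def gff3_seqids(lines):
    """Count features per seqid (column 1), skipping directives and comments."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")  # GFF3 columns are tab-separated
        if len(cols) != 9:
            continue  # not a 9-column feature line
        counts[cols[0]] += 1
    return counts


def missing_seqids(lines, reference_names):
    """Return the seqids that have no matching reference sequence."""
    return sorted(set(gff3_seqids(lines)) - set(reference_names))


if __name__ == "__main__":
    sample = [
        "##gff-version 3",
        "##sequence-region g109343.t1 1 1358",
        "g109343.t1\t.\tpolypeptide\t1\t1358\t.\t+\t.\tID=g109343.t1",
        "g109343.t1\tPfam\tprotein_match\t82\t216\t1.2E-24\t+\t.\tID=match$6_82_216;Name=PF14223",
    ]
    reference = ["chr1", "chr2"]  # hypothetical genome sequence names
    print(missing_seqids(sample, reference))  # prints ['g109343.t1']
```

If every seqid in the InterProScan GFF3 shows up as missing, the features are keyed to protein IDs rather than genome sequences.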

Is there a way to load InterProScan results into JBrowse as a track, or are there any scripts that combine InterProScan results with the BRAKER2 annotation?

Thank you in advance,

Michal

--
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].

Re: InterProScan loading into JBrowse

nathandunn
Cross-posting to the JBrowse list.

On Jul 23, 2019, at 12:17 AM, Michał T. Lorenc <[hidden email]> wrote:

Re: InterProScan loading into JBrowse

Colin
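It looks like these annotations are in protein space: the coordinates are amino-acid positions on the protein g109343.t1, not positions on a chromosome. I guess what you are looking for is a way to convert these protein annotations into genome coordinates.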

This isn't something the JBrowse tools do.
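For illustration, the conversion works roughly like this: amino-acid position p covers CDS nucleotides 3(p-1)+1 through 3p, counted from the start of the coding sequence, and those CDS offsets are then walked across the transcript's CDS segments in transcription order (ascending on the + strand, descending on the - strand). The Python sketch below assumes the CDS segments of each transcript are already available from the BRAKER2 annotation as 1-based inclusive genome coordinates; it ignores CDS phase on partial genes, and the coordinates in the example are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end), 1-based inclusive genome coordinates


def protein_to_genome(cds: List[Segment], strand: str,
                      aa_start: int, aa_end: int) -> List[Segment]:
    """Map an amino-acid range (1-based, inclusive) onto genomic CDS pieces.

    cds: CDS segments of one transcript, in any order (assumed non-overlapping).
    strand: '+' or '-'.
    Returns genomic segments sorted by start coordinate.
    """
    # Nucleotide offsets within the concatenated CDS (1-based, inclusive).
    nt_start = 3 * (aa_start - 1) + 1
    nt_end = 3 * aa_end
    # Transcription order: ascending starts on '+', descending on '-'.
    ordered = sorted(cds, reverse=(strand == "-"))
    pieces = []
    offset = 0  # CDS nucleotides consumed before the current segment
    for seg_start, seg_end in ordered:
        length = seg_end - seg_start + 1
        # Overlap of [nt_start, nt_end] with this segment's CDS-offset range.
        lo = max(nt_start, offset + 1)
        hi = min(nt_end, offset + length)
        if lo <= hi:
            if strand == "+":
                pieces.append((seg_start + (lo - offset - 1),
                               seg_start + (hi - offset - 1)))
            else:
                # On '-', CDS offsets count down from the segment's end.
                pieces.append((seg_end - (hi - offset - 1),
                               seg_end - (lo - offset - 1)))
        offset += length
    return sorted(pieces)


if __name__ == "__main__":
    # Hypothetical transcript with two CDS exons (30 nt and 60 nt).
    print(protein_to_genome([(100, 129), (200, 259)], "+", 9, 12))
    # prints [(124, 129), (200, 205)]
```

A domain that straddles an intron comes back as several genomic pieces; each piece could then be written as a feature on the chromosome so that the match lines up with the gene models.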

I would ask the BRAKER team, as they probably deal with this scenario often.

-Colin

On Tue, Jul 23, 2019 at 12:52 PM Lbl <[hidden email]> wrote:

Re: InterProScan loading into JBrowse

Jacques Dainat-4
In reply to this post by mictadlo
Hi,

The GFF3 from InterProScan is quite particular; it is not really intended to be loaded into a genome browser. As you said, there are no chromosome names in the first column.
Usually you load (lift) the functional annotation from the InterProScan results into the structural annotation, which is in GFF format. There are various scripts to do so; I used the gff3_sp_manage_functional_annotation.pl script from the GAAS repository, which uses the TSV file from InterProScan. You can find an example of its use here: https://nbisweden.github.io/workshop-genome_annotation_elixir/labs/functional_annotation.
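To make the lifting step concrete, here is a rough Python sketch that reads the InterProScan TSV and attaches InterPro accessions and GO terms to matching mRNA features in a structural GFF3. It is not the GAAS script and does not reproduce its behaviour or options. It assumes the standard InterProScan TSV layout (protein ID in column 1, InterPro accession in column 12 when run with `--iprlookup`, `|`-separated GO terms in column 14 when run with `--goterms`), and that the protein IDs in the TSV equal the `ID` of the mRNA features in the BRAKER2 GFF3. The seqid g109343.t1 looks like a BRAKER2 transcript ID, but check this against your own files.

```python
from collections import defaultdict


def read_interproscan_tsv(lines):
    """Collect InterPro and GO accessions per protein ID from InterProScan TSV."""
    ipr, go = defaultdict(set), defaultdict(set)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) < 11:
            continue  # skip blank or malformed lines
        protein = row[0]
        # Column 12: InterPro accession, "-" if the signature has none.
        if len(row) > 11 and row[11] not in ("", "-"):
            ipr[protein].add(row[11])
        # Column 14: GO terms joined by "|".
        if len(row) > 13 and row[13] not in ("", "-"):
            for term in row[13].split("|"):
                go[protein].add(term.split("(")[0])  # drop a "(source)" suffix if present
    return ipr, go


def merge_attr(attrs, key, values):
    """Add values to a comma-separated GFF3 attribute, keeping existing ones."""
    old = [v for v in attrs.get(key, "").split(",") if v]
    attrs[key] = ",".join(old + [v for v in sorted(values) if v not in old])


def annotate_gff3(gff_lines, ipr, go, feature_type="mRNA"):
    """Yield GFF3 lines with InterPro/GO attributes added to matching features."""
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != feature_type:
            yield line  # pass through directives and other feature types
            continue
        # Column 9 as an insertion-ordered dict of tag -> raw value.
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        fid = attrs.get("ID")
        if fid in ipr:
            merge_attr(attrs, "Dbxref", {"InterPro:" + a for a in ipr[fid]})
        if fid in go:
            merge_attr(attrs, "Ontology_term", go[fid])
        cols[8] = ";".join(f"{k}={v}" for k, v in attrs.items())
        yield "\t".join(cols)
```

In use, you would call `read_interproscan_tsv` on the lines of the TSV, pass the BRAKER2 GFF3 lines through `annotate_gff3`, and write the result to a new GFF3. Because that file keeps the genome seqids of the structural annotation, it can be loaded with flatfile-to-json.pl as before.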

Best regards,

Jacques

-------------------------------------------------
Jacques Dainat, Ph.D.
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service
http://nbis.se/about/staff/jacques-dainat

Contact — 
Address: Uppsala University, Biomedicinska Centrum
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: +46 18 471 46 25

On 23 Jul 2019, at 09:17, Michał T. Lorenc <[hidden email]> wrote:
