est_gff input does not provide any gene model

Jacques Dainat
Hello,

I usually feed Cufflinks output to MAKER through the est_gff parameter, and combined with est2genome=1 I get the output I want.
This time I fed StringTie output to MAKER, but no gene models are predicted with est2genome.

Any explanation? Is it due to differences in GFF3 format between these two files?

Cufflinks output example:
Pnalgiovense_4592      Cufflinks       match   363     977     17.844829       -       .       ID=1:s3_c1_r1.4.2;Name=1:s3_c1_r1.4.2;
Pnalgiovense_4592      Cufflinks       match_part      363     666     17.844829       -       .       ID=1:s3_c1_r1.4.2:exon-1;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 1 304 +;
Pnalgiovense_4592      Cufflinks       match_part      743     977     17.844829       -       .       ID=1:s3_c1_r1.4.2:exon-2;Name=1:s3_c1_r1.4.2;Parent=1:s3_c1_r1.4.2;Target=1:s3_c1_r1.4.2 305 539 +;

StringTie output example:
Pnalgiovense_112      StringTie       gene    20      1256    1000    +       .       ID=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
Pnalgiovense_112      StringTie       mRNA    20      1256    1000    +       .       ID=HtMm_All.12253.1;Parent=HtMm_All.12253;cov=8.028295;fPKM=1.214491;gene_id=HtMm_All.12253;tPM=2.706611;transcript_id=HtMm_All.12253.1
Pnalgiovense_112      StringTie       exon    20      1256    1000    +       .       ID=HtMm_All.12253.1-exon-1;Parent=HtMm_All.12253.1;cov=8.028295;exon_number=1;gene_id=HtMm_All.12253;transcript_id=HtMm_All.12253.1


If the StringTie output is the problem, how can I fix it? Is it enough to remove the gene features, change mRNA to match, and change exon to match_part?

Best regards,


Jacques Dainat, PhD
NBIS (National Bioinformatics Infrastructure Sweden)
Genome Annotation Service

Address: (room E10:4204 - last floor)
Uppsala University, BMC
Department of Medical Biochemistry Microbiology, Genomics
Husargatan 3, box 582
S-75123 Uppsala Sweden
Phone: 01 84 71 46 25


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: est_gff input does not provide any gene model

Carson Holt
Evidence such as est_gff has to follow the GFF3 alignment format (i.e. match/match_part), whereas you are providing gene models (i.e. gene/mRNA/exon/CDS). Note that match/match_part is a two-level structure, whereas gene models have three levels. You need to reformat to match/match_part.

—Carson
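The reformatting Carson describes can be sketched in Python as below. This is a minimal, hypothetical converter (`stringtie_to_match` is not part of MAKER or StringTie), assuming tab-separated StringTie GFF3 in which every exon has a single `Parent` pointing to its `mRNA`. It drops the `gene` level, writes each `mRNA` as a `match` and each of its exons as a `match_part`, puts `.` in the score column, and adds a `Target` attribute modelled on the Cufflinks example in the first message. Numbering the `Target` coordinates from the transcript's 5' end (highest genomic coordinate first on the minus strand) is an assumption of this sketch; whether MAKER requires `Target` in est_gff evidence at all is not confirmed in this thread.

```python
import sys
from collections import defaultdict


def parse_attributes(field):
    """Split a GFF3 column-9 string into a dict (no percent-unescaping)."""
    attrs = {}
    for pair in field.strip().strip(";").split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value
    return attrs


def stringtie_to_match(lines):
    """Convert StringTie gene/mRNA/exon GFF3 lines into match/match_part lines."""
    transcripts = {}           # transcript ID -> (seqid, source, start, end, strand)
    order = []                 # transcript IDs in input order
    exons = defaultdict(list)  # transcript ID -> [(start, end), ...]

    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed or space-separated lines
        seqid, source, ftype, start, end, _score, strand, _phase, field = cols
        attrs = parse_attributes(field)
        if ftype in ("mRNA", "transcript"):
            tid = attrs["ID"]
            transcripts[tid] = (seqid, source, int(start), int(end), strand)
            order.append(tid)
        elif ftype == "exon":
            # Assumption: each exon has exactly one Parent (true for StringTie output).
            exons[attrs["Parent"]].append((int(start), int(end)))
        # 'gene' lines (the third level) are simply dropped.

    out = []
    for tid in order:
        seqid, source, start, end, strand = transcripts[tid]
        out.append("\t".join([seqid, source, "match", str(start), str(end),
                              ".", strand, ".", f"ID={tid};Name={tid}"]))
        # Walk exons 5' to 3' along the transcript so Target coordinates accumulate.
        offset = 0
        for i, (s, e) in enumerate(sorted(exons[tid], reverse=(strand == "-")), 1):
            length = e - s + 1
            target = f"{tid} {offset + 1} {offset + length} +"
            out.append("\t".join([seqid, source, "match_part", str(s), str(e),
                                  ".", strand, ".",
                                  f"ID={tid}:exon-{i};Name={tid};Parent={tid};Target={target}"]))
            offset += length
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    print("##gff-version 3")
    with open(sys.argv[1]) as handle:
        for record in stringtie_to_match(handle):
            print(record)
```

Run it as `python stringtie_to_match.py stringtie.gff3 > est_evidence.gff3` (file names are placeholders) and point est_gff at the output. Exons are collected per transcript before anything is written, so the relative order of mRNA and exon lines in the input does not matter.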


(In reply to Jacques Dainat, Oct 31, 2016, 4:51 AM)

Re: est_gff input does not provide any gene model

Jacques Dainat
Thank you for the quick confirmation!

Just to clarify: what I provided to MAKER was a valid GFF3 file, which does contain gene, mRNA, and exon features but no CDS.

I haven’t seen any documentation of the GFF3 feature types expected in files supplied via est_gff. I think this should be communicated more clearly (perhaps in maker_opts.ctl?).
It would be nice if the pipeline stopped when the supplied file contains no usable information, and also when the file does not exist: the warning is easy to miss when running on a cluster.

One last question: does MAKER use the values in the score column of the est_gff file?

Jacques 
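Jacques's suggestion of failing early can be approximated outside MAKER with a pre-flight check run before submitting a cluster job. The Python below is a minimal sketch; `check_est_gff` and `count_feature_types` are hypothetical helpers, and treating "usable evidence" as "at least one match and one match_part line" is an assumption based on Carson's reply, not on MAKER's source code.

```python
import os
import sys
from collections import Counter

# Assumption: feature types MAKER reads as evidence alignments (per Carson's reply).
EVIDENCE_TYPES = {"match", "match_part"}
# Gene-model types that, per the same reply, are not read as est_gff evidence.
GENE_MODEL_TYPES = {"gene", "mRNA", "exon", "CDS"}


def count_feature_types(lines):
    """Count column-3 feature types in tab-separated GFF3 lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts


def check_est_gff(path):
    """Return a list of problems with an est_gff file; empty means it looks usable."""
    if not os.path.isfile(path):
        return [f"{path}: file does not exist"]
    with open(path) as handle:
        counts = count_feature_types(handle)
    problems = []
    if not all(counts[t] > 0 for t in EVIDENCE_TYPES):
        problems.append(f"{path}: no match/match_part features found")
    found = sorted(t for t in GENE_MODEL_TYPES if counts[t] > 0)
    if found:
        problems.append(f"{path}: contains gene-model types {', '.join(found)}; "
                        "reformat to match/match_part")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = [p for path in sys.argv[1:] for p in check_est_gff(path)]
    for issue in issues:
        print(issue, file=sys.stderr)
    sys.exit(1 if issues else 0)
```

For example, `python check_est_gff.py stringtie.gff3 && sbatch maker_job.sh` would stop before submission if the file is missing or has no alignment features (`sbatch` and `maker_job.sh` are placeholders for whatever scheduler and job script are in use).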

(In reply to Carson Holt, 01 Nov 2016, 04:24)

Re: est_gff input does not provide any gene model

Carson Holt
The score is ignored. The format for evidence alignments is specified in the GFF3 spec (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md), which also gives an example of an EST alignment.

—Carson
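When checking a converted file against the GFF3 spec Carson links, note that the spec defines the `Target` value as `target_id start end [strand]`: space-separated, strand optional (`+` or `-`), with reserved characters in `target_id` percent-escaped (a space becomes `%20`). The hypothetical `parse_target` helper below is a minimal Python sketch of that rule, not code from MAKER or any GFF library; it does not check coordinates beyond parsing them as integers.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Target(NamedTuple):
    target_id: str
    start: int
    end: int
    strand: Optional[str]  # "+", "-", or None when the optional field is absent


def parse_target(value: str) -> Target:
    """Parse a GFF3 Target value of the form 'target_id start end [strand]'."""
    fields = value.strip().split(" ")
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 space-separated fields, got {value!r}")
    strand = fields[3] if len(fields) == 4 else None
    if strand not in (None, "+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    # int() raises ValueError for non-numeric coordinates.
    start, end = int(fields[1]), int(fields[2])
    # Undo percent-escaping (e.g. %20 -> space) in the target ID.
    return Target(unquote(fields[0]), start, end, strand)


if __name__ == "__main__":
    # Target value taken from the Cufflinks example earlier in this thread.
    print(parse_target("1:s3_c1_r1.4.2 1 304 +"))
```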


(In reply to Jacques Dainat, Nov 1, 2016, 10:08 AM)