QI codes insufficient - how to get frac exons with EST only?

Matt Simenc (Oct 11, 2017)
Hey MAKER people,

I would like to make a Venn diagram of the kinds of evidence supporting the gene models in my MAKER annotation: the left side shows the number of genes with EST support only, the right side shows the number of genes with protein support only, and the intersection shows the number of genes with both EST and protein support.

QI summary has:

Fraction of exons that overlap an EST alignment
Fraction of exons that overlap EST or Protein alignments

Please correct me if I'm wrong, but I interpret the first as the fraction of exons that overlap an EST alignment, whether or not they also overlap a protein alignment. If that is the case, then the QI information alone cannot tell us how many genes are supported by EST only versus by both EST and protein.

Does anyone have a way to do this, or a script that parses the MAKER GFF3 to get this?
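To make the ambiguity concrete, here is a minimal Python sketch that splits a `_QI` string into named fields and reports what the two exon fractions can and cannot say. The nine-field order and the field names in `QI_FIELDS` are assumptions based on the usual MAKER description of the quality index (only the third and fourth fields are named in this thread), and `classify` is a hypothetical helper, not part of MAKER.

```python
# Assumed order of the nine pipe-separated QI fields; verify against your MAKER version.
QI_FIELDS = (
    "five_prime_utr_len",
    "frac_splice_sites_est",
    "frac_exons_est",             # "Fraction of exons that overlap an EST alignment"
    "frac_exons_est_or_protein",  # "Fraction of exons that overlap EST or Protein alignments"
    "frac_splice_sites_snap",
    "frac_exons_snap",
    "num_exons",
    "three_prime_utr_len",
    "protein_len",
)


def parse_qi(qi):
    """Split a QI string such as '0|0|0|1|1|1|2|0|123' into a dict of floats."""
    values = qi.split("|")
    if len(values) != len(QI_FIELDS):
        raise ValueError(f"expected {len(QI_FIELDS)} QI fields, got {len(values)}")
    return dict(zip(QI_FIELDS, (float(v) for v in values)))


def classify(qi):
    """Say what the two exon-overlap fractions alone can tell us about one mRNA."""
    fields = parse_qi(qi)
    if fields["frac_exons_est"] > 0:
        # EST overlap is certain, but protein overlap may or may not be present.
        return "EST, protein status unknown"
    if fields["frac_exons_est_or_protein"] > 0:
        # No exon overlaps an EST, so the overlap must come from protein.
        return "protein only"
    return "no EST or protein overlap"
```

Under these assumptions only the protein-only class can be read off the QI string; "EST only" and "EST and protein" collapse into one class, which is the gap described above.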

Thanks!!!
Matt Simenc

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: QI codes insufficient - how to get frac exons with EST only?

Michael Campbell (Oct 11, 2017)
Hi Matt,

I have a hacky way of doing it. It requires running MAKER two more times, but these are quick runs.

To identify the genes that have protein support, I pass all of the annotations back to MAKER using the model_gff option in the maker_opts.ctl file. Then I pull all of the protein2genome features out of the big MAKER GFF3 file and pass them in using the protein_gff option. I turn off all repeat masking and run MAKER. It runs fast because it doesn't have to run any gene finders, align evidence, or repeat mask. In the output, any gene with an AED of less than 1 has protein support.

Then I do the same thing with the est2genome lines from the big GFF3 file, passing them in as est_gff. The output of that run gives you the genes with EST support. Genes with an AED of less than 1 in both outputs have support from both protein and EST; genes with an AED of less than 1 in only one output have support from that evidence type alone.
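The file handling around those two extra runs could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: that the evidence type sits in the GFF3 source column (column 2) as `protein2genome` or `est2genome`, that AED is stored as an `_AED` attribute on mRNA features, and that each mRNA points to its gene through `Parent`. The function names are hypothetical and not part of MAKER.

```python
def split_attrs(field):
    """Turn a GFF3 attribute column ('ID=x;Parent=y') into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def evidence_lines(gff_lines, source):
    """Yield feature lines whose source column (column 2) equals `source`."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows this directive
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and not line.startswith("#") and cols[1] == source:
            yield line


def supported_genes(gff_lines, max_aed=1.0):
    """Return the parent gene IDs of mRNAs whose _AED is below `max_aed`."""
    genes = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or line.startswith("#") or cols[2] != "mRNA":
            continue
        attrs = split_attrs(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) < max_aed:
            # Fall back to the mRNA's own ID if it has no Parent attribute.
            genes.add(attrs.get("Parent", attrs.get("ID")))
    return genes


def venn_counts(est_genes, protein_genes):
    """Sizes of the three Venn regions from the two sets of supported genes."""
    return {
        "est_only": len(est_genes - protein_genes),
        "both": len(est_genes & protein_genes),
        "protein_only": len(protein_genes - est_genes),
    }
```

In use, `evidence_lines(open_file, "protein2genome")` would produce the lines to write out for protein_gff (and likewise `"est2genome"` for est_gff), while `supported_genes` would be applied to the GFF3 output of each extra run before passing the two sets to `venn_counts`.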

Hope this helps,
Mike

Re: QI codes insufficient - how to get frac exons with EST only?

Carson Holt (Oct 11, 2017)
Also look at GAL for building GFF3 feature queries: https://github.com/The-Sequence-Ontology/GAL

Carson
 
Re: QI codes insufficient - how to get frac exons with EST only?

Matt Simenc
Very good, thank you!

Matt
