Differences in non_overlapping protein file between runs

YannDussert
Hello,

First, thank you for developing MAKER, this is a great annotation tool!

I am trying to annotate the genome of a biotrophic oomycete with MAKER.
After reading multiple posts on this list, I first used RNA-seq data and
a protein set from other oomycetes to create a first training set. I
then used AUGUSTUS and SNAP (both trained with models from the first
round) and GeneMark for ab-initio gene prediction during a second round
(masked and unmasked genome). I ran MAKER with the following options:
single_exon=1, split_hit=5000, correct_est_fusion=1.

After the second round, I had only around 11000 annotated genes (96%
completeness with BUSCO v2), whereas I was expecting between 13000 and
17000 genes (numbers from other annotated oomycetes). There were only
around 1500 genes in the non_overlapping protein file. After looking at
the annotation in a genome browser, one of the problems was apparently
gene fusions due to bad protein evidence. Following the advice in
another post, I tried running MAKER by passing the ab-initio predictions
with pred_gff, to avoid using bad protein hints for the gene predictors.
I still have around 11000 annotated genes, but now there are 10000 genes
in the non_overlapping protein file. Why this difference? I thought that
this file included gene predictions not supported by any evidence. Did I
miss something?
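
One way to put numbers on this comparison is to count the records in each run's non_overlapping protein FASTA file. The sketch below is only an illustration and assumes plain FASTA input. The function names and the file paths in the comment are hypothetical and are not part of MAKER.

```python
def count_fasta_records(path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def compare_runs(paths):
    """Return {run label: record count} for a mapping of label -> FASTA path."""
    return {label: count_fasta_records(path) for label, path in paths.items()}


# Hypothetical usage; substitute the protein FASTA files from each run:
# compare_runs({
#     "predictors_run_by_maker": "run_a/non_overlapping.proteins.fasta",
#     "predictions_via_pred_gff": "run_b/non_overlapping.proteins.fasta",
# })
```

Comparing the sequence IDs as well as the counts would show whether the extra entries come from one predictor in particular.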

Thank you in advance for your answer.

Best regards,
Yann

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Differences in non_overlapping protein file between runs

Carson Holt-2
My guess is that there is an issue with the GFF3 file you supplied, so its features are not overlapping anything.

—Carson


Re: Differences in non_overlapping protein file between runs

YannDussert
Hi,

Thank you for your answer. To get my GFF with ab-initio predictions, I just took the corresponding lines from the MAKER GFF of the previous round.

I can't see any problem with it. It looks like this:

Plvit001        augustus_masked match   66626   70338   0.85    +       .       ID=Plvit001:hit:12095:4.5.0.0;Name=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1
Plvit001        augustus_masked match_part      66626   67586   0.85    +       .       ID=Plvit001:hsp:27621:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1 961 +;Gap=M961
Plvit001        augustus        match   66626   70338   1       +       .       ID=Plvit001:hit:12088:4.5.0.0;Name=augustus-Plvit001-abinit-gene-0.0-mRNA-1
Plvit001        augustus        match_part      66626   70096   1       +       .       ID=Plvit001:hsp:27610:4.5.0.0;Parent=Plvit001:hit:12088:4.5.0.0;Target=augustus-Plvit001-abinit-gene-0.0-mRNA-1 1 3471 +;Gap=M3471
Plvit001        augustus_masked match_part      68166   68486   0.85    +       .       ID=Plvit001:hsp:27622:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 962 1282 +;Gap=M321
Plvit001        augustus_masked match_part      69504   70096   0.85    +       .       ID=Plvit001:hsp:27623:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1283 1875 +;Gap=M593
Plvit001        augustus_masked match_part      70174   70338   0.85    +       .       ID=Plvit001:hsp:27624:4.5.0.0;Parent=Plvit001:hit:12095:4.5.0.0;Target=augustus_masked-Plvit001-abinit-gene-0.7-mRNA-1 1876 2040 +;Gap=M165
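
For reference, pulling prediction lines out of a GFF3 file by source (column 2) can be sketched as follows. This is a minimal illustration, not the exact procedure used above. The function names are hypothetical, and the source labels passed in should be the ones that actually appear in the file (the excerpt shows `augustus` and `augustus_masked`). Counting the top-level `match` features per source also shows how many predictions come from the masked and the unmasked runs.

```python
from collections import Counter


def feature_rows(gff_lines):
    """Yield the nine tab-separated columns of each GFF3 feature line."""
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # GFF3 allows embedded sequence after this directive
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            yield cols


def extract_by_source(gff_lines, sources):
    """Return the feature lines whose source column is in `sources`."""
    wanted = set(sources)
    return ["\t".join(cols) for cols in feature_rows(gff_lines) if cols[1] in wanted]


def count_hits_by_source(gff_lines, feature_type="match"):
    """Count features of one type (default: top-level 'match') per source."""
    return Counter(
        cols[1] for cols in feature_rows(gff_lines) if cols[2] == feature_type
    )
```

Because child `match_part` lines carry the same source as their parent `match`, filtering on column 2 keeps each prediction together with its parts.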


Best regards,

Yann

Re: Differences in non_overlapping protein file between runs

Carson Holt-2
I see you have both masked and unmasked AUGUSTUS calls, so you may have a lot of unmasked predictions in your second run that are entirely contained in transposons and repeat regions (which would be why they do not overlap).
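
A rough way to test this idea outside a browser is to check whether a prediction's interval lies entirely within annotated repeats. The sketch below assumes 1-based inclusive coordinates, as in GFF3, and that the repeat intervals for one contig have already been collected into a list. The function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def contained_in_repeats(prediction, repeats):
    """True if the prediction lies entirely inside the merged repeat intervals."""
    start, end = prediction
    return any(
        r_start <= start and end <= r_end
        for r_start, r_end in merge_intervals(repeats)
    )
```

Merging first matters because a prediction can be covered by several neighbouring repeat annotations without fitting inside any single one.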

Really, the easiest thing to do would be to open the results in a genome browser, find one of the predictions listed as non-overlapping, and look at it to see why it is not overlapping. You can then check that specific location directly in the file as needed, but it will be much easier to interpret the features when they are drawn in a browser (like the desktop version of Apollo).
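
For the step of looking at a specific location directly in the file, a small helper like the one below can list every GFF3 feature that overlaps a region of interest. It is a sketch under the assumption of standard nine-column, tab-separated GFF3 lines. The function name is hypothetical.

```python
def features_overlapping(gff_lines, seqid, start, end):
    """List (source, type, start, end) for features overlapping a region.

    Coordinates are 1-based and inclusive, as in GFF3.
    """
    hits = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid:
            continue  # not a feature line on the requested contig
        f_start, f_end = int(cols[3]), int(cols[4])
        if f_start <= end and f_end >= start:
            hits.append((cols[1], cols[2], f_start, f_end))
    return hits
```

Running it on the coordinates of one non-overlapping prediction shows which sources have features at that locus, which can then be confirmed visually in the browser.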

—Carson 
