gene annotation for a better genome

gene annotation for a better genome

Quanwei Zhang (Sep 28, 2017)
Hello:

We recently obtained a new assembly of the NMR genome. An earlier assembly was produced and annotated a few years ago, and that gene annotation can be downloaded from NCBI.

Now we want to annotate the new genome with the Maker2 pipeline. How can I make full use of the existing annotations? On the other hand, because the previous genome was not very well assembled, some of its gene annotations may be false positives. I hope those false-positive genes in the previous annotation won't mislead Maker2 when it annotates the new assembly.

Do you have any suggestions? Thanks.

Best
Quanwei 

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: gene annotation for a better genome

Carson Holt-2 (Sep 29, 2017)
You can try using the est2genome=1 option to map the old models forward onto the new assembly as if they were ESTs (set est= to the old model transcript file, and add a line that says est_forward=1 to the control file to maintain the old naming). Then provide the final models as pred_gff for a subsequent run (i.e. a traditional MAKER run where you are annotating the new assembly with transcript and protein evidence and ab initio predictors). Don’t supply the old models to est= on that run.

The idea behind doing it this way is:
1. You need to get the old models onto the new assembly, so their coordinates will change. Doing it this way, you will at least be able to move many models forward based on homology.
2. By providing the models to pred_gff on a subsequent MAKER run, you are just letting the old models compete against the new annotations. They will be rejected if they have no evidence support, or kept if they score better than alternative models from SNAP/Augustus. That way you have the chance to integrate old models while rejecting those that have no evidence overlap.
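
As a rough sketch of the two runs described above, the Python snippet below renders the relevant control-file lines for each pass. The option names (genome, est, est2genome, est_forward, protein, pred_gff, snaphmm, augustus_species) are MAKER control-file options; every file name and the species name are hypothetical placeholders, and the `render_ctl` helper is only for illustration. In practice these values would be edited into the maker_opts.ctl file that `maker -CTL` generates.

```python
# Two-pass setup expressed as maker_opts.ctl overrides.
# All file names and the species name are hypothetical placeholders.

def render_ctl(options):
    """Render a dict of MAKER options as key=value control-file lines."""
    return "\n".join(f"{key}={value}" for key, value in options.items())

# Pass 1: align the old model transcripts to the new assembly as if they
# were ESTs, so the old models are carried onto the new coordinates.
pass1 = {
    "genome": "new_assembly.fasta",
    "est": "old_model_transcripts.fasta",
    "est2genome": 1,
    "est_forward": 1,  # line added by hand to keep the old names
}

# Pass 2: a conventional run. The pass-1 models go to pred_gff so they
# compete with SNAP/Augustus models; they are NOT supplied to est= here.
pass2 = {
    "genome": "new_assembly.fasta",
    "est": "new_transcript_evidence.fasta",
    "protein": "protein_evidence.fasta",
    "pred_gff": "pass1_models.gff",
    "snaphmm": "trained_snap.hmm",
    "augustus_species": "trained_species",
    "est2genome": 0,
}

print("# pass 1\n" + render_ctl(pass1))
print("# pass 2\n" + render_ctl(pass2))
```

The point of keeping the two option sets separate is that the old models act as alignment input only in the first pass and as competing predictions only in the second.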

—Carson


Re: gene annotation for a better genome

Quanwei Zhang (Oct 24, 2017)
Dear Carson:

Thank you again for your suggestions. I have just received the new NMR genome assembly and started on gene annotation. I understand your idea. But can I simply use the old genome's transcripts as transcript evidence and follow the standard Maker2 pipeline? That is, I set est2genome=1 and provide the mRNA sequences in FASTA format for the first round of SNAP training.

For transcripts I have the following choices. I think the first choice is more reliable and better, right?
(1) There are about 60,000 RefSeq transcripts from NCBI, which I have downloaded in FASTA format.
(2) We have raw RNA-seq data from 11 tissues; we could assemble each sample with Trinity to get the transcripts. But I think most of the RNA-seq data should already have been submitted to NCBI.

BTW, if we use the RefSeq data from NCBI, we can download the mRNA sequences, coding sequences, or protein sequences. Which type of data is best for training SNAP? For Augustus, we will use BUSCO to train it.

Many thanks.

Best
Quanwei




Re: gene annotation for a better genome

Carson Holt-2
Yes. If you use est2genome, it will just align the model and then find the longest ORF, so it is a quick way to just align old models to the new assembly. Alternatively, you can just do de novo annotation.
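
To make the "longest ORF" step concrete, here is a toy Python sketch that scans the three forward reading frames of a transcript for the longest ATG-to-stop ORF. It is only an illustration of the idea under simplifying assumptions (forward strand only, complete ORFs only); it is not MAKER's implementation, and the function name and example sequence are made up.

```python
def longest_orf(seq):
    """Return (start, end, orf) for the longest ATG..stop ORF found in the
    three forward reading frames; ("", 0, 0)-style empty result if none."""
    seq = seq.upper()
    stops = {"TAA", "TAG", "TGA"}
    best = (0, 0, "")
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                # Close the ORF (stop codon included) and keep it if longer.
                if i + 3 - start > len(best[2]):
                    best = (start, i + 3, seq[start:i + 3])
                start = None
    return best

# Hypothetical transcript: the only complete ORF starts at offset 2.
print(longest_orf("GGATGAAACCCTAGTT"))  # (2, 14, 'ATGAAACCCTAG')
```

Coordinates are 0-based and end-exclusive, so `seq[start:end]` reproduces the reported ORF.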

—Carson


