How sensitive is MAKER to redundant/partial transcripts?

Lior Glick
Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been generating a diverse transcript dataset to use as input for MAKER (along with protein sequences and the 'official' tomato annotation). I built the transcript set by collecting multiple publicly available RNA-Seq datasets from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts with Trinity. My goal is a dataset that is as diverse and broad as possible.
I now have ~30 FASTA files of transcripts from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge these datasets into a single non-redundant one, while also stitching partial transcripts into longer ones based on their overlaps.
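For illustration only, here is a minimal Python sketch of the crudest form of redundancy removal: dropping any transcript that is an exact substring of a longer one on either strand. It is not what MAKER does internally, it does not stitch overlapping partial transcripts, and the function names (`read_fasta`, `remove_contained`) are hypothetical. Dedicated tools such as CD-HIT-EST instead cluster sequences at a chosen identity threshold, which also catches near-identical copies.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Parse FASTA text lines into {first word of header: sequence}."""
    records, rid, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if rid is not None:
                records[rid] = "".join(chunks)
            rid, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if rid is not None:
        records[rid] = "".join(chunks)
    return records


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(records):
    """Keep a transcript only if neither it nor its reverse complement is an
    exact substring of a transcript already kept.

    Quadratic in the number of transcripts: fine for a demo, far too slow
    for millions of Trinity contigs.
    """
    kept = {}
    # Longest first, so containing transcripts are kept before their fragments.
    for rid, seq in sorted(records.items(), key=lambda kv: len(kv[1]), reverse=True):
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in k or rc in k for k in kept.values()):
            continue
        kept[rid] = seq
    return kept


if __name__ == "__main__":
    # Toy input: "b" is a fragment of "a", and "c" is the reverse complement of "a".
    fasta = [">a\n", "ACGTACGTTT\n", ">b\n", "CGTACG\n",
             ">c\n", "AAACGTACGT\n", ">d\n", "GGGG\n"]
    print(sorted(remove_contained(read_fasta(fasta))))  # ['a', 'd']
```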
However, this is turning out to be non-trivial, and I am wondering whether it is really necessary for a good annotation. Could I simply concatenate all my transcriptome assemblies and let MAKER handle the redundant and partial transcripts?
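If the files are simply concatenated, one practical detail matters: independent Trinity runs use the same naming scheme (e.g. `TRINITY_DN1000_c0_g1_i1`), so identical IDs are likely to appear in different assemblies. Below is a minimal Python sketch that prefixes every header with a dataset label while concatenating; the `__` separator, the labels and the function names are hypothetical choices, not MAKER requirements. Keeping the source dataset in each ID also makes it easy to trace an alignment back to the study it came from.

```python
def prefix_headers(lines, dataset, sep="__"):
    """Yield FASTA lines with each '>' header renamed to '>{dataset}{sep}{old header}'."""
    for line in lines:
        if line.startswith(">"):
            yield f">{dataset}{sep}{line[1:].lstrip()}"
        else:
            yield line


def concatenate(fasta_by_dataset, out_path, sep="__"):
    """Write every input FASTA file into one output file, tagging each header
    with its dataset label."""
    with open(out_path, "w") as out:
        for dataset, path in fasta_by_dataset.items():
            with open(path) as fh:
                for line in prefix_headers(fh, dataset, sep):
                    # Guard against an input file that lacks a final newline,
                    # which would otherwise glue two records together.
                    out.write(line if line.endswith("\n") else line + "\n")
```

A call would look like `concatenate({"study01_leaf": "study01/Trinity.fasta", "study02_root": "study02/Trinity.fasta"}, "all_transcripts.fasta")` (hypothetical labels and paths), and the combined file would then be given to MAKER through the `est=` option in `maker_opts.ctl`.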
Can someone clarify how MAKER handles this, and assess whether an annotation built from a merged dataset should be better than one built from the unmerged assemblies? Advice from someone with hands-on experience with such data would be especially helpful, but any advice is much appreciated.

Thanks a lot and best regards,
Lior


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: How sensitive is MAKER to redundant/partial transcripts?

Carson Holt
MAKER collapses redundant evidence after alignment, so redundant transcripts will mostly just increase run time. The main issue with so many datasets would be false-positive alignments (assembled background transcription). You can inspect individual contigs in Apollo, IGV, or another genome browser to see where spurious alignments occur and whether they are associated with a particular dataset overall. It's OK to throw out a noisy dataset, especially if you have additional data.
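Besides inspecting contigs in a browser, a rough first pass is to rank datasets by the fraction of their alignments that fall outside any gene model. The Python sketch below assumes (1) MAKER's GFF3 output, in which transcript alignments appear as `expressed_sequence_match` features carrying the transcript ID in `Name`, and gene models appear as `gene` features, and (2) that every transcript ID was prefixed with its dataset label and a `__` separator before running MAKER, which is a hypothetical naming convention. A high fraction is only a hint of background transcription, not proof: real but unannotated genes also land outside gene models, so flagged datasets still deserve a look in Apollo or IGV.

```python
from collections import defaultdict


def parse_gff3(lines):
    """Yield (seqid, source, type, start, end, attributes) for each GFF3 feature line."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        yield cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), attrs


def unsupported_fraction_by_dataset(lines, sep="__"):
    """Per dataset, the fraction of transcript alignments overlapping no gene (strand ignored)."""
    genes = defaultdict(list)  # seqid -> [(start, end), ...]
    alignments = []            # (dataset, seqid, start, end)
    for seqid, _source, ftype, start, end, attrs in parse_gff3(lines):
        if ftype == "gene":
            genes[seqid].append((start, end))
        elif ftype == "expressed_sequence_match":
            name = attrs.get("Name", "")
            dataset = name.split(sep, 1)[0] if sep in name else "unknown"
            alignments.append((dataset, seqid, start, end))

    total = defaultdict(int)
    outside = defaultdict(int)
    for dataset, seqid, start, end in alignments:
        total[dataset] += 1
        # Two closed intervals overlap when each starts before the other ends.
        if not any(s <= end and start <= e for s, e in genes[seqid]):
            outside[dataset] += 1
    return {d: outside[d] / total[d] for d in total}
```

This scans every gene for every alignment, which is fine for a single contig; for a whole genome, an interval index per sequence would be needed. The input would be a merged GFF3, for example one produced with MAKER's `gff3_merge` utility.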

—Carson


(In reply to Lior Glick, Jul 4, 2018, 6:32 AM.)
