How sensitive is MAKER to redundant/partial transcripts?


Lior Glick-2
Dear MAKER users,

I am new to MAKER and would like your advice.
I am planning to annotate multiple genomes of tomato variants and wild relatives. To this end, I have been building a diverse transcript data set to use as input for MAKER (along with protein sequences and the 'official' tomato annotation). I generated the transcript set by collecting multiple available RNA-Seq data sets from SRA, covering diverse variants, conditions and tissues, and assembling them into transcripts with Trinity. My goal is a data set that is as diverse and broad as possible.
I now have ~30 FASTA files of transcripts, originating from different studies. Of course, many of the transcripts are redundant and/or partial. I am exploring ways to merge the multiple data sets into a non-redundant one, while also stitching partial transcripts into longer ones based on overlaps.
However, this turns out to be non-trivial, and I am wondering whether it is really necessary in order to get a good annotation. Maybe I can just concatenate all my transcriptome assemblies and let MAKER handle the redundant and partial transcripts?
Can someone clarify how this works, and say whether an annotation based on a merged data set should be superior to one based on unmerged input? If someone has actual experience with such data, that would be really helpful, but any advice would be highly appreciated.

Thanks a lot and best regards,
Lior

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: How sensitive is MAKER to redundant/partial transcripts?

Carson Holt-2
MAKER will automatically collapse redundant evidence. The only thing you may need to worry about with too many datasets is background transcription: with more datasets you will have more spurious assemblies from background transcription (if you sequence deeply enough, everything is transcribed at some level). You should also look at the results in a browser like Apollo. You may find that some datasets are noisier than others, and it would be beneficial to drop them, especially if they are redundant. So always do a visual review of the results.

—Carson
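
A minimal sketch of the "just concatenate" route discussed above, in Python. It is not part of MAKER: the function names and the tag-from-filename scheme are assumptions made for illustration. Prefixing every FASTA ID with a per-dataset tag keeps IDs unique across separately run Trinity assemblies (which tend to reuse names such as TRINITY_DN0_c0_g1_i1) and makes it possible to tell which dataset a noisy alignment came from when reviewing results in a browser.

```python
from pathlib import Path
from typing import Iterable, Iterator


def prefix_fasta_ids(lines: Iterable[str], tag: str) -> Iterator[str]:
    """Yield FASTA lines with `tag_` prepended to every record ID."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            # Keep the rest of the header untouched so it stays traceable.
            yield f">{tag}_{line[1:]}\n"
        else:
            yield line + "\n"


def merge_transcript_fastas(fasta_paths: Iterable[str], out_path: str) -> int:
    """Concatenate FASTA files into out_path; return the number of records.

    The tag for each file is its name without the extension (hypothetical
    convention, e.g. "studyA" for "studyA.fasta").
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in fasta_paths:
            tag = Path(path).stem
            with open(path) as handle:
                for line in prefix_fasta_ids(handle, tag):
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
    return n_records
```

Dropping a noisy dataset then amounts to leaving its file out of the list passed to merge_transcript_fastas and regenerating the merged file.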



> On Jul 4, 2018, at 6:28 AM, Lior Glick <[hidden email]> wrote:
> [...]



Ask for help about the collapse of Maker (version 2.31.9) when annotated with Fgenesh

史俊鹏
Dear Carson,

First of all, I must apologize that I couldn't post my question in the Google group, since I can't access Google from mainland China.

I am using MAKER (version 2.31.9) to annotate several foxtail millet genomes. I combined Augustus and Fgenesh (v.3.1.1) for the de novo annotation of these genomes.

The majority of contigs were annotated well by the MAKER pipeline. However, several contigs failed during the Fgenesh step with the following error output:

#--------- command -------------#
Widget::fgenesh:
/NAS7/home/shijunpeng/software/maker/bin/../lib/Widget/fgenesh/fgenesh_wrap /NAS7/home/shijunpeng/software/fgenesh/fgenesh /NAS7/home/shijunpeng/software/fgenesh/Monocots /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta -exon_table:/tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-4597401.Monocots.auto_annotator.xdef.fgenesh > /tmp/43438.1.all.q/maker_8zLUxB/0/108_0.4597215-
#-------------------------------#
ERROR: FgenesH failed
--> rank=NA, hostname=bioinfor3.local
ERROR: Failed while annotating transcripts
ERROR: Chunk failed at level:1, tier_type:4
FAILED CONTIG:scaffold_1

ERROR: Chunk failed at level:6, tier_type:0
FAILED CONTIG:scaffold_1
###############################################################################################################################################
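
The "FAILED CONTIG:" lines in the output above are enough to list which contigs need a closer look. A small Python sketch, assuming the run's STDERR was saved to a file; the function name and the idea of parsing the log this way are illustrative and not a MAKER feature:

```python
import re
from typing import Iterable, List

# Matches lines such as "FAILED CONTIG:scaffold_1" from the error output.
FAILED_RE = re.compile(r"^FAILED CONTIG:\s*(\S+)")


def failed_contigs(log_lines: Iterable[str]) -> List[str]:
    """Return the unique contig IDs reported as failed, in first-seen order."""
    seen: List[str] = []
    for line in log_lines:
        match = FAILED_RE.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

For example, `failed_contigs(open("maker.stderr.log"))` would list each failing contig once, even though the same contig is reported at several chunk levels (the log file name is hypothetical).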

A system core file was generated after this crash. I checked the temporary FASTA file 108_0.4597215-4597401.Monocots.auto_annotator.fgenesh.fasta, and it looks normal (about 300 bp).

I also checked my original sequence file and found no problem (it contains only A, T, C, G and N). I also tried changing the pred_flank option from 200 (the original value) to 0, and the error persists.
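
A check like the one described above (only A, C, G, T and N in the sequence) could be scripted as follows. This is a hedged Python sketch, not a MAKER or Fgenesh utility; the function name is made up, and treating lowercase (soft-masked) bases as acceptable is an assumption:

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def unexpected_symbols(
    fasta_lines: Iterable[str], allowed: str = "ACGTN"
) -> Dict[Optional[str], Counter]:
    """Map each sequence ID to a Counter of characters outside `allowed`.

    Sequences that contain only allowed characters are not reported.
    Lowercase letters are upper-cased first (assumed to be soft-masking).
    """
    allowed_set = set(allowed)
    bad: Dict[Optional[str], Counter] = {}
    current: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            continue
        for ch in line.upper():
            if ch not in allowed_set:
                bad.setdefault(current, Counter())[ch] += 1
    return bad
```

An empty result means every sequence passed. Any entry shows which contig holds which unexpected characters and how often.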

I ran the MAKER pipeline on a single node with 16 processors and 256 GB of RAM, so this is probably not an MPI problem.

Below are my MAKER behavior options:
#-----MAKER Behavior Options
max_dna_len=300000 #length for dividing up contigs into chunks (increases/decreases memory usage)
min_contig=10000 #skip genome contigs below this length (under 10kb are often useless)

pred_flank=0 #flank for extending evidence clusters sent to gene predictors
pred_stats=1 #report AED and QI statistics for all predictions as well as models
AED_threshold=1 #Maximum Annotation Edit Distance allowed (bound by 0 and 1)
min_protein=0 #require at least this many amino acids in predicted proteins
alt_splice=1 #Take extra steps to try and find alternative splicing, 1 = yes, 0 = no
always_complete=1 #extra steps to force start and stop codons, 1 = yes, 0 = no
map_forward=1 #map names and attributes forward from old GFF3 genes, 1 = yes, 0 = no
keep_preds=0 #Concordance threshold to add unsupported gene prediction (bound by 0 and 1)

split_hit=10000 #length for the splitting of hits (expected max intron size for evidence alignments)
single_exon=0 #consider single exon EST evidence when generating annotations, 1 = yes, 0 = no
single_length=250 #min length required for single exon ESTs if 'single_exon is enabled'
correct_est_fusion=0 #limits use of ESTs in annotation to avoid fusion genes

tries=5 #number of times to try a contig if there is a failure for some reason
clean_try=0 #remove all data from previous run before retrying, 1 = yes, 0 = no
clean_up=0 #removes theVoid directory with individual analysis files, 1 = yes, 0 = no
TMP= #specify a directory other than the system default temporary directory for temporary files

Could you please help me to solve this error? I am looking forward to hearing from you.

Sincerely,
Junpeng

--
Junpeng Shi, PhD
State Key Lab For Agrobiotech, China Agricultural University
National Maize Improvement Center of China
Center For Life Science, NO.2,
The West Street of Yuanmingyuan Park, Beijing, P.R.China
Tel:+86-13581863941

Re: Ask for help about the collapse of Maker (version 2.31.9) when annotated with Fgenesh

Carson Holt-2
It's failing when given hints. There may be more detail further back in the output, so try running just the failing contig and collecting the entire STDERR to send. FGENESH is hard for me to test since we no longer have a license to run it. If it were MAKER itself that failed, you would get a more informative error. If FGENESH fails directly (non-zero exit status), it can kill everything, and then it all depends on whether FGENESH reports something useful. Try collecting the full STDERR for the contig first. If that doesn't help, I can help you collect the files and the command line used, so that you can run FGENESH by itself (outside of MAKER) and send a test dataset to its developers if necessary.

—Carson
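
The suggestion above (run only the failing contig and keep the complete STDERR) can be sketched in Python as below. This is an illustration under assumptions, not an official MAKER procedure: the helper names are invented, and the exact command line to pass in should be checked against `maker -h` for the installed version.

```python
import subprocess
from typing import Iterable, Iterator, List


def extract_contig(fasta_lines: Iterable[str], contig_id: str) -> Iterator[str]:
    """Yield only the FASTA lines belonging to the record named contig_id."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            # The ID is the first whitespace-delimited token of the header.
            keep = bool(parts) and parts[0] == contig_id
        if keep:
            yield line


def run_and_capture_stderr(cmd: List[str], stderr_path: str) -> int:
    """Run cmd, write its entire STDERR to stderr_path, return the exit status."""
    with open(stderr_path, "wb") as err:
        return subprocess.run(cmd, stderr=err).returncode
```

A possible use: write the lines from `extract_contig(open("genome.fasta"), "scaffold_1")` to `scaffold_1.fasta`, point MAKER at that file, and call `run_and_capture_stderr` with the usual MAKER command so that the whole error stream lands in one file that can be sent to the list. The file names here are hypothetical.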


> On Jul 14, 2018, at 2:04 AM, 史俊鹏 <[hidden email]> wrote:
> [...]

