Using GeneMark-ET with RNAseq intron hints

Using GeneMark-ET with RNAseq intron hints

Ray Cui-2
Hello,

         I have successfully installed MAKER beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
         When I read the GeneMark-ES manual, it says that one can use RNA-seq data to aid training. I'm wondering what would be the best way to integrate GeneMark-ET predictions into MAKER. Should I run GeneMark-ET independently of MAKER, then integrate the GFF at some point during the MAKER process? If so, how should I edit the configuration file? Currently MAKER has an option called "gmhmm". Should I train GeneMark myself with RNA-seq data, then feed the HMM to MAKER?

          A perhaps unrelated question: MAKER beta 3 now supports EVM. I'm wondering how EVM is used by MAKER (at which step, and what it does), and how it differs from what MAKER itself is designed for (both reconcile different gene models).

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496 
Mobile:           +49 0221 37970 496



_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: Using GeneMark-ET with RNAseq intron hints

Daniel Ence
Hi Ray, 

I think you're on the right track with training GeneMark with RNA-seq data. It should only change the training steps, which are external to MAKER, not how MAKER runs GeneMark. You'll still give MAKER the path to the "es.mod" file made by GeneMark.

For the second question: in MAKER beta 3, MAKER creates a control file for EVM in which you set the weights for the various inputs. MAKER then runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence.

~Daniel

Re: Using GeneMark-ET with RNAseq intron hints

Ray Cui-2
Hi Daniel,

        Thanks! It seems that GeneMark-ET has a "--training" flag. Is that the flag I should use when training, or should I just let GeneMark also perform the prediction?

Ray

Re: Using GeneMark-ET with RNAseq intron hints

Carson Holt-2
MAKER support was designed around GeneMark-ES. It may or may not work with GeneMark-ET, so any MAKER-related archive posts etc. will relate to GeneMark-ES.

With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don’t know how this compares to GeneMark-ET.

—Carson


Re: Using GeneMark-ET with RNAseq intron hints

Ray Cui-2
Dear Carson,

        I have now run GeneMark-ET, and it produced a trained .mod file. I think it can then be passed to MAKER. Do you know the final command line that MAKER constructs to call GeneMark? GeneMark-ET and GeneMark-ES use the same Perl script, so one probably only needs the --prediction and --predict_with xxx.mod options to predict genes using the species-specific parameters (bypassing the regular training and prediction steps).


Best Regards,
Ray

Re: Using GeneMark-ET with RNAseq intron hints

Carson Holt-2
The names of the scripts used are listed in the maker_exe.ctl file. Whether it works depends on whether the formatting or any flags have changed between versions.

—Carson


Re: Using GeneMark-ET with RNAseq intron hints

Carson Holt-2
Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied.  Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete. 

So the final result is not a superset, rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn’t work it produces either no model or partial models.
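
To see which predictor supplied each surviving model, one can tally the final gene features by the leading token of their ID. This is only a minimal sketch: it assumes the predictor name is the text before the first "-" in the ID attribute, which should be checked against the real output, and the records below are invented for illustration.

```python
from collections import Counter

def count_model_sources(gff3_text):
    """Count final 'maker' gene features by the leading token of their ID."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only final gene models; raw match/match_part rows are skipped.
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "gene":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID", "")
        # Assumption: the predictor name is the text before the first '-'.
        counts[gene_id.split("-", 1)[0]] += 1
    return counts

# Hypothetical records, invented for illustration only.
example = "\n".join([
    "scaf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=snap-scaf1-gene-0.1",
    "scaf1\tmaker\tgene\t2000\t3500\t.\t-\t.\tID=evm-scaf1-gene-0.2",
    "scaf1\tmaker\tgene\t5000\t7000\t.\t+\t.\tID=snap-scaf1-gene-0.3",
    "scaf1\tsnap\tmatch\t100\t900\t.\t+\t.\tID=raw-1",
])
print(count_model_sources(example))
```

A tally dominated by one prefix would suggest that predictor is winning most loci against the evidence.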

—Carson


On Mar 16, 2017, at 3:07 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

        Thank you so much! I am now looking at the results for the finished scaffolds. In the GFF file, the gene ID confuses me a bit. Column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm", "augustus", etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?

Best Regards,
Ray


On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt <[hidden email]> wrote:
Maybe. I haven’t tested this, but it should work. Maker supports labels for input by placing a ‘:’ and a label after each file name.

Example—>
est=file1.fasta:label_1,file2.fasta:label_2

If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.

As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3—>
evmtrans:est2genome:label_1=10

I don't know if the label will force any raw analysis to rerun, but it shouldn't.
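
The labelling scheme above can be sketched as follows. This is an illustration only: the helper names and the weights are hypothetical, and the "evmtrans:est2genome" prefix is simply taken from the example in this message rather than from tested behaviour.

```python
def parse_labeled_inputs(value):
    """Split 'file1:label_1,file2:label_2' into (filename, label) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        filename, sep, label = item.partition(":")
        # Files given without a ':label' suffix get no label.
        pairs.append((filename, label if sep else None))
    return pairs

def evm_weight_lines(pairs, prefix, weights):
    """Build one 'prefix:label=weight' line per labelled input file."""
    return [
        f"{prefix}:{label}={weights[label]}"
        for _, label in pairs
        if label is not None and label in weights
    ]

pairs = parse_labeled_inputs("file1.fasta:label_1,file2.fasta:label_2")
# Hypothetical weights chosen for illustration.
lines = evm_weight_lines(pairs, "evmtrans:est2genome", {"label_1": 10, "label_2": 5})
print("\n".join(lines))
```

The same pattern would apply to several protein files split by phylogenetic distance, provided the matching EVM key for protein evidence is used.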


—Carson



On Mar 15, 2017, at 5:13 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,

       Currently I am partitioning the protein evidence by phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it possible to specify a higher weight for protein2genome models from closely related species than for those from more distantly related taxa?

Ray 


On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui <[hidden email]> wrote:
Dear Carson,

       Thank you for the pointers! Before running the first round of MAKER, I mapped conspecific Trinity-assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained GeneMark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.

       It is good to know that MAKER will reuse the old results. 

Best Regards,
Ray


On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt <[hidden email]> wrote:
You can find lots of info in the devel archives on training. Example —> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI


MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. They are also not archived because doing so created IO issues: exonerate does not run in bulk batches like BLAST, but as many small separate runs, one for each polished read, and archiving a lot of small raw reports can happen so fast under MPI that it crashes storage servers. So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.

Thanks,
Carson


On Mar 14, 2017, at 10:44 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,
          Thanks for your prompt response!

          I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models? 
          After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run? 

Best Regards,
Ray


On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui <[hidden email]> wrote:
I see. If my evm config looks like this:
evmab=5 #default weight for source unspecified ab initio predictions
evmab:snap=5 #weight for snap sourced predictions
evmab:augustus=10 #weight for augustus sourced predictions
evmab:fgenesh=10 #weight for fgenesh sourced predictions
evmab:genemark=5 #weight for genemark sourced predictions

and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value from "evmab" (=5) will be used, is that correct?

Best Regards,
Ray


On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt <[hidden email]> wrote:
Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
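
As a rough illustration of binning by the source column (this is not EVM's own code, and the records are invented), the grouping can be sketched like this:

```python
from collections import defaultdict

def bin_by_source(gff3_text):
    """Group (seqid, start, end) tuples by the GFF3 source column (column 2)."""
    bins = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        bins[cols[1]].append((cols[0], int(cols[3]), int(cols[4])))
    return dict(bins)

# Hypothetical records, invented for illustration only.
gff3 = "\n".join([
    "##gff-version 3",
    "scaf1\tGeneMark.hmm\tgene\t100\t900\t.\t+\t.\tID=g1",
    "scaf1\tsnap\tgene\t150\t950\t.\t+\t.\tID=g2",
    "scaf2\tGeneMark.hmm\tgene\t10\t400\t.\t-\t.\tID=g3",
])
bins = bin_by_source(gff3)
print(sorted(bins))
```

Listing the distinct sources this way is a quick check of which strings the weight settings need to match.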

—Carson

On Mar 14, 2017, at 10:26 AM, Ray Cui <[hidden email]> wrote:

Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.


Best Regards,
Ray


On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt <[hidden email]> wrote:

These are set in the maker_evm.ctl file.

Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this —>
evmab:GENEMARK=7

This also works —>
evmab:pred_gff:GENEMARK=7

Or just set the default —>
evmab=7
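
The three forms above suggest a lookup from most specific to least specific. The sketch below is a guess at that behaviour, not MAKER's implementation: it assumes exact, case-sensitive matching on the source string and a fallback to the plain evmab default, neither of which is confirmed in this thread.

```python
def parse_weights(ctl_text):
    """Read 'key=value #comment' lines into a dict of float weights."""
    weights = {}
    for line in ctl_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        weights[key.strip()] = float(value)
    return weights

def ab_initio_weight(weights, source):
    """Return the weight for a GFF3 source, falling back to the evmab default."""
    # Assumption: exact string match, most specific key first.
    for key in (f"evmab:{source}", f"evmab:pred_gff:{source}", "evmab"):
        if key in weights:
            return weights[key]
    raise KeyError("no evmab default weight set")

ctl = """\
evmab=5 #default weight for unspecified ab initio sources
evmab:GENEMARK=7 #weight for features whose column 2 is GENEMARK
"""
weights = parse_weights(ctl)
print(ab_initio_weight(weights, "GENEMARK"))
print(ab_initio_weight(weights, "GeneMark.hmm"))
```

Under these assumptions a source string with no matching key simply receives the default, so it is worth confirming that column 2 of the input GFF3 matches the key exactly.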

—Carson




On Mar 10, 2017, at 8:48 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

       I think it may be most straightforward to input the GFF3 instead.

       What is the correct way of setting a weight in the EVM step for the GFF3 models passed through the pred_gff option?

Ray


On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt <[hidden email]> wrote:
It may work as is as long as you don’t need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.

—Carson

On Feb 20, 2017, at 2:51 AM, Ray Cui <[hidden email]> wrote:

I see. Are there any plans to incorporate it into MAKER?

If not, I could try to see if I can adapt the current Maker script.

Ray


On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt <[hidden email]> wrote:
Yes. This is a recent update. It’s an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.

—Carson



On Feb 20, 2017, at 2:43 AM, Ray Cui <[hidden email]> wrote:

I see, I will take a look at the wrapper gmhmm_wrap. 

I think there must have been a big update between GeneMark versions. It seems that they now also support evidence being fed into the prediction stage.

The latest version of the GeneMark script has been renamed to "gmes_petap.pl", with the following command-line options:

Usage:  /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl  [options]  --sequence [filename]

GeneMark-ES Suite version 4.33
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Input sequence/s should be in FASTA format

Algorithm options
  --ES           to run self-training
  --fungus       to run algorithm with branch point model (most useful for fungal genomes)
  --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --et_score     [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
  --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training_only     to run only training step
  --prediction_only   to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Sequence pre-processing options
  --max_contig   [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
  --min_contig   [number]; 50000 (default); will ignore contigs shorter then min_contig in training 
  --max_gap      [number]; 5000 (default); will split sequence at gaps longer than max_gap
                 Letters 'n' and 'N' are interpreted as standing within gaps 
  --max_mask     [number]; 5000 (default); will split sequence at repeats longer then max_mask
                 Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask    [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length

Run options
  --cores        [number]; 1 (default) to run program with multiple threads 
  --pbs          to run on cluster with PBS support
  --v            verbose

Customizing parameters:
  --max_intron          [number]; default 10000 (3000 fungi), maximum length of intron
  --max_intergenic      [number]; default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step

Developer options:
  --usr_cfg      [filename]; to customize configuration file
  --ini_mod      [filename]; use this file with parameters for algorithm initiation
  --test_set     [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
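
Going by the usage text above, training with RNA-Seq introns and then predicting from the trained model might be scripted roughly as below. This is only a sketch: the flags are the ones listed in the help output, but the file names (genome.fa, introns.gff, trained.mod) are placeholders, the two helper functions are made up for illustration, and gmes_petap.pl is assumed to be on the PATH.

```python
import shlex

GMES = "gmes_petap.pl"  # assumption: the script is on PATH; adjust to your install


def et_training_cmd(genome, introns_gff, cores=1, et_score=4):
    """Build a training-only GeneMark-ET command from flags in the usage text."""
    return [
        GMES,
        "--sequence", genome,
        "--ET", introns_gff,          # intron coordinates from RNA-Seq alignments (GFF)
        "--et_score", str(et_score),  # 4 is the default shown in the usage text
        "--cores", str(cores),
        "--training_only",
    ]


def predict_cmd(genome, mod_file, cores=1):
    """Build a prediction command that reuses an already trained .mod file."""
    return [
        GMES,
        "--sequence", genome,
        "--predict_with", mod_file,
        "--cores", str(cores),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    print(shlex.join(et_training_cmd("genome.fa", "introns.gff", cores=8)))
    print(shlex.join(predict_cmd("genome.fa", "trained.mod", cores=8)))
```

Printing rather than executing keeps the sketch harmless; the lists can be handed to subprocess.run once the paths are right.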


Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt <[hidden email]> wrote:
Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.

—Carson



On Feb 20, 2017, at 2:08 AM, Ray Cui <[hidden email]> wrote:

Thanks. 

Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
If you can point me to the part of the Maker source code that constructs the final GeneMark command line, I can also take a look.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt <[hidden email]> wrote:
The names of the scripts used are listed in the maker_exe.ctl file. Whether it works depends on whether the formatting or any flags have changed between versions.

—Carson



On Feb 20, 2017, at 1:59 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

        I have now run GeneMark-ET, and it produced a trained .mod file, which I think can then be passed to Maker. Do you know the final command line that Maker constructs to call GeneMark? GeneMark-ET and GeneMark-ES use the same Perl script, so one probably only needs the --prediction and --predict_with xxx.mod options to predict genes using the species-specific parameters (bypassing the regular training and prediction steps).


Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt <[hidden email]> wrote:
MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER-related archive posts etc. will relate to the former.

With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don’t know how this compares to GeneMark-ET.

—Carson



On Feb 14, 2017, at 8:44 AM, Ray Cui <[hidden email]> wrote:

Hi Daniel,

        thanks! It seems that GeneMark-ET has a "--training" flag. Is that the flag I should use when training, or should I just let GeneMark also perform the prediction?

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel <[hidden email]> wrote:
Hi Ray, 

I think you’re on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You’ll still give MAKER the path to the "es.mod" file made by Genemark.

For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence. 

~Daniel



On Feb 14, 2017, at 7:38 AM, Ray Cui <[hidden email]> wrote:

Hello,

         I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
         When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate GeneMark-ET predictions into Maker. Should I run GeneMark-ET independently of Maker, then integrate the GFF at some point during the Maker process? If so, how should I edit the configuration file? Currently Maker has an option called "gmhmm". Should I then train GeneMark myself with RNAseq data, then feed the HMM to Maker?

          A perhaps unrelated question: Maker beta 3 now supports EVM. I'm wondering how EVM is used by Maker (at which step, and what it does), and how it differs from what Maker is designed for (both reconcile different gene models).

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org



Re: Using GeneMark-ET with RNAseq intron hints

Ray Cui-2
Dear Carson,

         thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496 
Mobile:           +49 0221 37970 496



On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt <[hidden email]> wrote:
Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied.  Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete. 

So the final result is not a superset, rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn’t work it produces either no model or partial models.
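
To see how the final set is split across sources, one could tally the kept gene models by the leading part of their ID. The sketch below is an illustration only: it assumes that the ID of each final gene (source "maker" in column 2) starts with the name of the originating program, optionally followed by "_masked", before the first "-". That layout is an assumption based on the prefixes Ray describes, and the IDs in the usage lines are invented. IDs that start with "maker" are simply counted under "maker".

```python
from collections import Counter


def tally_model_origins(gff3_lines):
    """Count final MAKER gene features by the leading token of their ID."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Only final models: nine columns, source "maker", type "gene".
        if len(cols) != 9 or cols[1] != "maker" or cols[2] != "gene":
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get("ID", "")
        # Assumption: "<program>[_masked]-<rest of ID>"
        prefix = gene_id.split("-", 1)[0].replace("_masked", "")
        counts[prefix or "unknown"] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "scaf1\tmaker\tgene\t100\t900\t.\t+\t.\tID=snap_masked-scaf1-gene-0.1\n",
        "scaf1\tmaker\tgene\t2000\t2900\t.\t-\t.\tID=evm-scaf1-gene-0.2\n",
        "scaf1\tsnap\tmatch\t100\t900\t.\t+\t.\tID=raw1\n",
    ]
    for origin, n in sorted(tally_model_origins(example).items()):
        print(origin, n)
```

The raw match/match_part lines are skipped on purpose, since they are only there for reference.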

—Carson


On Mar 16, 2017, at 3:07 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

        thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm" , "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt <[hidden email]> wrote:
Maybe. I haven’t tested this, but it should work. Maker supports labels for input by placing a ‘:’ and a label after each file name.

Example—>
est=file1.fasta:label_1,file2.fasta:label_2

If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.

As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3—>
evmtrans:est2genome:label_1=10

I don’t know if the label will force any raw analysis to rerun, but it shouldn’t.
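
The labelling scheme described above can be written out mechanically. The sketch below only builds the two kinds of lines from the example (the labelled input option and the matching EVM weight keys); the helper names are hypothetical, the file names and weights are placeholders, and, as noted, whether EVM picks up the labelled sources is untested.

```python
def labeled_inputs(option, files_to_labels):
    """Build e.g. 'est=file1.fasta:label_1,file2.fasta:label_2'."""
    pairs = [f"{path}:{label}" for path, label in files_to_labels.items()]
    return f"{option}=" + ",".join(pairs)


def evm_weight_lines(evm_key, source, label_weights):
    """Build e.g. 'evmtrans:est2genome:label_1=10' for each label."""
    return [
        f"{evm_key}:{source}:{label}={weight}"
        for label, weight in label_weights.items()
    ]


if __name__ == "__main__":
    # Line for the MAKER options file (labels follow the file names).
    print(labeled_inputs("est", {"file1.fasta": "label_1", "file2.fasta": "label_2"}))
    # Lines for the EVM control file; the key must match column 2 of the GFF3.
    for line in evm_weight_lines("evmtrans", "est2genome", {"label_1": 10, "label_2": 5}):
        print(line)
```

Generating both from the same label names avoids the easy mistake of spelling a label differently in the two files.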


—Carson



On Mar 15, 2017, at 5:13 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,

       currently I am partitioning the protein evidence by phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it then possible to specify a higher weight for protein2genome models from more closely related species than for those from more distantly related taxa?

Ray 

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui <[hidden email]> wrote:
Dear Carson,

       thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.

       It is good to know that MAKER will reuse the old results. 

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt <[hidden email]> wrote:
You can find lots of info in the devel archives on training. Example —> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI


MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also, they are not archived because doing so created IO issues (exonerate does not run in bulk batches like BLAST, but rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.

Thanks,
Carson


On Mar 14, 2017, at 10:44 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,
          Thanks for your prompt response!

          I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models? 
          After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run? 

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui <[hidden email]> wrote:
I see. If my evm config looks like this:
evmab=5 #default weight for source unspecified ab initio predictions
evmab:snap=5 #weight for snap sourced predictions
evmab:augustus=10 #weight for augustus sourced predictions
evmab:fgenesh=10 #weight for fgenesh sourced predictions
evmab:genemark=5 #weight for genemark sourced predictions

and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value from "evmab" (=5) will be used, is that correct?

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt <[hidden email]> wrote:
Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.

—Carson

On Mar 14, 2017, at 10:26 AM, Ray Cui <[hidden email]> wrote:

Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.


Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt <[hidden email]> wrote:

These are set in the maker_evm.ctl file.

Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this —>
evmab:GENEMARK=7

This also works —>
evmab:pred_gff:GENEMARK=7

Or just set the default —>
evmab=7
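
Since the key has to match column 2 of the GFF3 passed to pred_gff, it may help to list the source values that actually occur in the file before editing maker_evm.ctl. The following is a small sketch with a hypothetical helper; the weight of 7 is just the number from the example above, and the GFF3 line in the usage part is invented.

```python
def evmab_lines(gff3_lines, weight=7):
    """Suggest one 'evmab:<source>=<weight>' line per distinct column-2 value."""
    sources = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a feature line (e.g. sequence in a FASTA section)
        if cols[1] not in sources:
            sources.append(cols[1])  # keep first-seen order
    return [f"evmab:{source}={weight}" for source in sources]


if __name__ == "__main__":
    example = [
        "##gff-version 3\n",
        "scaf1\tGENEMARK\tgene\t100\t900\t.\t+\t.\tID=g1\n",
        "scaf1\tGENEMARK\tmRNA\t100\t900\t.\t+\t.\tID=g1.t1;Parent=g1\n",
    ]
    print("\n".join(evmab_lines(example)))
```

For a real file, pass an open file handle instead of the example list.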

—Carson




On Mar 10, 2017, at 8:48 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

       I think it may be most straightforward to input the GFF3 instead.

       What is the correct way to set a weight in the EVM step for the GFF3 models passed through the pred_gff option?

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt <[hidden email]> wrote:
It may work as is as long as you don’t need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.

—Carson

On Feb 20, 2017, at 2:51 AM, Ray Cui <[hidden email]> wrote:

I see. Are there any plans to incorporate it into Maker?

If not, I could try to see if I can adapt the current Maker script.

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496




Re: Using GeneMark-ET with RNAseq intron hints

Carson Holt-2
1. Verify that the issue is not being caused by hints from evidence (i.e. that you aren’t feeding fused mRNA-seq assemblies or protein evidence). Fused evidence will result in hints that fuse models.
2. If you still have an issue, then drop SNAP. Not all predictors work well on all genomes.
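
One rough way to put numbers on the fusion problem before deciding is to flag models from one predictor that overlap two or more models from another on the same strand. The sketch below is not part of MAKER; the Gene tuple, the helper names and the coordinates are all made up for illustration, and a flagged model is only a candidate for closer inspection (it does not say which predictor is right).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str
    start: int
    end: int
    strand: str
    name: str


def overlaps(a, b):
    """True if two genes share sequence, strand and at least one base."""
    return (
        a.seqid == b.seqid
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def possible_fusions(candidates, reference, min_hits=2):
    """Map each candidate that spans >= min_hits reference genes to their names."""
    flagged = {}
    for cand in candidates:
        hits = [ref.name for ref in reference if overlaps(cand, ref)]
        if len(hits) >= min_hits:
            flagged[cand.name] = hits
    return flagged


if __name__ == "__main__":
    snap = [Gene("scaf1", 100, 5000, "+", "snap1"), Gene("scaf1", 6000, 7000, "+", "snap2")]
    augustus = [
        Gene("scaf1", 150, 2000, "+", "aug1"),
        Gene("scaf1", 2500, 4900, "+", "aug2"),
        Gene("scaf1", 6100, 6900, "+", "aug3"),
    ]
    print(possible_fusions(snap, augustus))
```

The pairwise comparison is quadratic, which is fine for a scaffold or two; a whole genome would want the genes sorted or indexed first.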

Also, no one can post to the Google group. It’s just for archival. All messages have to go to the mailing list here, and they then get archived on Google -> http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
The mailing list logs show that you requested to unsubscribe earlier today.

—Carson


On Mar 16, 2017, at 11:22 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,

        for some reason I can't seem to post on the Google group anymore.

        After looking at the results, it appears that SNAP performs poorly compared to genemark-ET and augustus. It looks like it's very prone to fusing neighboring genes and getting false positives. Is that a general thing you see in vertebrate genomes with SNAP? I saw that you didn't recommend SNAP for primates, perhaps the issue is similar?

        Attached you can see a screen shot of IGV browser, with all evidence tracks separated.

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496 
Mobile:           +49 0221 37970 496



On Thu, Mar 16, 2017 at 5:02 PM, Ray Cui <[hidden email]> wrote:
Dear Carson,

         thank you for the explanation! Now I see why sometimes it seems that EVM doesn't produce any model for a particular cluster.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:<a href="tel:+49%20221%20496" value="+49221496" target="_blank" class="">+49 (0)221 496 
Mobile:           +49 0221 37970 496



On Thu, Mar 16, 2017 at 4:19 PM, Carson Holt <[hidden email]> wrote:
Final results with source maker will be of type gene/mRNA/exon/CDS. They have been further processed beyond the raw results, and may include extensions such as the addition of UTR for example (or hint based recomputation in the case of SNAP and Augustus). The gene ID of the maker model will let you know the source before additional processing was applied.  Raw results will also be in the file as type match/match_part and source evm/snap/augustus, but are only there for reference purposes (there will also be a raw fasta from each source, but only for reference purposes). All models compete against each other, and the one best matching the evidence is kept. So if SNAP or Augustus scores better than EVM, then that model will be kept for that locus. You can find more detail in the MAKER wiki and the MAKER2 paper for how models compete. 

So the final result is not a superset, rather a merged subset from each potential source.

EVM is not used to obtain a consensus gene model. Its results compete just like all other algorithms. This is because when EVM works it produces beautiful models that score really well, but when it doesn’t work it produces either no model or partial models.

—Carson


On Mar 16, 2017, at 3:07 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

        thank you so much! I am now peeking into the results for the finished scaffolds. In the gff file, the gene id confuses me a bit. In this file, column 2 is always "maker", but the "ID" attribute in the annotation is prefixed with "snap", "maker", "evm" , "augustus" etc. Does that mean the final annotation is a superset of all gene predictors? If EVM was used to obtain a consensus gene model, why would the other models still show up in the final result set?

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Wed, Mar 15, 2017 at 3:52 PM, Carson Holt <[hidden email]> wrote:
Maybe. I haven’t tested this, but it should work. Maker supports labels for input by placing a ‘:’ and a label after each file name.

Example—>
est=file1.fasta:label_1,file2.fasta:label_2

If you label your files, then the label will go into the GFF3. So instead of est2genome in column 2, you will get est2genome:label_1 in column 2.

As a result, you should be able to add that label to the EVM settings like so and it will match column 2 of the GFF3—>
evmtrans:est2genome:label_1=10

I don’t know if the label will force any raw analysis to rerun, but it shouldn’t.
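Since the label has to be spelled identically in both control files, it may help to generate the two lines from one place. A minimal Python sketch follows. The helper names are hypothetical, and it only reproduces the "file:label" and "evmtrans:est2genome:label=weight" patterns shown above, which, as noted, have not been tested end to end.

```python
def labeled_input(option, files_with_labels):
    """Build a control-file line such as 'est=file1.fasta:label_1,file2.fasta:label_2'."""
    parts = [f"{path}:{label}" for path, label in files_with_labels]
    return f"{option}=" + ",".join(parts)


def evm_weight_line(evm_class, method, label, weight):
    """Build a weight line such as 'evmtrans:est2genome:label_1=10'.

    The key after the class name is meant to equal column 2 of the GFF3,
    i.e. '<method>:<label>'.
    """
    return f"{evm_class}:{method}:{label}={weight}"


# Example: two labeled EST sets with different weights.
inputs = [("file1.fasta", "label_1"), ("file2.fasta", "label_2")]
opts_line = labeled_input("est", inputs)
weight_lines = [
    evm_weight_line("evmtrans", "est2genome", "label_1", 10),
    evm_weight_line("evmtrans", "est2genome", "label_2", 5),
]
```

The first line would go into the MAKER options file and the weight lines into the EVM control file.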


—Carson



On Mar 15, 2017, at 5:13 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,

       currently I am partitioning the protein evidence by phylogenetic relationship into several datasets, supplied as a comma-delimited list. Is it then possible to specify a higher weight for protein2genome models from more closely related species than for those from more distantly related taxa?

Ray 

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Wed, Mar 15, 2017 at 11:47 AM, Ray Cui <[hidden email]> wrote:
Dear Carson,

       thank you for the pointers! Before running the first round of Maker, I mapped conspecific Trinity assembled proteins (long, "full length" subset) to an earlier version of the genome assembly using my own pipeline and trained Augustus and SNAP that way. I also trained Genemark-ET using TopHat alignments per their instructions. I'm wondering if it will be worth doing a second round, but I guess I will see.

       It is good to know that MAKER will reuse the old results. 

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:58 PM, Carson Holt <[hidden email]> wrote:
You can find lots of info in the devel archives on training. Example —> https://groups.google.com/forum/#!topic/maker-devel/FWMSTdqWQqI


MAKER will reuse old raw results if you rerun in the same directory (only deleting what would be different given altered settings between runs). It will see the existing alignments archived in the datastore as raw reports and just reuse them. The exception to this is the exonerate alignments. They are generated relatively quickly compared to the BLAST runs, so rerunning them is not too much overhead. Also they are not archived because doing so created IO issues (exonerate is not run in bulk batches like BLAST, but rather as multiple small separate runs for each polished read, and archiving a lot of small raw reports can occur so fast when using MPI that it crashes storage servers). So we decided to just not archive exonerate rather than develop a database-like bundling/compression mechanism to get around the IO issues.

Thanks,
Carson


On Mar 14, 2017, at 10:44 AM, Ray Cui <[hidden email]> wrote:

Hi Carson,
          Thanks for your prompt response!

          I have a somewhat unrelated question. After the first run of Maker, I want to train Augustus, SNAP and Genemark-ET using the most reliable gene models produced in the first round. What would be a good way to select these gene models? 
          After retraining the ab initio predictors, I also wonder if it's necessary to redo all the alignments (blastx, est2genome, protein2genome etc) in the second iteration, since they are exactly the same as the first run. Perhaps maker can take in the alignment results from the previous run? 

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:37 PM, Ray Cui <[hidden email]> wrote:
I see. If my evm config looks like this:
evmab=5 #default weight for source unspecified ab initio predictions
evmab:snap=5 #weight for snap sourced predictions
evmab:augustus=10 #weight for augustus sourced predictions
evmab:fgenesh=10 #weight for fgenesh sourced predictions
evmab:genemark=5 #weight for genemark sourced predictions

and Column 2 in the genemark.gff is "GeneMark.hmm" , then the value from "evmab" (=5) will be used, is that correct?

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:29 PM, Carson Holt <[hidden email]> wrote:
Column 2 in the GFF3 file is the source column. It is used to specify the source of the data. That column will also be used by EVM to bin features by their source and apply weights based on source.
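One way to picture the binning is a lookup keyed on column 2. The Python sketch below is illustrative only and is not MAKER or EVM code. The exact-match rule and the fallback to the plain "evmab" default for sources without their own entry are assumptions, and the function name is hypothetical.

```python
def weight_for_source(source, weights, default_key="evmab"):
    """Return the weight for a GFF3 source (column 2).

    Assumption: a source-specific entry '<default_key>:<source>' wins when
    present; otherwise the class-wide default '<default_key>' is used.
    """
    return weights.get(f"{default_key}:{source}", weights[default_key])


# Weights as they might be written in an EVM control file.
example_weights = {
    "evmab": 5,
    "evmab:snap": 5,
    "evmab:augustus": 10,
    "evmab:genemark": 5,
}

augustus_weight = weight_for_source("augustus", example_weights)
unlisted_weight = weight_for_source("GeneMark.hmm", example_weights)
```

Under these assumptions a source with no matching entry simply receives the default, which is why the spelling in column 2 matters.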

—Carson

On Mar 14, 2017, at 10:26 AM, Ray Cui <[hidden email]> wrote:

Thanks! I didn't know you can also name the gff, but I think using the default is fine, that's what I'm doing now.


Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Mar 14, 2017 at 5:11 PM, Carson Holt <[hidden email]> wrote:

These are set in the maker_evm.ctl file.

Use whatever you used in the source column of the input GFF3. For example if column 2 is set as GENEMARK, then do this —>
evmab:GENEMARK=7

This also works —>
evmab:pred_gff:GENEMARK=7

Or just set the default —>
evmab=7
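If the GFF3 coming out of the external predictor has an awkward source string, it can be rewritten before handing the file to pred_gff so that it matches the key in the control file. A small Python sketch, with a hypothetical function name and the assumption that the input is ordinary tab-delimited, 9-column GFF3:

```python
def set_gff3_source(lines, new_source):
    """Yield GFF3 lines with column 2 (source) replaced by new_source.

    Comment/directive lines and anything that is not a 9-column feature
    row are passed through unchanged.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            yield line
        else:
            cols[1] = new_source
            yield "\t".join(cols) + "\n"


# Example with a single made-up feature row.
example_in = [
    "##gff-version 3\n",
    "scaffold1\tGeneMark.hmm\tgene\t100\t900\t.\t+\t.\tID=g1\n",
]
example_out = list(set_gff3_source(example_in, "GENEMARK"))
```

With the source set to GENEMARK, the line evmab:GENEMARK=7 then refers to exactly these features.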

—Carson




On Mar 10, 2017, at 8:48 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

       I think it may be the most straightforward to input the GFF3 instead.

       What is the correct way of setting a weight for the EVM step for the GFF3 models passed through the pred_gff option?

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:53 AM, Carson Holt <[hidden email]> wrote:
It may work as is as long as you don’t need any of the additional options that have been added. If not, you can also just run it outside of MAKER then provide the result in GFF3 format to pred_gff.

—Carson

On Feb 20, 2017, at 2:51 AM, Ray Cui <[hidden email]> wrote:

I see. Are there any plans to incorporate it into Maker soon?

If not, I could try to see if I can adapt the current Maker script.

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:46 AM, Carson Holt <[hidden email]> wrote:
Yes. This is a recent update. It’s an attempt to merge GeneMark-ET and GeneMark-EP into GeneMark-ES scripts.

—Carson



On Feb 20, 2017, at 2:43 AM, Ray Cui <[hidden email]> wrote:

I see, I will take a look at the wrapper gmhmm_wrap. 

I think there must have been a big update between Genemark versions. It seems that they now also support evidence being fed into the prediction stage.

The name of the latest version of the genemark script has been changed to "gmes_petap.pl", with the following command lines options:

Usage:  /beegfs/group_dv/software/source/gm_et_linux_64/gmes_petap/gmes_petap.pl  [options]  --sequence [filename]

GeneMark-ES Suite version 4.33
   includes transcript (GeneMark-ET) and protein (GeneMark-EP) based training and prediction

Input sequence/s should be in FASTA format

Algorithm options
  --ES           to run self-training
  --fungus       to run algorithm with branch point model (most useful for fungal genomes)
  --ET           [filename]; to run training with introns coordinates from RNA-Seq read alignments (GFF format)
  --et_score     [number]; 4 (default) minimum score of intron in initiation of the ET algorithm
  --evidence     [filename]; to use in prediction external evidence (RNA or protein) mapped to genome
  --training_only     to run only training step
  --prediction_only   to run only prediction step
  --predict_with [filename]; predict genes using this file species specific parameters (bypass regular training and prediction steps)

Sequence pre-processing options
  --max_contig   [number]; 5000000 (default) will split input genomic sequence into contigs shorter then max_contig
  --min_contig   [number]; 50000 (default); will ignore contigs shorter then min_contig in training 
  --max_gap      [number]; 5000 (default); will split sequence at gaps longer than max_gap
                 Letters 'n' and 'N' are interpreted as standing within gaps 
  --max_mask     [number]; 5000 (default); will split sequence at repeats longer then max_mask
                 Letters 'x' and 'X' are interpreted as results of hard masking of repeats
  --soft_mask    [number] to indicate that lowercase letters stand for repeats; utilize only lowercase repeats longer than specified length

Run options
  --cores        [number]; 1 (default) to run program with multiple threads 
  --pbs          to run on cluster with PBS support
  --v            verbose

Customizing parameters:
  --max_intron          [number]; default 10000 (3000 fungi), maximum length of intron
  --max_intergenic      [number]; default 10000, maximum length of intergenic regions
  --min_gene_prediction [number]; default 300 (120 fungi) minimum allowed gene length in prediction step

Developer options:
  --usr_cfg      [filename]; to customize configuration file
  --ini_mod      [filename]; use this file with parameters for algorithm initiation
  --test_set     [filename]; to evaluate prediction accuracy on the given test set
  --key_bin
  --debug
# -------------------
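For scripting runs outside MAKER, the command can be assembled from the flags listed above. The Python sketch below only builds the argument list and does not run anything. The function name, file names and the "either train or predict" restriction are assumptions of this example, not something the usage text states.

```python
def gmes_petap_command(script, genome_fasta, introns_gff=None, model=None, cores=1):
    """Build an argument list for gmes_petap.pl from flags in its usage text.

    introns_gff -> training with RNA-Seq intron coordinates (--ET)
    model       -> prediction with an existing parameter file (--predict_with)
    neither     -> plain self-training (--ES)
    """
    if introns_gff and model:
        # Simplification made by this sketch, not a documented restriction.
        raise ValueError("pass either introns_gff or model, not both")
    cmd = ["perl", script, "--sequence", genome_fasta]
    if introns_gff:
        cmd += ["--ET", introns_gff]
    elif model:
        cmd += ["--predict_with", model]
    else:
        cmd.append("--ES")
    cmd += ["--cores", str(cores)]
    return cmd


# Example: GeneMark-ET style training on 8 cores (paths are placeholders).
train_cmd = gmes_petap_command(
    "gmes_petap.pl", "genome.fa", introns_gff="introns.gff", cores=8
)
```

The resulting list can be passed to `subprocess.run(train_cmd, check=True)` once the paths point at real files.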


Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:28 AM, Carson Holt <[hidden email]> wrote:
Also note that the gmhmme3 executable distributed with different flavors of genemark has had the same name but has been quite different in both command line structure and output between flavors.

—Carson



On Feb 20, 2017, at 2:08 AM, Ray Cui <[hidden email]> wrote:

Thanks. 

Are the "--max_intron" and "--max_intergenic" parameters automatically set by Maker when calling Genemark?
If you can point me to the part of the maker source code that constructs the final genemark command line, I can also take a look.

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 10:02 AM, Carson Holt <[hidden email]> wrote:
The names of scripts used are listed in the maker_exe.ctl file. It depends on if formatting or any flags have changed between versions.

—Carson



On Feb 20, 2017, at 1:59 AM, Ray Cui <[hidden email]> wrote:

Dear Carson,

        I have now run GeneMark-ET, and it produces a trained .mod file. I think it can then be passed to Maker. Do you know the final command line that Maker constructs to call genemark? Genemark-ET and -ES use the same perl script, so one probably only needs the --prediction and --predict_with xxx.mod options to predict genes using the species-specific parameters (bypassing the regular training and prediction steps).


Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Mon, Feb 20, 2017 at 6:39 AM, Carson Holt <[hidden email]> wrote:
MAKER support was designed with GeneMark-ES. It may or may not work with GeneMark-ET. So any MAKER-related archive posts etc. will relate to GeneMark-ES.

With GeneMark-ES, you simply provided a genome assembly and let it run. It would then produce several files and output directories. The es.mod file was the one you provided to MAKER. I don’t know how this compares to GeneMark-ET.

—Carson



On Feb 14, 2017, at 8:44 AM, Ray Cui <[hidden email]> wrote:

Hi Daniel,

        thanks! It seems that Genemark-ET has a "--training" flag, is that the flag I should use when training or should I just let Genemark also perform the prediction? 

Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496



On Tue, Feb 14, 2017 at 3:43 PM, Ence,daniel <[hidden email]> wrote:
Hi Ray, 

I think you’re on the right track with training Genemark with RNAseq data. It should only change the training steps, which are external to MAKER, but not how MAKER runs Genemark. You’ll still give MAKER the path to the “es.mod" file made by Genemark. 

For the 2nd question, in the MAKER beta 3, MAKER creates a control file for EVM, in which you set your weights for the various inputs, and then MAKER runs EVM alongside all the other gene predictors and chooses the model that is best supported by the evidence. 

~Daniel



On Feb 14, 2017, at 7:38 AM, Ray Cui <[hidden email]> wrote:

Hello,

         I have successfully installed Maker beta 3, working with both Augustus and SNAP. I also want to try adding GeneMark-ES to the ab initio predictors.
         When I read the GeneMark-ES manual, it says that one can use RNAseq data to aid training. I'm wondering what would be the best way to integrate Genemark-ET predictions into Maker. Should I run Genemark-ET independent of Maker, then integrate the GFF at some point during the maker process? If so, how should I edit the configuration file? Currently maker has an option called "gmhmm". Should I then train GeneMark by myself with RNAseq data, then feed the hmm to maker?
          
          A perhaps unrelated question: Maker beta 3 now supports EVM. I'm wondering how EVM is used by Maker (at which step, and what it does), and how it differs from what Maker itself is designed for (both reconcile different gene models).

Best Regards,
Ray

Dr. Rongfeng (Ray) Cui
Max-Planck-Institut für Biologie des Alterns / Max Planck Institute for Biology of Ageing
Wissenschaftlicher MA / Postdoctoral researcher
Office: Joseph-Stelzmann 9b, D-50931 Köln / Cologne
Postal address: Postfach 41 06 23, D-50866 Köln / Cologne
Tel.:+49 (0)221 496
Mobile:           +49 0221 37970 496


_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

