Maker annotation of large scaffolds

Maker annotation of large scaffolds

Aravind PRASAD

Hi All,

I’m trying to annotate a fish genome using the MAKER pipeline. It finished annotating most of the scaffolds, except for 5 that are each around 100 Mbp in size. The cluster at our institute has a 24-hour time limit per job, and these scaffolds could not be annotated within that time.

Can you please suggest another way of finishing the annotation for these large scaffolds?

I thought of chunking up the scaffolds before running, but I’m afraid that would split a gene in two.

Thanks for your time.

 

Regards,

[hidden email] :: Research Officer :: Comparative and Medical Genomics Lab :: Institute of Molecular and Cell Biology (IMCB) :: Agency for Science, Technology and Research (A*STAR)

61 Biopolis Drive :: #5-04 Proteos :: Singapore 138673:: DID (+65) 6586 9573 :: Fax (+65) 6779 1117 :: http://www.imcb.a-star.edu.sg/

 


Note: This message may contain confidential information. If this Email/Fax has been sent to you by mistake, please notify the sender and delete it immediately. Thank you.

_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Re: Maker annotation of large scaffolds

Seth_m55
I would think splitting could work if you generate a sufficient overlap, e.g. 1-100k, 50-150k, etc. Reassembling the annotations for the overlap regions may be tricky if you get conflicting annotations, though.
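A minimal sketch of the overlapping-window idea, for illustration only. The function names, the 0-based half-open window convention, and the coordinate-shifting helper are assumptions made for this example; they are not part of MAKER.

```python
def overlapping_windows(length, window, overlap):
    """Yield 0-based, half-open (start, end) windows that cover [0, length)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    if length <= 0:
        return
    step = window - overlap
    start = 0
    while True:
        end = min(start + window, length)
        yield start, end
        if end == length:
            break
        start += step


def to_scaffold_coords(chunk_start, feature_start, feature_end):
    """Shift a 1-based feature on a chunk back to 1-based scaffold coordinates.

    chunk_start is the 0-based start of the chunk on the scaffold.
    """
    return chunk_start + feature_start, chunk_start + feature_end
```

With a 100 kb window and a 50 kb overlap, `overlapping_windows(250_000, 100_000, 50_000)` yields the windows 0-100k, 50-150k, 100-200k and 150-250k. Annotations from each chunk still have to be shifted back with something like `to_scaffold_coords`, and conflicting gene models in the overlaps have to be reconciled separately.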

Seth Munholland, B.Sc.
Department of Biological Sciences
Rm. 304 Biology Building
University of Windsor
401 Sunset Ave. N9B 3P4
T: (519) 253-3000 Ext: 4755


Re: Maker annotation of large scaffolds

Carson Holt-2
If running under MPI, the only step that should take a long time is the final clustering step (the clustering is not parallelized). It should still run in well under 24 hours, though, so perhaps it is a memory issue or a feature depth issue. You can try running the contig by itself and setting all the blast_depth parameters in maker_bopts.ctl to 10, which should help with both.
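A small sketch of applying that setting with a script instead of by hand. The parameter names `depth_blastn`, `depth_blastx` and `depth_tblastx` are an assumption about how the depth settings are spelled in maker_bopts.ctl; check them against your own control file. The helper `set_blast_depth` is hypothetical.

```python
import re

# Assumed names of the depth parameters in maker_bopts.ctl; verify locally.
DEPTH_KEYS = ("depth_blastn", "depth_blastx", "depth_tblastx")


def set_blast_depth(ctl_text, depth=10, keys=DEPTH_KEYS):
    """Return ctl_text with every `key=value` line for the given keys set to depth.

    Only the value is replaced; any trailing `#comment` on the line is kept.
    """
    pattern = re.compile(
        r"^(%s)=[^#\s]*" % "|".join(re.escape(k) for k in keys),
        re.MULTILINE,
    )
    return pattern.sub(lambda m: "%s=%d" % (m.group(1), depth), ctl_text)
```

Read the control file into a string, pass it through `set_blast_depth`, and write the result back. Lines for other parameters are left untouched.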

Otherwise, making a large overlap (50-100 kb) between the subdivided contigs should be enough. Alternatively, look for stretches of Ns in the contig and split on those.
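Splitting on gaps could look roughly like the sketch below. The minimum run length of 100 and both function names are assumptions chosen for illustration. Each piece is returned with its 0-based offset on the original scaffold so that annotations can be shifted back afterwards.

```python
import re


def n_runs(sequence, min_len=100):
    """Return 0-based, half-open (start, end) spans of N/n runs of at least min_len (min_len >= 1)."""
    return [m.span() for m in re.finditer(r"[Nn]{%d,}" % min_len, sequence)]


def split_on_gaps(sequence, min_len=100):
    """Split a scaffold at long N runs.

    Returns a list of (offset, piece) pairs, where offset is the 0-based
    start of the piece on the original scaffold. The N runs themselves
    are dropped; shorter N runs stay inside the pieces.
    """
    pieces = []
    prev = 0
    for start, end in n_runs(sequence, min_len):
        if start > prev:
            pieces.append((prev, sequence[prev:start]))
        prev = end
    if prev < len(sequence):
        pieces.append((prev, sequence[prev:]))
    return pieces
```

Because a gene model cannot be built across a long run of Ns anyway, cutting there avoids most of the risk of splitting a gene in two. The resulting pieces will be of uneven size, so some may still need the overlap approach.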

—Carson




Re: Maker annotation of large scaffolds

Aravind PRASAD

Hi All,

 

Thank you for your inputs. Currently, I’m not using the MPI version but am running multiple instances of MAKER. I previously tried to run the MPI version, but it failed, even though MPI-MAKER installed without issues.

Carson, can you please explain what exactly the blast_depth option does when running MAKER?

 

Thank you all for your time!

 

Regards,

Aravind.

 

 


Re: Maker annotation of large scaffolds

Carson Holt-2
All results are kept and placed into the final GFF3 unless you set blast_depth. Basically, most alignments are redundant and can be thrown away early in the process, but MAKER does not do this by default because most users tend to want to see all the evidence. MAKER only uses 10 alignments in its calculations anyway; the rest are just kept for reference. The cost of keeping the other alignments around can be substantial, though, if you are in a region with deep evidence (I’ve seen regions with 5,000 to 10,000 evidence alignments for some datasets). So setting blast_depth tells MAKER you are OK with throwing out the extra depth early (MAKER still parses all alignments; it just throws the extra ones away as it determines they are not useful or are redundant). This saves a lot of time and RAM downstream, at the cost of losing those alignments in the report. A depth of 10 means that no more than 10 alignments per data source will be kept per locus.
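To make the "no more than 10 alignments per data source per locus" rule concrete, here is a toy sketch. It is not MAKER's implementation: the dictionary keys `locus`, `source` and `score`, the function name, and ranking by score are all assumptions for illustration. The thread does not describe how MAKER defines a locus or decides which alignments are redundant.

```python
from collections import defaultdict


def cap_depth(alignments, depth=10):
    """Keep at most `depth` highest-scoring alignments per (locus, source) group.

    Each alignment is a dict with hypothetical keys "locus", "source", "score".
    """
    groups = defaultdict(list)
    for aln in alignments:
        groups[(aln["locus"], aln["source"])].append(aln)

    kept = []
    for group in groups.values():
        # Sort a copy of each group so the best-scoring alignments come first.
        ranked = sorted(group, key=lambda a: a["score"], reverse=True)
        kept.extend(ranked[:depth])
    return kept
```

In a region with thousands of alignments from one evidence source, a cap like this leaves only a handful to carry through the later steps, which is where the time and memory savings come from.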

—Carson




Re: Maker annotation of large scaffolds

Carson Holt-2
In reply to this post by Aravind PRASAD
Also, you can run MPI within a single node rather than across nodes. This will still give a performance bonus that scales with the MPI process count.

—Carson



Re: Maker annotation of large scaffolds

Aravind PRASAD
In reply to this post by Carson Holt-2

Thank you, Carson, for the explanation. The annotation of the large scaffolds is now resolved by using MPI MAKER together with a changed blast_depth option.

 

Aravind Prasad.

 
