MAKER on AWS


MAKER on AWS

DECKER, KEITH F [AG/1005]

I would like to evaluate the use of MAKER on AWS, but I am unsure what the best approach to parallelization would be.

I found this old post on StarCluster, http://efish.integrativebiology.msu.edu/2015/02/10/annotate.html, but my understanding is that StarCluster and its successors (CfnCluster and AWS ParallelCluster) can be challenging to set up and use.

 

So my questions are

 

1.  Has anyone had recent success running MAKER on CfnCluster or AWS ParallelCluster?

2.  Would it be reasonable to just split N chromosomes across N EC2 instances and collect the results at the end? If so, does it make sense to run each chromosome-level annotation on, for example, an m4.16xlarge instance with 64 cores and 256 GB of RAM? Or is there a maximum number of cores at which the benefits of parallelization saturate?

 

Thanks and sorry for the long question

Keith



This system contains confidential and copyrighted information.  Access to the system is limited to users only and only for approved business purposes.
Anyone obtaining access to and using this system acknowledges that all information on this system including but not limited to electronic mail, word processing, directories and files, constitutes private property belonging to the Company.
Anyone using of viewing this system is further advised that the use of this system may be recorded and the information contained herein may be monitored, retrieved and reviewed if, in the Company’s sole discretion there is a business reason to do so.
If improper activity or use is suspected, all available information may be used by the Company for possible disciplinary action, prosecution, civil claim or any remedy or lawful purpose.



_______________________________________________
maker-devel mailing list
[hidden email]
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
Re: MAKER on AWS

Carson Holt-2
You can try to stand up a cluster inside AWS or, as you said, just start independent instances, each with its own piece of the total dataset. There is a tool called fasta_tool inside of MAKER that makes it easy to split the dataset into equal-sized chunks.
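For readers who want to see what the chunking step amounts to, here is a minimal Python sketch of splitting a multi-sequence FASTA into roughly equal-sized chunks, one per instance. It is not fasta_tool and does not reproduce its options; the function names (read_fasta, split_balanced, write_fasta) and the greedy longest-first strategy are assumptions made for illustration only.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, parts = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line.strip())
    if header is not None:
        yield header, "".join(parts)


def split_balanced(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Greedy longest-first split: each record goes to the currently smallest chunk."""
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    sizes = [0] * n_chunks
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        smallest = sizes.index(min(sizes))
        chunks[smallest].append((header, seq))
        sizes[smallest] += len(seq)
    return chunks


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Tiny in-memory demonstration; real input would be an opened genome FASTA file.
demo = io.StringIO(">chr1\nACGTACGTAC\n>chr2\nACGTAC\n>chr3\nACG\n")
for index, chunk in enumerate(split_balanced(read_fasta(demo), 2)):
    print(index, [header for header, _ in chunk])
```

Each chunk would then be written to its own file with write_fasta and handed to a separate instance. Balancing by total sequence length is only a rough proxy for annotation time, so chunks with very gene-dense sequences may still finish later than others.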

Alternatively, CyVerse has set up an interesting MAKER wrapper (WQ-MAKER) that launches multiple cloud instances for MAKER and handles data chunking for you (they’ve been using XSEDE cloud resources through the NSF)  —>


—Carson




Re: MAKER on AWS

DECKER, KEITH F [AG/1005]

Thanks,

Do you have metrics on how MAKER performs when annotating a single chromosome on a single machine? For example, will I see anything close to a 16x speed-up on a 16-core machine, and does the performance improvement saturate at a certain number of cores?

 

-Keith

 

Re: MAKER on AWS

Carson Holt-2
I don’t have cloud performance stats, but I do have cluster performance stats (attached) that you may be able to roughly extrapolate from. On a cluster we see nearly linear performance gains up to ~100 CPU cores, and the curve doesn’t fully plateau until well after 600 cores (at that point we are hitting I/O and networking limits for inter-node communication). A single instance is essentially equivalent to a single physical machine, which falls well under 100 CPU cores, so you should expect roughly linear scaling on that instance.
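To check scaling on your own instance rather than rely on cluster numbers, one option is to time the same small test input at a few core counts and compute speed-up and parallel efficiency. The Python sketch below shows that arithmetic. The function names and the 0.8 efficiency threshold are assumptions, and the timings in the example are made-up placeholders, not MAKER measurements.

```python
from typing import Dict, List, Optional, Tuple


def scaling_table(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    speedup    = baseline wall time / wall time at this core count
    efficiency = speedup / (cores / baseline cores); 1.0 means perfectly linear
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


def last_efficient_core_count(timings: Dict[int, float],
                              threshold: float = 0.8) -> Optional[int]:
    """Largest measured core count whose efficiency is still >= threshold."""
    efficient = [cores for cores, _, eff in scaling_table(timings) if eff >= threshold]
    return max(efficient) if efficient else None


# Placeholder wall-clock hours per core count (illustrative only, not real data).
example_timings = {1: 100.0, 2: 50.0, 4: 25.0, 8: 14.0, 16: 10.0}
for cores, speedup, efficiency in scaling_table(example_timings):
    print(f"{cores:>3} cores  speedup {speedup:5.2f}  efficiency {efficiency:4.2f}")
```

If efficiency stays near 1.0 up to the full core count of the instance, that matches the linear behaviour described above; a sharp drop would suggest a bottleneck such as disk I/O on that particular instance type.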

—Carson





Attachment: PastedGraphic-2.pdf (55K)