Resizing EBS on Amazon


Resizing EBS on Amazon

Scooter Willis-2
I used CloudMan to create a new cluster on Feb 22 and picked 500GB as the initial size of the data drive. Working with TCGA exome DNA-seq data, it didn't take long to fill that up. I used the CloudMan admin interface to resize the volume from 500GB to 1TB, and the resize operation took 15 hours. I'm not sure if that is expected, so I wanted to give a heads-up in case that is an area for optimization.

Since I now have a local storage problem, as I need to work with more than 1TB of data, I tried mounting an S3 bucket using FUSE. I ran into a problem where the first s3fs package I tried to install had a version incompatibility with Ubuntu 10.

I remember something in a support email about better support for Amazon S3 being in the works. Can you provide any guidance or thoughts on how to work with more than 1TB of data using cost-effective S3 rather than expensive EBS? The same applies to storing results in S3.

With s3fs, the file system layer can hide many of the complexities of moving files back and forth, with caching, since shuttling 30GB+ files around by hand isn't going to be fun.
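For reference, a typical s3fs mount looks roughly like the sketch below. The bucket name, mount point, and credentials are placeholders, and the package name and option set have varied across s3fs versions (on Ubuntu 10 building from source may be necessary), so treat this as an illustration rather than exact instructions:

```shell
# Install s3fs (package availability varies by release; building from
# source may be needed on older Ubuntu versions such as 10.04)
sudo apt-get install s3fs

# s3fs expects a credentials file in the form ACCESS_KEY:SECRET_KEY
# (values below are placeholders)
echo "YOUR_ACCESS_KEY:YOUR_SECRET_KEY" > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs

# Mount the bucket; bucket and mount point names are hypothetical.
# use_cache keeps local copies of fetched files to soften the cost of
# repeatedly reading large objects.
mkdir -p /mnt/s3data
s3fs my-tcga-bucket /mnt/s3data -o passwd_file=~/.passwd-s3fs -o use_cache=/tmp/s3fs-cache
```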

Thanks

Scooter



___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: Resizing EBS on Amazon

Dannon Baker-2
Unfortunately the snapshotting process used by the resize operation is really slow, as you've noticed, and there's not much we can do to make that faster.  We've discussed other methods for growing the filesystem but haven't finished that work yet.
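For context, a snapshot-based grow is conceptually the following sequence. CloudMan automates these steps internally, so this is only an illustrative sketch in modern AWS CLI syntax (not what CloudMan itself runs), and the volume, snapshot, and instance IDs are placeholders; the snapshot step dominates the time for large volumes:

```shell
# 1. Snapshot the existing data volume (this is the slow part)
aws ec2 create-snapshot --volume-id vol-xxxxxxxx

# 2. Create a larger volume from that snapshot (size in GiB)
aws ec2 create-volume --snapshot-id snap-xxxxxxxx --size 1024 \
    --availability-zone us-east-1a

# 3. Attach the new volume to the instance
aws ec2 attach-volume --volume-id vol-yyyyyyyy \
    --instance-id i-zzzzzzzz --device /dev/sdf

# 4. Grow the filesystem to fill the new volume (ext3/ext4)
sudo resize2fs /dev/xvdf
```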

For working with S3, instead of s3fs you're more than welcome to give Galaxy's S3ObjectStore a shot.  There isn't much documentation available for it right now, and I'd still say it's a beta feature in need of more testing and optimization, but to enable it define the following options in your universe_wsgi.ini:

# Object store mode (valid options are: disk, s3, distributed, hierarchical)
object_store = s3
aws_access_key = <your access key>
aws_secret_key = <your secret key>
s3_bucket = <a bucket name for all your files>
use_reduced_redundancy = True

# Size (in GB) that the cache used by object store should be limited to.
# If the value is not specified, the cache size will be limited only by the file
# system size.
object_store_cache_size = <decide based on the size of the EBS volume you want to use as scratch>

What this will do is use the EBS volume as working space, exactly like Galaxy does currently.  Additionally, it'll push datasets to S3 and delete them from local disk as necessary (least recently touched deleted first) to stay beneath object_store_cache_size.  If something has been swapped out, it's simply fetched back from S3 as needed, but most of the time you'll be working on disk directly and pushing to S3 in the background.
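The eviction policy described above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not the actual S3ObjectStore code: files in the local cache directory are deleted oldest-access-first until the cache fits under the configured limit, on the assumption that anything deleted can be re-fetched from S3.

```python
import os

def evict_to_cache_limit(cache_dir, limit_bytes):
    """Delete least-recently-accessed files until the cache fits the limit.

    Illustrative sketch of the eviction policy described above; the real
    S3ObjectStore implementation differs in detail.
    """
    files = []
    total = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            files.append((st.st_atime, st.st_size, path))
            total += st.st_size
    # Sort by access time ascending: least recently touched comes first.
    for atime, size, path in sorted(files):
        if total <= limit_bytes:
            break
        os.remove(path)  # the dataset still lives in S3 and can be re-fetched
        total -= size
    return total  # bytes remaining in the cache
```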

I'd be more than happy to work with you if you run into any issues trying this out.  This is something we've wanted to firm up for a while now, and having a real live test case would be useful!

-Dannon

