Data storage

Fanny Coffin
Hi,

I'm evaluating the possibility of using Galaxy in our production
environment for NGS data.

I have a question about data storage. NGS produces huge files, which we
store on our servers in a specific folder structure. To use them in
Galaxy, these files have to be uploaded (so that the database can be
filled in with information such as the first lines, the fields, etc.).
But I'm wondering whether these files necessarily have to be imported
into the Galaxy workspace, or whether they can just be linked. My
question comes from the fact that we would absolutely like to avoid
data duplication.

Could you please enlighten me about that?

Thanks in advance.

Regards,

Fanny COFFIN

_______________________________________________
galaxy-user mailing list
[hidden email]
http://lists.bx.psu.edu/listinfo/galaxy-user

Re: Data storage

Davide Cittaro
Hi, 

On Aug 26, 2010, at 3:22 PM, Fanny Coffin wrote:

> My question comes from the fact that we would absolutely like to avoid
> data duplication.

AFAIK most of the data will be duplicated when uploading/importing. I suggest deploying Galaxy on a filesystem with deduplication capabilities.
I've successfully installed Galaxy on Nexenta CP3 + ZFS (waiting for Illumos). Recent ZFS builds support both deduplication and compression.
HTH
d


/*
Davide Cittaro

Cogentech - Consortium for Genomic Technologies
Via Adamello, 16
20139 Milano
Italy

tel.: +39(02)574303007
*/
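Davide's suggestion relies on the filesystem storing identical data only once, so the copy Galaxy makes on upload costs little extra disk space. The idea can be sketched as a toy Python model of block-level deduplication. It assumes fixed-size blocks keyed by their SHA-256 digest and keeps everything in memory; it is a simplification, not ZFS's actual implementation, and the class name and file paths are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical): identical blocks are kept once."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # tiny block size so the example is easy to follow
        self.blocks = {}              # digest -> block bytes, each stored only once
        self.files = {}               # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_bytes(self):
        # Space the files appear to use.
        return sum(len(self.read(n)) for n in self.files)

    def physical_bytes(self):
        # Space actually consumed by unique blocks.
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
reads = b"ACGTTTGGCCAAGATC"                  # four distinct 4-byte blocks
store.write("runs/sample1.fastq", reads)     # original file on the file server
store.write("galaxy/dataset_1.dat", reads)   # the copy made by an upload
print(store.logical_bytes(), store.physical_bytes())  # 32 16
```

The second write adds a file entry but no new blocks, which is why a deduplicating filesystem can absorb the duplicate that an upload creates.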





Re: Data storage

Greg Von Kuster
Hello Fanny,

You should upload your files to a Galaxy Data Library: the upload form for data libraries lets you upload directories of files or files from filesystem paths, and either option lets you avoid making copies of your files. See our wiki at http://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/UploadingFiles for details on the options for uploading files to a data library, and http://bitbucket.org/galaxy/galaxy-central/wiki/DataLibraries/Libraries for details about data libraries in general.


On Aug 26, 2010, at 9:22 AM, Fanny Coffin wrote:

> But I'm wondering whether these files necessarily have to be imported
> into the Galaxy workspace, or whether they can just be linked.

Greg Von Kuster
Galaxy Development Team
[hidden email]
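Greg's answer means the library upload can refer to files where they already live instead of writing a second copy. The general difference between linking and copying can be sketched in Python with a symbolic link. This is only an illustration of the idea, assuming a POSIX filesystem that supports symlinks; it is not Galaxy's own mechanism, and `add_to_workspace` and the paths are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def add_to_workspace(source: Path, workspace: Path, link: bool = True) -> Path:
    """Make `source` available inside `workspace` (hypothetical helper).

    link=True creates a symbolic link, so the data stays in its original
    folder; link=False writes a second, independent copy of the file.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / source.name
    if link:
        # Only a small directory entry pointing at the original is created.
        target.symlink_to(source.resolve())
    else:
        # copy2 duplicates the file contents (plus metadata such as mtime).
        shutil.copy2(source, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "runs", "sample1.fastq")
    source.parent.mkdir()
    source.write_bytes(b"@read1\nACGT\n+\nIIII\n")

    linked = add_to_workspace(source, Path(tmp, "linked"))
    copied = add_to_workspace(source, Path(tmp, "copied"), link=False)

    print(linked.is_symlink(), os.path.samefile(linked, source))  # True True
    print(copied.is_symlink(), os.path.samefile(copied, source))  # False False
```

The trade-off of linking is that the original folder organisation must stay in place: moving or deleting the source file breaks the link.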





Re: Data storage

Fanny Coffin
Thanks a lot for your quick answers, Greg and Davide! So it looks
completely feasible, since we have ZFS.
As for the user permissions defined on the files on our file server: if
I understood correctly, the current Galaxy version doesn't take them into
account, but could that become possible in future developments?

Greg Von Kuster wrote:

> Hello Fanny,
>
> You should upload your files to a Galaxy Data Library: the upload form
> for data libraries lets you upload directories of files or files from
> filesystem paths, and either option lets you avoid making copies of
> your files.
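One ingredient of honoring the file server's permissions would be checking, for each Galaxy user, whether the matching system account may read a given file. A minimal Python sketch of the classic POSIX owner/group/other read check follows. The function `can_read` and the idea of mapping a Galaxy user to a uid and a set of group ids are hypothetical, and the sketch deliberately ignores root, ACLs, and permissions on parent directories.

```python
import os
import stat
import tempfile


def can_read(path, uid, gids):
    """Hypothetical check: may the user (uid, set of group ids) read `path`?

    Uses only the POSIX owner/group/other mode bits; root, ACLs and
    directory search permissions are deliberately ignored.
    """
    st = os.stat(path)
    if st.st_uid == uid:
        # The owner class applies, even if group/other bits are more permissive.
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        # Likewise, group members are judged only by the group bits.
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)


fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)                 # owner: rw, group: r, others: none
st = os.stat(path)
other_uid = st.st_uid + 1             # a hypothetical user who does not own the file

print(can_read(path, st.st_uid, set()))        # True: owner has read permission
print(can_read(path, other_uid, {st.st_gid}))  # True: group has read permission
print(can_read(path, other_uid, set()))        # False: others have no permission
os.remove(path)
```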
