Re: [galaxy-dev] Metadata indexing slow on large files
Dennis Gascoigne wrote:
> We are working with some very large sequence libraries (paired end 70M
> reads+ each end - 15Gbx2). We already know what the file types are and
> that they are appropriate for the pipeline. There seems to be a large
> amount of the processing effort expended after the completion of each
> step in the workflow analysing the files and determining their
> attributes - this is related to the size of the files (which are large
> at every step) and is of no practical use to us, except maybe on the
> final step.
> Is there any way to suppress post processing steps and simply accept the
> file as specified in the tool output tags? How can we reduce or
> eliminate verification/indexing on metadata tags, and what implications
> should we be aware of?
To help us determine how best to address this, can you tell us
specifically which metadata is unnecessary for the datatypes you're using?
In the coming week or two we'll be making things like line/sequence
count administratively optional, which would probably solve much of this.
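
For illustration only, here is a rough sketch of what a size-gated line
count could look like. It is not Galaxy's actual code or configuration:
the function name optional_line_count and the max_size_bytes parameter
are hypothetical, made up for this example. The idea is just that a full
pass over a multi-gigabyte file is skipped when the file exceeds an
administrator-chosen limit.

```python
import os


def optional_line_count(path, max_size_bytes=None):
    """Count newline characters in a file, or skip the count for big files.

    Hypothetical helper (not part of Galaxy). Returns None when the file
    is larger than max_size_bytes, so the caller can leave the metadata
    value unset instead of scanning the whole file.
    """
    if max_size_bytes is not None and os.path.getsize(path) > max_size_bytes:
        return None  # too large: skip the expensive full read

    count = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so memory use stays flat on large files.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            count += chunk.count(b"\n")
    # Note: a final line without a trailing newline is not counted.
    return count
```

With max_size_bytes left as None the count always runs; setting it to,
say, a few gigabytes would skip the scan for files like the ones
described above.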