subtract

Xianrong Wong
Hello, I am using the subtract (whole dataset) tool.  I converted my FASTQ file to tabular with 2 columns:  1. identifier and 2. sequence.  I then "selected (a few) lines that match an expression" from this initial tabular file, and I am trying to get a final dataset that is devoid of the reads in those selected lines; thus I subtract the dataset of selected lines from the initial dataset.  This tool works when I perform the workflow on a relatively small file (1/50 the size of a whole sequencing experiment) but repeatedly fails when I input the full FASTQ file.  Any idea why this is so?
 
Jose

___________________________________________________________
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

  http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

  http://lists.bx.psu.edu/

Re: subtract

Jennifer Jackson
Hello,

Using the 'Subtract' tool on FASTQ-derived datasets can be memory intensive,
since it involves sorting both files and then comparing them character by
character. That step is probably not necessary here. I have seen queries
like yours run successfully on even very large datasets by dropping the
Subtract step and instead using 'Select' with "NOT Matching" on the
original dataset.

Example:

current dataflow:
1 - original file A
2 - select positive match expression 'X' to create file B
3 - subtract file B from file A to create file C

better:
1 - original file A
2 - select negative match expression 'X' to create file C
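
To illustrate why the two dataflows give the same file C, here is a minimal
Python sketch. It is not how Galaxy implements 'Select' or 'Subtract'; the
function names, the sample reads, and the expression "TTTT" are all made up
for the example. It only shows that filtering with a negated match keeps
exactly the lines that a select-then-subtract would keep, without holding
a second dataset in memory.

```python
import re
from typing import Iterable, Iterator, List


def select_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield lines that match the expression (positive 'Select')."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def select_not_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield lines that do NOT match the expression (negative 'Select')."""
    regex = re.compile(pattern)
    return (line for line in lines if not regex.search(line))


def subtract(a: List[str], b: Iterable[str]) -> List[str]:
    """Whole-line subtract: keep lines of a that do not occur in b.

    Simplified stand-in for illustration; it keeps all of b in memory.
    """
    unwanted = set(b)
    return [line for line in a if line not in unwanted]


# Hypothetical two-column tabular data: identifier <TAB> sequence.
file_a = [
    "read1\tACGTACGT",
    "read2\tTTTTGGGG",
    "read3\tACGTTTTT",
    "read4\tGGGGCCCC",
]
pattern = "TTTT"  # hypothetical expression 'X'

# Current dataflow: select positive matches (B), then subtract B from A.
file_b = list(select_matching(file_a, pattern))
via_subtract = subtract(file_a, file_b)

# Better dataflow: one pass over A with a negated match.
via_negative_select = list(select_not_matching(file_a, pattern))

print(via_subtract == via_negative_select)
```

The negative select reads the input once, line by line, so its memory use
does not grow with the size of the file.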

If this failure is on the public main Galaxy server and you do not wish
to change your query, then moving to a cloud instance and experimenting
with larger memory options is one suggestion: http://usegalaxy.org/cloud

Hopefully this helps,

Jen
Galaxy team

On 4/29/12 6:16 PM, Xianrong Wong wrote:

> Hello, I am using the subtract (whole dataset) tool.  I converted my
> fastq file to tabular with 2 columns:  1. Identifier and 2. sequence.  I
> then "selected (a few) lines that match an expression" from this initial
> tabular file and am trying to get a final dataset that is devoid of
> reads with the few selected lines - thus I subtract the dataset of
> selected lines from the initial dataset.  This tool works when I am
> performing the workflow on a relatively small file (1/50 the size of a
> whole sequencing experiment) but repeatedly fails when I input the full
> fastq file.  Any idea why this is so?
> Jose

--
Jennifer Jackson
http://galaxyproject.org