Is it possible to exclude duplicate reads from BAM tracks?

Keiran Raine
Is there a setting to handle this?

I've only just noticed this because we've started marking duplicates rather than removing them, and it's causing some confusion.

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

Tel:+44 (0)1223 834244 Ext: 7703
Office: H104


-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: Is it possible to exclude duplicate reads from BAM tracks?

Lincoln Stein
Hi Keiran,

There isn't a setting for doing this. However, if the duplicate flagging is implemented as a BAM tag, then you can use the filter option:

[MY TRACK]
filter = sub { my $feature = shift;
               my $flags = $feature->get_tag_values('FLAGS');
               return $flags !~ /FDUP/;
             }

Lincoln


--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>


Re: Is it possible to exclude duplicate reads from BAM tracks?

Keiran Raine
Hi Lincoln.

I've tried this in the track configuration (where bgcolor etc. are set).

I'm finding that none of the tags are available; in fact, get_all_tags, get_tag_values and has_tag all return nothing.

For example, I can see that the reads in the region I am looking at have the 'FLAGS' tag, because the gbrowse_details page shows it:

[screenshot of the gbrowse_details page]

But if I make the function:

filter = sub {
  my $f = shift;
  warn $f->has_tag('FLAGS') ? 'y' : 'n';
  }

I can see that the function has been called but I get 'n' for every read in the error log.

Switching it out for:

filter = sub {
  my $f = shift;
  warn $f->display_name;
  }

printed all of the read names, so the feature is at least partially populated.

GBrowse 2.55
Bio::DB::Sam 1.38
BioPerl 1.6.901
Bio::Graphics 2.37

Any ideas?


Re: Is it possible to exclude duplicate reads from BAM tracks?

Timothy Parnell
Hi Keiran,

I noticed in your email that you are working with paired-end data. If I remember correctly, each of the two reads is a Bio::SeqFeature::Lite object that does not inherit the attributes of the parent paired alignment. So display_name will work, but there are no attributes.

There is a method described in the GBrowse HowTo wiki where you can get the parent feature by calling the glyph's parent_feature method, but I don't know whether that will work in this case.

Hope that helps,
Tim


Re: Is it possible to exclude duplicate reads from BAM tracks?

Keiran Raine
Hi Tim,

With that in mind, I did a little digging in the other direction. The following works:

filter = sub {
        my $f = shift;
        # $f is the read pair; its first sub-feature is a read carrying FLAGS
        my ($read) = $f->get_SeqFeatures;
        return $read->get_tag_values('FLAGS') !~ m/DUPLICATE/;
        }


Re: Is it possible to exclude duplicate reads from BAM tracks?

Lincoln Stein
Yes, the callback gets the read pair, which is the parent of the reads. The reads themselves carry the flag info.

Lincoln
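
Building on this, a filter could inspect every read under the pair rather than only the first one. The following is a minimal, untested sketch. It assumes that each sub-feature returned by get_SeqFeatures exposes the 'FLAGS' tag in the same way as the working filter posted earlier in the thread, and that the flag string contains DUPLICATE for marked reads. The variable names are illustrative only.

```perl
filter = sub {
        my $pair = shift;
        # the callback receives the read pair; the reads are its sub-features
        for my $read ($pair->get_SeqFeatures) {
            my ($flags) = $read->get_tag_values('FLAGS');
            # drop the whole pair if any of its reads is marked as a duplicate
            return 0 if defined $flags && $flags =~ m/DUPLICATE/;
        }
        # no duplicate flag found (or no sub-features): keep the feature
        return 1;
        }
```

With this variant a feature that has no sub-features is kept rather than causing a method call on an undefined value, which may be preferable if a track can mix paired and unpaired reads.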



Re: Is it possible to exclude duplicate reads from BAM tracks?

Keiran Raine
Hi Lincoln,

I've hit a snag with this (or should I say a user has). We use semantic zoom to switch to a wiggle plot based on a BigWig file; once in this view, the filter still attempts to run. What is the best/most efficient way to disable the filter once this level of zoom has been reached?

filter = 0
or
filter = sub {1;}

Thanks,

Re: Is it possible to exclude duplicate reads from BAM tracks?

Lincoln Stein
Hi Keiran,

filter=0 will be more efficient.

Lincoln
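
As a rough sketch of how this might be laid out, assuming GBrowse's semantic zoom stanza syntax ([TRACK:threshold]): the base stanza keeps the duplicate filter for the read-level view, and the zoomed-out stanza switches it off with filter = 0. The track name and the 1001 bp threshold are placeholders rather than values from the thread, and the remaining wiggle/BigWig settings are left out.

```perl
[MY TRACK]
# read-level view: hide pairs whose first read is flagged as a duplicate
filter = sub {
        my $f = shift;
        my ($read) = $f->get_SeqFeatures;
        return $read->get_tag_values('FLAGS') !~ m/DUPLICATE/;
        }

[MY TRACK:1001]
# zoomed-out wiggle view (hypothetical threshold): disable the per-read filter
filter = 0
```

Setting filter = 0 here, rather than filter = sub {1;}, follows the advice above that it is the more efficient of the two options.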


On Mon, Sep 9, 2013 at 10:34 AM, Keiran Raine <[hidden email]> wrote:
Hi Lincoln,

I've hit a snag with this (or should I say a user has).  We use a sematic zoom to switch to a wiggle plot based on a BigWig file, once in this view the filter still attempts to run.  What is the best/most efficient way to disable the filter once this level of zoom has been reached?

filter = 0
or
filter = sub {1;}

Thanks,


Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

Tel:<a href="tel:%2B44%20%280%291223%20834244%C2%A0Ext%3A%207703" value="+441223834244" target="_blank">+44 (0)1223 834244 Ext: 7703
Office: H104

On 3 Sep 2013, at 22:28, Lincoln Stein <[hidden email]> wrote:

Yes, the callback gets the read-pair which is the parent of the reads. The reads have the flag info.

Lincoln


On Fri, Aug 30, 2013 at 7:10 PM, Keiran Raine <[hidden email]> wrote:
Hi Tim,

With that in mind I did a little digging the other way; the following works:

filter = sub {
        my $pair = shift;
        my ($read) = $pair->get_SeqFeatures;
        return $read->get_tag_values('FLAGS') !~ m/DUPLICATE/;
        }
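
The filter above only looks at the first read of each pair. A slightly more defensive variant, offered as a sketch rather than something tested against this setup, checks every child read and falls back to the feature itself when it has no sub-features (for example single-end data). It assumes, as in the working example, that the FLAGS tag contains the word DUPLICATE for marked reads:

```perl
# Return false (hide the feature) if any of its reads is flagged DUPLICATE
filter = sub {
        my $feature = shift;
        my @reads = $feature->get_SeqFeatures;
        @reads = ($feature) unless @reads;
        for my $read (@reads) {
            my ($flags) = $read->get_tag_values('FLAGS');
            return 0 if defined $flags && $flags =~ /DUPLICATE/;
        }
        return 1;
        }
```

Whether a pair should be hidden when only one mate is marked is a policy choice; duplicate-marking tools normally flag both mates together, so in practice the two versions should agree.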

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

Tel:+44 (0)1223 834244 Ext: 7703
Office: H104

On 30 Aug 2013, at 23:49, Timothy Parnell <[hidden email]> wrote:

Hi Keiran,

I noticed in your email that you are working with paired-end data. If I remember correctly, each of the two reads is a Bio::SeqFeature::Lite object that does not inherit the attributes of the parent paired alignment. So display_name will work, but there are no attributes.

There is a method described in the GBrowse HowTo wiki where you can get the parent feature by calling the glyph's parent_feature method, but I don't know whether that will work in this case.

Hope that helps,
Tim

________________________________
From: Keiran Raine [[hidden email]]
Sent: Friday, August 30, 2013 4:07 PM
To: Lincoln Stein
Cc: Gbrowse (E-mail)
Subject: Re: [Gmod-gbrowse] Is it possible to exclude duplicate reads from BAM tracks?

Hi Lincoln.

I've tried this in the track configuration (where bgcolor etc are set).

I'm finding that none of the tags are available; in fact, get_all_tags, get_tag_values and has_tag all return nothing.

For example, I can see that reads in the region I am looking at have the tag 'FLAGS', as the gbrowse_details page shows them:

[inline screenshot of the gbrowse_details page]

But if I make the function:

filter = sub {
 my $f = shift;
 warn $f->has_tag('FLAGS') ? 'y' : 'n';
 }

I can see that the function has been called but I get 'n' for every read in the error log.

Switching it out for:

filter = sub {
 my $f = shift;
 warn $f->display_name;
 }

Printed out all of the read names so it's partially populated.
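
One further check that could narrow this down, sketched under the assumption that the feature passed to the callback is a container whose sub-features are the individual reads: log the tags of each sub-feature instead of the feature itself. It uses only display_name, get_SeqFeatures and get_all_tags, and returns 1 so nothing is hidden while debugging:

```perl
# Debugging only: list the tags found on each child read in the error log
filter = sub {
        my $f = shift;
        for my $read ($f->get_SeqFeatures) {
            warn join ' ', $read->display_name, 'tags:', $read->get_all_tags;
        }
        return 1;
        }
```

If the log shows FLAGS on the child reads but not on the parent, the filter needs to test the children rather than the feature it is handed.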

GBrowse 2.55
Bio::DB::Sam 1.38
BioPerl 1.6.901
Bio::Graphics 2.37

Any ideas?

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

[hidden email]
Tel:+44 (0)1223 834244 Ext: 7703
Office: H104

On 29 Aug 2013, at 15:45, Lincoln Stein <[hidden email]> wrote:

Hi Keiran,

There isn't a setting for doing this. However, if the duplicate flagging is implemented as a BAM tag, then you can use the filter option:

[MY TRACK]
filter = sub { my $feature = shift;
                   my $flags = $feature->get_tag_values('FLAGS');
                   return $flags !~ /FDUP/;
                 }

Lincoln


On Thu, Aug 29, 2013 at 10:14 AM, Keiran Raine <[hidden email]> wrote:
Is there a setting to handle this?

I've only just noticed this as we've started marking the duplicates rather than removing them, it's causing some confusion...

Keiran Raine
Principal Bioinformatician
Cancer Genome Project
Wellcome Trust Sanger Institute

[hidden email]
Tel:+44 (0)1223 834244 Ext: 7703
Office: H104







--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>
