Bio::DB::Sam - Bio::DB::Bam::AlignmentWrapper->padded_alignment

Keiran Raine
Hi all,

(Some sections of this email are monospaced; hopefully the formatting will be retained.)

I'm encountering unexpected results from Bio::DB::Sam when requesting the 'matches' component of the padded_alignment:

TAAACTATAAATAGCTCCTTTCACCTTTAGTCAAAGGAAATATCAAGAAGGCCTGTAGGGTAGCTCCCTATGGCTGTTTAAAAAGTGTTTGATTTTTATG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TAAACTATAAATAGCTCCTTCCACCTTTAGTCAAAGGAAATATCAAGAAGGCCTGTAGGGTAGCTCCCTATGGCTGTTTAAAAAGTGATTGGTTTTTTTT
                    *                                                                  *   *     * *

I've indicated with '*' where I would not expect to see the '|' symbol in the match string.

I've written a patch that resolves this (attached). It may not be the most efficient way to do it, but it gives the correct result:

TAAACTATAAATAGCTCCTTTCACCTTTAGTCAAAGGAAATATCAAGAAGGCCTGTAGGGTAGCTCCCTATGGCTGTTTAAAAAGTGTTTGATTTTTATG
|||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| ||||| | 
TAAACTATAAATAGCTCCTTCCACCTTTAGTCAAAGGAAATATCAAGAAGGCCTGTAGGGTAGCTCCCTATGGCTGTTTAAAAAGTGATTGGTTTTTTTT

Kind regards,

Keiran Raine
Senior Computer Biologist
The Cancer Genome Project
Ext: 7703
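
The behaviour requested above can be illustrated with a short sketch. It is written in Python rather than Perl, it is not the attached patch, and it does not call anything in Bio::DB::Sam; the function name match_line and the toy sequences are assumptions made for illustration. A column gets '|' only when both padded sequences hold the same base there.

```python
def match_line(ref: str, query: str, match: str = "|", other: str = " ") -> str:
    """Build a match string for two padded sequences of equal length.

    A column is marked with `match` only when both sequences carry the
    same base; '-' is assumed to be the pad (gap) character.
    """
    if len(ref) != len(query):
        raise ValueError("padded sequences must have the same length")
    return "".join(
        match if r == q and r != "-" else other
        for r, q in zip(ref, query)
    )


# Hypothetical toy sequences, not taken from the thread.
ref = "ACGTACGT"
query = "ACCTACGA"
print(ref)
print(match_line(ref, query))
print(query)
```

With these toy inputs the third and last columns differ, so the middle line has blanks at those two positions and pipes everywhere else.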


-- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.


_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Attachment: AlignWrapper.pm.patch (674 bytes)
Re: Bio::DB::Sam - Bio::DB::Bam::AlignmentWrapper->padded_alignment

Keiran Raine
Hi all,

Some subsequent thinking on this has led to a modification to the patch, which should be significantly more efficient.


Keiran Raine
Senior Computer Biologist
The Cancer Genome Project
Ext: 7703





On 28 Feb 2011, at 21:50, Keiran Raine wrote:


Attachment: AlignWrapper.pm.patch (956 bytes)
Re: Bio::DB::Sam - Bio::DB::Bam::AlignmentWrapper->padded_alignment

Lincoln Stein
Hi,

Maybe it is counterintuitive, but the pipe symbol indicates that there
is an "M" in the CIGAR string at that position, not that the bases are
equal. Your patch makes sense, but before I commit it, could I confirm
that nobody is depending on the current behavior?

Lincoln
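
As a rough illustration of the behaviour Lincoln describes, the sketch below builds the match string from the CIGAR operations alone, so every "M" column gets a pipe whether or not the bases agree. It is a Python approximation, not the Bio::DB::Sam source; the function name cigar_match_line is hypothetical, and only the M, I and D operations are handled.

```python
import re


def cigar_match_line(cigar: str) -> str:
    """Return one mark per alignment column, derived only from the CIGAR.

    'M' columns are marked '|'; 'I' and 'D' columns are left blank.
    Base identity is never consulted, so a mismatch inside an 'M'
    block still shows a pipe.
    """
    if not re.fullmatch(r"(\d+[MID])+", cigar):
        raise ValueError("this sketch only handles M, I and D operations")
    marks = []
    for length, op in re.findall(r"(\d+)([MID])", cigar):
        marks.append(("|" if op == "M" else " ") * int(length))
    return "".join(marks)


# Hypothetical CIGAR strings for illustration.
print(repr(cigar_match_line("8M")))
print(repr(cigar_match_line("3M1I4M")))
```

Under this reading, a read aligned end to end with a single "M" block yields an unbroken row of pipes, which is consistent with the first output shown in the original report.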

On Tuesday, March 1, 2011, Keiran Raine <[hidden email]> wrote:


--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa <[hidden email]>

Re: Bio::DB::Sam - Bio::DB::Bam::AlignmentWrapper->padded_alignment

Keiran Raine
Hi Lincoln,

Just to confirm, I don't mind which character denotes the mismatch (I
understand it may need to be '-' to indicate that the position was not
in an ins/del region).

Thanks,

Keiran Raine
Senior Computer Biologist
The Cancer Genome Project
Ext: 7703
[hidden email]
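
One way to picture the distinction raised here is a three-state match string: '|' for identical bases, a separate character for a mismatch inside an aligned block, and a blank for ins/del columns. The Python sketch below assumes '-' as both the pad character in the sequences and the mismatch marker, as floated in the message; the function name annotate_columns and the toy sequences are hypothetical, and this is not the committed patch.

```python
def annotate_columns(ref: str, query: str, match: str = "|",
                     mismatch: str = "-", gap: str = " ",
                     pad: str = "-") -> str:
    """Mark each column as a match, a mismatch, or an ins/del column."""
    if len(ref) != len(query):
        raise ValueError("padded sequences must have the same length")
    out = []
    for r, q in zip(ref, query):
        if r == pad or q == pad:
            out.append(gap)        # insertion or deletion column
        elif r == q:
            out.append(match)      # aligned and identical
        else:
            out.append(mismatch)   # aligned but different
    return "".join(out)


# Hypothetical toy sequences with one insertion and one deletion.
ref = "ACG-TACGT"
query = "ACCTTA-GA"
print(ref)
print(annotate_columns(ref, query))
print(query)
```

Keeping the mismatch marker distinct from the blank lets a reader tell a substitution apart from a gap at a glance; which character to use is left as a parameter.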





On 1 Mar 2011, at 10:17, Lincoln Stein wrote:
