AlignIO and Gbrowse_syn

AlignIO and Gbrowse_syn

Smithies, Russell
I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
If GBrowse_syn is using .maf format, does AlignIO need more work?
Any comments?

--Russell


I'd like to plug LASTZ alignments into GBrowse_syn. LASTZ can produce a limited number of alignment formats (http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html#options_output). GBrowse_syn accepts clustalw format plus "other commonly used formats recognized by BioPerl's AlignIO parser" (http://gmod.org/wiki/GBrowse_syn_Database). Since LASTZ doesn't produce clustalw, I've tried converting LASTZ maf output to clustalw (and other alignment formats) using AlignIO; however, I run into the following issues:
* Strand info is lost (probably fair enough, since this isn't part of the clustalw format per se; incorporating strand info within sequence IDs is a GBrowse_syn clustalw specification).
* The coordinate system for reverse-strand matches differs between LASTZ .maf and BioPerl .maf: for LASTZ, coordinates relate to the reverse-complemented sequence, whereas for BioPerl/GBrowse, coordinates relate to the original (non-reverse-complemented) sequence. E.g. a coordinate of "1" in the LASTZ .maf file refers to the last base of the original sequence; AlignIO prints "1" to the output clustalw file, but since strand info is lost it is construed as the first position at the very start of the original sequence. As a result, all reverse-match coordinates in the resulting clustalw output file are incorrect.
* AlignIO is unable to parse multiple, individual aligned regions within the same .maf file; it interleaves them.
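
For illustration, the coordinate conversion in the second point and the block-by-block reading in the third could be sketched roughly as below. This is only a sketch in Python, not BioPerl or GBrowse_syn code: it assumes the usual MAF convention that each "s" line holds src, zero-based start, size, strand, srcSize and the aligned text, and that each aligned region starts with an "a" line. The names MafRow, to_forward_coords and read_maf_blocks are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class MafRow(NamedTuple):
    src: str
    start: int   # 1-based, inclusive, on the original (forward) sequence
    end: int     # 1-based, inclusive, on the original (forward) sequence
    strand: str  # kept so that reverse-strand matches stay identifiable
    text: str


def to_forward_coords(start0: int, size: int, strand: str,
                      src_size: int) -> Tuple[int, int]:
    """Convert a MAF (zero-based start, size) pair to 1-based forward coords."""
    if strand == "+":
        return start0 + 1, start0 + size
    if strand == "-":
        # start0 counts from the start of the reverse-complemented sequence,
        # so offset 0 corresponds to the last base of the original sequence.
        return src_size - start0 - size + 1, src_size - start0
    raise ValueError(f"unexpected strand: {strand!r}")


def read_maf_blocks(lines: Iterable[str]) -> Iterator[List[MafRow]]:
    """Yield one list of rows per alignment block instead of merging blocks."""
    block: List[MafRow] = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new "a" line closes the current block.
            if block:
                yield block
                block = []
            continue
        if fields[0] == "s" and len(fields) >= 7:
            src, start0, size, strand, src_size, text = fields[1:7]
            start, end = to_forward_coords(int(start0), int(size),
                                           strand, int(src_size))
            block.append(MafRow(src, start, end, strand, text))
    if block:
        yield block
```

Under these assumptions, a reverse-strand row with start 0 and size 4 on a 10-base sequence maps to positions 7-10 of the original sequence, and each aligned region comes back as its own list of rows.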

I would be interested to hear whether anyone has already found a solution to integrating LASTZ and GBrowse_syn... and also whether any development of AlignIO to improve support of maf format is planned.
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: [Bioperl-l] AlignIO and Gbrowse_syn

Fields, Christopher J
Russell,

We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.  

chris

On Aug 11, 2010, at 4:31 PM, Smithies, Russell wrote:

> I know there was some brief discussion about .maf format a few weeks ago but I've had an enquiry (as below) from a colleague.
> If GBrowse_syn is using .maf format, does AlignIO need more work?
> Any comments?

Re: [Bioperl-l] AlignIO and Gbrowse_syn

Sheldon McKay
The GBrowse_syn dev team is pretty small (n=1) right now, so any
patches would be welcome.

Sheldon



On Wed, Aug 11, 2010 at 6:02 PM, Chris Fields <[hidden email]> wrote:

> Russell,
>
> We have had very few requests to support .maf until recently, which is why there has been little done with it.  We welcome any help to improve it.
>
> chris