Strand specific coverage wiggle plot

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Strand specific coverage wiggle plot

Michael Dondrup-3
Hi,

we are visualizing strand specific RNA-seq data using the Bio::DB::Sam adapter with the
wiggle_xyplot glyph. Our configuration stanzas look like the example below. Is it possible
to somehow enable strand specific coverage plots e.g. by use of different colors or plotting
antisense coverage a negative values? If not is there possibly an alternative way of achieving this?
(splitting the bam file into two for each strand would be only my last resort).

Cheers
Michael



[lsalsamHBR10:database]
db_adaptor     = Bio::DB::Sam
db_args        = -bam  /export/storage/licebase/bam/10.fastq.gz.clipped.bam
search options = default

[CoverageXyplotHBR10]
feature        = coverage
glyph          = wiggle_xyplot
database       = lsalsamHBR10
height         = 50
fgcolor        = black
bicolor_pivot  = 0
pos_color      = blue
neg_color      = red
key            = Intestine (HBR-10)
category       = Reads
label          = 0 # Labels on wiggle tracks are redundant.
citation     =  non-normalized RNA-seq library,  stranded protocol, Illumina HiSeq, adapter clipped reads (TRIMMOMATIC), aligned using STAR






Michael Dondrup
Postdoctoral fellow
Sea Lice Research Centre/Department of Informatics
University of Bergen
Thormøhlensgate 55, N-5008 Bergen,
Norway


------------------------------------------------------------------------------
Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
With Perforce, you get hassle-free workflows. Merge that actually works.
Faster operations. Version large binaries.  Built-in WAN optimization and the
freedom to use Git, Perforce or both. Make the move to Perforce.
http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

signature.asc (465 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Strand specific coverage wiggle plot

Timothy Parnell
Hi Michael,

Unfortunately, with the underlying samtools library there is no mechanism to distinguish alignment strand when generating coverage; it simply counts all alignments.

Your only recourse, short of rewriting the samtools C library and Perl module, is to generate strand-specific coverage on your own and use that. This is something I have struggled with in my own work, so I can suggest two approaches that I have developed.

I have written a utility, bam2wig.pl, which will generate true stranded coverage and corresponding forward and reverse bigWig files. It is reasonably fast, particularly if you enable multi-threading. The description is found here http://code.google.com/p/biotoolbox/wiki/Pod_bam2wig. I just released a new version on CPAN this morning, so you may need to wait for it to get to your mirror.

The second option is to use a very fast Java utility, Sam2USeq, part of the USeq package available at http://useq.sourceforge.net. The useq format can be used as is or further converted to bigWig using utilities within the package.

The next problem is loading into GBrowse. The current GBrowse release has an upload adaptor that accepts a tar or zip file of two or more bigWig files, perfect for stranded coverage. If you choose the USeq approach, I have written a Bio::DB::USeq adaptor (available on CPAN) and modified GBrowse to use it. The nice thing about useq files is that a single file supports stranded data and they are typically smaller than corresponding bigWig files. I haven’t made a pull request to the main repo yet, but you can get my changes at https://github.com/tjparnell/GBrowse/tree/new_upload_options.

A lot of info, but let me know if you have questions. I am sure there are other options, too, but these have worked well for me.

Tim


On Mar 9, 2014, at 3:45 AM, Michael Dondrup <[hidden email]> wrote:

> Hi,
>
> we are visualizing strand specific RNA-seq data using the Bio::DB::Sam adapter with the
> wiggle_xyplot glyph. Our configuration stanzas look like the example below. Is it possible
> to somehow enable strand specific coverage plots e.g. by use of different colors or plotting
> antisense coverage a negative values? If not is there possibly an alternative way of achieving this?
> (splitting the bam file into two for each strand would be only my last resort).
>
> Cheers
> Michael
>
>
>
> [lsalsamHBR10:database]
> db_adaptor     = Bio::DB::Sam
> db_args        = -bam  /export/storage/licebase/bam/10.fastq.gz.clipped.bam
> search options = default
>
> [CoverageXyplotHBR10]
> feature        = coverage
> glyph          = wiggle_xyplot
> database       = lsalsamHBR10
> height         = 50
> fgcolor        = black
> bicolor_pivot  = 0
> pos_color      = blue
> neg_color      = red
> key            = Intestine (HBR-10)
> category       = Reads
> label          = 0 # Labels on wiggle tracks are redundant.
> citation     =  non-normalized RNA-seq library,  stranded protocol, Illumina HiSeq, adapter clipped reads (TRIMMOMATIC), aligned using STAR
>
>
>
>
>
>
> Michael Dondrup
> Postdoctoral fellow
> Sea Lice Research Centre/Department of Informatics
> University of Bergen
> Thormøhlensgate 55, N-5008 Bergen,
> Norway
>
> ------------------------------------------------------------------------------
> Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
> With Perforce, you get hassle-free workflows. Merge that actually works.
> Faster operations. Version large binaries.  Built-in WAN optimization and the
> freedom to use Git, Perforce or both. Make the move to Perforce.
> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk_______________________________________________
> Gmod-gbrowse mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Reply | Threaded
Open this post in threaded view
|

Re: Strand specific coverage wiggle plot

Michael Dondrup-3
Hi Timothy,

thank you very much for your suggestions. I will most likely try them both in the long run, but will start with the bam2wig.pl
approach. The reason for not trying the other USeq approach first  is that I am using a modified GBrowse myself, and I am
a bit concerned of causing problems when trying to merge the two. These concerns are possibly not substantial, I know.

Regarding rewriting the C library and Bio::DB::Sam adapter, I am unsure if there are changes required to the C library or only to
perl glue. If only perl it might be worth a try (it looks like that bam2wig uses the same perl module and seemingly extracts
strand information, but possibly not the coverage?)


I am trying to figure out the correct conversion options of bam2wig.pl installed from CPAN, the conversion worked quickly, but I am a
bit confused by the result. My command was like this:
nohup nice -5 bam2wig.pl bam2wig --pos span --rpm --cpu 90 --strand --in 140121_SND405_A_L003_HBR-10_R1.fastq.gz.clipped.bam &

The result consists of two files (program output):

 This program will convert bam alignments to enumerated wig data
 recording stranded positions spanning alignments
 Pre-counting total number of aligned fragments...
   16,850,687 total mapped fragments
 counted in 0.1 minutes
 Forking into 90 children for parallel conversion
 Converted 1,687 alignments on LSalAtl2s79 in 0.183 minutes
 Converted 1,174 alignments on LSalAtl2s57 in 0.183 minutes
 …..
 Converted 406 alignments on LSalAtl2s42286 in 9.500 minutes
 Converted 632 alignments on LSalAtl2s42283 in 9.500 minutes
 merging separate chromosome files
 Wrote file 140121_SND405_A_L003_HBR-10_R1.fastq.gz.clipped_f.bdg.gz
 Wrote file 140121_SND405_A_L003_HBR-10_R1.fastq.gz.clipped_r.bdg.gz
 Finished in 19.0 min

However the forward file is empty, and the reverse file contains 2238251 lines.  The input file contains stranded single-end alignments (from STAR),
with forward/reverse ratio very close to 50%. Possibly a lot of alignments got filtered, did I set some parameter wrongly?

Best
Michael

 


On Mar 10, 2014, at 8:02 PM, Timothy Parnell wrote:

> Hi Michael,
>
> Unfortunately, with the underlying samtools library there is no mechanism to distinguish alignment strand when generating coverage; it simply counts all alignments.
>
> Your only recourse, short of rewriting the samtools C library and Perl module, is to generate strand-specific coverage on your own and use that. This is something I have struggled with in my own work, so I can suggest two approaches that I have developed.
>
> I have written a utility, bam2wig.pl, which will generate true stranded coverage and corresponding forward and reverse bigWig files. It is reasonably fast, particularly if you enable multi-threading. The description is found here http://code.google.com/p/biotoolbox/wiki/Pod_bam2wig. I just released a new version on CPAN this morning, so you may need to wait for it to get to your mirror.
>
> The second option is to use a very fast Java utility, Sam2USeq, part of the USeq package available at http://useq.sourceforge.net. The useq format can be used as is or further converted to bigWig using utilities within the package.
>
> The next problem is loading into GBrowse. The current GBrowse release has an upload adaptor that accepts a tar or zip file of two or more bigWig files, perfect for stranded coverage. If you choose the USeq approach, I have written a Bio::DB::USeq adaptor (available on CPAN) and modified GBrowse to use it. The nice thing about useq files is that a single file supports stranded data and they are typically smaller than corresponding bigWig files. I haven’t made a pull request to the main repo yet, but you can get my changes at https://github.com/tjparnell/GBrowse/tree/new_upload_options.
>
> A lot of info, but let me know if you have questions. I am sure there are other options, too, but these have worked well for me.
>
> Tim
>
>
> On Mar 9, 2014, at 3:45 AM, Michael Dondrup <[hidden email]> wrote:
>
>> Hi,
>>
>> we are visualizing strand specific RNA-seq data using the Bio::DB::Sam adapter with the
>> wiggle_xyplot glyph. Our configuration stanzas look like the example below. Is it possible
>> to somehow enable strand specific coverage plots e.g. by use of different colors or plotting
>> antisense coverage a negative values? If not is there possibly an alternative way of achieving this?
>> (splitting the bam file into two for each strand would be only my last resort).
>>
>> Cheers
>> Michael
>>
>>
>>
>> [lsalsamHBR10:database]
>> db_adaptor     = Bio::DB::Sam
>> db_args        = -bam  /export/storage/licebase/bam/10.fastq.gz.clipped.bam
>> search options = default
>>
>> [CoverageXyplotHBR10]
>> feature        = coverage
>> glyph          = wiggle_xyplot
>> database       = lsalsamHBR10
>> height         = 50
>> fgcolor        = black
>> bicolor_pivot  = 0
>> pos_color      = blue
>> neg_color      = red
>> key            = Intestine (HBR-10)
>> category       = Reads
>> label          = 0 # Labels on wiggle tracks are redundant.
>> citation     =  non-normalized RNA-seq library,  stranded protocol, Illumina HiSeq, adapter clipped reads (TRIMMOMATIC), aligned using STAR
>>
>>
>>
>>
>>
>>
>> Michael Dondrup
>> Postdoctoral fellow
>> Sea Lice Research Centre/Department of Informatics
>> University of Bergen
>> Thormøhlensgate 55, N-5008 Bergen,
>> Norway
>>
>> ------------------------------------------------------------------------------
>> Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
>> With Perforce, you get hassle-free workflows. Merge that actually works.
>> Faster operations. Version large binaries.  Built-in WAN optimization and the
>> freedom to use Git, Perforce or both. Make the move to Perforce.
>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk_______________________________________________
>> Gmod-gbrowse mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>
>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Gmod-gbrowse mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

signature.asc (465 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Strand specific coverage wiggle plot

Timothy Parnell
Hi,

It is the C library that does not support stranded coverage, and the perl module simply implements that method. I don’t follow the samtools C development, so if someone is doing stranded coverage I do not know about it.

My program simply walks through the alignments in the bam file and records their position or generates coverage when they pass certain criteria (aligned or not, reversed or not) based on standard SAM alignment flags. Nothing fancy.

Regarding my GBrowse fork, my changes are pretty limited to just the data upload stuff.

For any support of my BioToolBox tools, we should take this off the gbrowse mailing list.

On Mar 13, 2014, at 7:53 AM, Michael Dondrup <[hidden email]> wrote:

> Hi Timothy,
>
> thank you very much for your suggestions. I will most likely try them both in the long run, but will start with the bam2wig.pl
> approach. The reason for not trying the other USeq approach first  is that I am using a modified GBrowse myself, and I am
> a bit concerned of causing problems when trying to merge the two. These concerns are possibly not substantial, I know.
>
> Regarding rewriting the C library and Bio::DB::Sam adapter, I am unsure if there are changes required to the C library or only to
> perl glue. If only perl it might be worth a try (it looks like that bam2wig uses the same perl module and seemingly extracts
> strand information, but possibly not the coverage?)
>
>
> I am trying to figure out the correct conversion options of bam2wig.pl installed from CPAN, the conversion worked quickly, but I am a
> bit confused by the result. My command was like this:
> nohup nice -5 bam2wig.pl bam2wig --pos span --rpm --cpu 90 --strand --in 140121_SND405_A_L003_HBR-10_R1.fastq.gz.clipped.bam &
>
> The result consists of two files (program output):
>
> This program will convert bam alignments to enumerated wig data
> recording stranded positions spanning alignments
> Pre-counting total number of aligned fragments...
>   16,850,687 total mapped fragments
> counted in 0.1 minutes
> Forking into 90 children for parallel conversion
> Converted 1,687 alignments on LSalAtl2s79 in 0.183 minutes
> Converted 1,174 alignments on LSalAtl2s57 in 0.183 minutes
> …..
> Converted 406 alignments on LSalAtl2s42286 in 9.500 minutes
> Converted 632 alignments on LSalAtl2s42283 in 9.500 minutes
> merging separate chromosome files
> Wrote file 140121_SND405_A_L003_HBR-10_R1.fastq.gz.clipped_f.bdg.gz
> Wrote file 140121_SND405_A_L003_HBR-10_R1.fastq.gz.clipped_r.bdg.gz
> Finished in 19.0 min
>
> However the forward file is empty, and the reverse file contains 2238251 lines.  The input file contains stranded single-end alignments (from STAR),
> with forward/reverse ratio very close to 50%. Possibly a lot of alignments got filtered, did I set some parameter wrongly?
>
> Best
> Michael
>
>
>
>
> On Mar 10, 2014, at 8:02 PM, Timothy Parnell wrote:
>
>> Hi Michael,
>>
>> Unfortunately, with the underlying samtools library there is no mechanism to distinguish alignment strand when generating coverage; it simply counts all alignments.
>>
>> Your only recourse, short of rewriting the samtools C library and Perl module, is to generate strand-specific coverage on your own and use that. This is something I have struggled with in my own work, so I can suggest two approaches that I have developed.
>>
>> I have written a utility, bam2wig.pl, which will generate true stranded coverage and corresponding forward and reverse bigWig files. It is reasonably fast, particularly if you enable multi-threading. The description is found here http://code.google.com/p/biotoolbox/wiki/Pod_bam2wig. I just released a new version on CPAN this morning, so you may need to wait for it to get to your mirror.
>>
>> The second option is to use a very fast Java utility, Sam2USeq, part of the USeq package available at http://useq.sourceforge.net. The useq format can be used as is or further converted to bigWig using utilities within the package.
>>
>> The next problem is loading into GBrowse. The current GBrowse release has an upload adaptor that accepts a tar or zip file of two or more bigWig files, perfect for stranded coverage. If you choose the USeq approach, I have written a Bio::DB::USeq adaptor (available on CPAN) and modified GBrowse to use it. The nice thing about useq files is that a single file supports stranded data and they are typically smaller than corresponding bigWig files. I haven’t made a pull request to the main repo yet, but you can get my changes at https://github.com/tjparnell/GBrowse/tree/new_upload_options.
>>
>> A lot of info, but let me know if you have questions. I am sure there are other options, too, but these have worked well for me.
>>
>> Tim
>>
>>
>> On Mar 9, 2014, at 3:45 AM, Michael Dondrup <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> we are visualizing strand specific RNA-seq data using the Bio::DB::Sam adapter with the
>>> wiggle_xyplot glyph. Our configuration stanzas look like the example below. Is it possible
>>> to somehow enable strand specific coverage plots e.g. by use of different colors or plotting
>>> antisense coverage a negative values? If not is there possibly an alternative way of achieving this?
>>> (splitting the bam file into two for each strand would be only my last resort).
>>>
>>> Cheers
>>> Michael
>>>
>>>
>>>
>>> [lsalsamHBR10:database]
>>> db_adaptor     = Bio::DB::Sam
>>> db_args        = -bam  /export/storage/licebase/bam/10.fastq.gz.clipped.bam
>>> search options = default
>>>
>>> [CoverageXyplotHBR10]
>>> feature        = coverage
>>> glyph          = wiggle_xyplot
>>> database       = lsalsamHBR10
>>> height         = 50
>>> fgcolor        = black
>>> bicolor_pivot  = 0
>>> pos_color      = blue
>>> neg_color      = red
>>> key            = Intestine (HBR-10)
>>> category       = Reads
>>> label          = 0 # Labels on wiggle tracks are redundant.
>>> citation     =  non-normalized RNA-seq library,  stranded protocol, Illumina HiSeq, adapter clipped reads (TRIMMOMATIC), aligned using STAR
>>>
>>>
>>>
>>>
>>>
>>>
>>> Michael Dondrup
>>> Postdoctoral fellow
>>> Sea Lice Research Centre/Department of Informatics
>>> University of Bergen
>>> Thormøhlensgate 55, N-5008 Bergen,
>>> Norway
>>>
>>> ------------------------------------------------------------------------------
>>> Subversion Kills Productivity. Get off Subversion & Make the Move to Perforce.
>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>> Faster operations. Version large binaries.  Built-in WAN optimization and the
>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk_______________________________________________
>>> Gmod-gbrowse mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>
>>
>> ------------------------------------------------------------------------------
>> Learn Graph Databases - Download FREE O'Reilly Book
>> "Graph Databases" is the definitive new guide to graph databases and their
>> applications. Written by three acclaimed leaders in the field,
>> this first edition is now available. Download your free book today!
>> http://p.sf.net/sfu/13534_NeoTech
>> _______________________________________________
>> Gmod-gbrowse mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/13534_NeoTech
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse