Re: Displaying problem of tophat alignment in Gbrowse

Scott Cain
Hi Jiarui and Jack,

It's best to ask questions like this on the mailing list, since lots of people there work with this every day and can usually answer faster than I can. I've cc'ed the list here.

My initial guess is a configuration problem: the missing bases, for example, suggest that GBrowse is not looking in the right place for the FASTA file. Can you send the section of your configuration file that sets up this BAM file as a database, along with the track stanza? While you're at it, please verify that the path given in the fasta section points to the correct FASTA file for your reference sequence.
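One way to run the kind of check suggested here is to compare the reference names used in the alignments with the sequence names in the FASTA file. The following is only a minimal Python sketch: the function names and the sample records are made up for illustration and are not part of GBrowse or samtools.

```python
def fasta_names(lines):
    """Return the sequence IDs (first word after '>') found in FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def sam_reference_names(lines):
    """Return the RNAME values (column 3) used by SAM alignment records."""
    names = set()
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names


def missing_references(fasta_lines, sam_lines):
    """Return reference names used in the SAM data but absent from the FASTA."""
    return sorted(sam_reference_names(sam_lines) - fasta_names(fasta_lines))


# Hypothetical sample data; real use would pass open file handles instead.
example_fasta = [">BxCHNm3_4 example description", "GACATCTCTAATTTACTT"]
example_sam = [
    "@HD\tVN:1.0",
    "read1\t0\tBxCHNm3_4\t101\t50\t30M40N70M\t*\t0\t0\t*\t*",
    "read2\t0\tchrX\t5\t50\t10M\t*\t0\t0\t*\t*",
]
print(missing_references(example_fasta, example_sam))  # ['chrX']
```

An empty result only shows that the names agree; it does not prove that the configuration points at the same FASTA file.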

Scott



On Thu, Apr 3, 2014 at 6:23 PM, Jiarui Li <[hidden email]> wrote:
Hi Scott,

I am Jiarui Li, a postdoc in Dr. Jack Chen's lab. I have a problem displaying TopHat alignments in GBrowse.

I used TopHat to align RNA-seq reads to the reference and got a BAM file named "accepted_hits.bam", which contains reads that support introns. Those reads are split and aligned to the reference, so in the BAM file they have a CIGAR string like "30M40N70M" and an MD tag of "MD:Z:100", which should mean 30 bp of perfect matches, then a 40 bp skip in the reference sequence, and finally 70 bp of perfect matches. However, in GBrowse I see problems in the 70 bp part:

First, no bases are displayed;
Second, many bases within those 70 bp are marked in red, indicating mismatches;
Third, when I click on the read, the 40 bp skipped region is missing from the reference sequence.
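To make the expected layout concrete, here is a minimal Python sketch of how a CIGAR string such as "30M40N70M" maps onto reference coordinates. The start position 101 and the function name are hypothetical, chosen only for illustration; this is not how GBrowse or Bio::DB::Sam implement it.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks, split at N skips."""
    blocks = []
    ref = pos            # next reference position to be consumed
    block_start = None   # start of the block currently being built
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=XD":
            # These operations consume reference bases inside a block.
            if block_start is None:
                block_start = ref
            ref += n
        elif op == "N":
            # A skipped region (intron) closes the current block.
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += n
        # I, S, H and P do not consume reference bases.
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks


blocks = aligned_blocks(101, "30M40N70M")
print(blocks)  # [(101, 130), (171, 240)]
print(sum(end - start + 1 for start, end in blocks))  # 100
```

The 100 aligned bases agree with "MD:Z:100", since the MD tag describes only the aligned bases and not the 40 bp skipped by N. The read therefore spans 140 bp of reference while covering only 100 bp of it.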

Could you help me with that please?
Many thanks!

Cheers,
Jiarui



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

------------------------------------------------------------------------------

_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
Re: Displaying problem of tophat alignment in Gbrowse

Sheldon McKay
Also check that you loaded the reference sequence into your database, if you are using a database.

Sheldon



Re: Displaying problem of tophat alignment in Gbrowse

Jiarui Li
In reply to this post by Scott Cain
Hi Scott,

Thanks for your help. I have attached the relevant section of the config file and a test SAM file. To make my question clearer, I am also sending a slide with a screenshot of what I see in GBrowse.

I checked the reference FASTA file that I loaded into the MySQL database. Here is the beginning of it:

>BxCHNm3_4
GACATCTCTAATTTACTTTTTTCATCATTGAGTATAATGTGTGTAGCACTTATGCTACTCTTATCCTACTAATGTAAGAT
GGGTTTTTAGTATTCTTGAGACATGTCAAATAAAAAAGCATTCGTTATCTGCGAAGAATTATCGGCTCACTGTTTGTGGG
CATACTTGGGATATTCAAGGTCTTTGGTAGTTTCGTAAGTAAACATGTTTTATAACTTTGTGAACTTTGACAAACGGCAA
GGCAAACATACGACGCGCAACACTGTGAATAAATAGGAATTCTTTGACTGTGAAAGGTTGACATTATTACCATATACAAT
ACGACAAATTTTGAGAATACAATAATCAGAAAAAATGGAAAGAAAGTGTGGTAAATTCGGTACAAAGCATAATTTCCTGT
ATTTACAATGTAATGTATATCCAAATTAAGTGTAATTCTGGTAAGAGTATTTACAATTTTACTTTATAAACAAGTCAAAT
TAGGTTTGGTTAGTAATAACTGATCCGATGTGGTCCATTTGGAATGTTTTTCTTGTCGACAGAGTTCTTCAATGTGTTCG
CAAACACTTGAATTTTCCTTGAGAGATCGTTGTCTTCAAATTCGTTGAGGTTTAAAGCAAAATCGTTGTTAATCCACTGG
ATTTTCAGACCCGATTTGGAGTAAAGAATCTCGATTTGCTTGGAGTAAAGTCCGTCTTCGTGTTGAAATCGGAGGAAGTC
TTTCTTGGATTGGATGAGCCAAAATTCAACAGTAAAGTTGATATCGAATTTGGAGTTGAAAGTGAGTTCGGAGTAGTCCT
TGAATTTTCTTGCGTATTTTAAGAGTTGAGTGTAGTCGAAGTCACGGAGGTGGGATGGGGCCTAAAAAATATAGCTGAAG
...

I think the reference FASTA file is fine, because the GC content and the sequence display correctly in GBrowse.

Cheers,
Jiarui


Attachments:
Displaying problem of tophat bam file.pptx (383K)
test.config.txt (1K)
test.v1.sam (8K)
Re: Displaying problem of tophat alignment in Gbrowse

Scott Cain
Hi Jiarui,

I'd like to try to replicate this--could you send me the fasta file for the reference sequence as well?

Thanks,
Scott



Re: Displaying problem of tophat alignment in Gbrowse

Jiarui Li
Hi Scott,

This is the fasta file. Thank you very much!

Cheers,
Jiarui


Attachment: test.fa (10M)