"Got a sequence without letters" error

"Got a sequence without letters" error

albert500
Hi everyone,

I'm using GBrowse 2.54 with a PostgreSQL backend on a server running Ubuntu 14.04 LTS. I have imported GFF and FASTA data into a database using bp_seqfeature_load, and then connected it to GBrowse with the Bio::DB::SeqFeature::Store adaptor.

The features (mRNA, CDS, exons, etc.) are displayed correctly. However, when I try to show the DNA sequences, the track remains empty, and I find several messages in the Apache error log similar to:

MSG: Got a sequence without letters. Could not guess alphabet, referer: http://127.0.0.1/cgi-bin/gbrowse/gbrowse/testdb/?name=scaffold_1%3A2016111..2016123

I’m pretty sure the FASTA files have been loaded into the database (I can see the "sequence text" values in the “sequence” table and they are not empty). The FASTA files look like:

>hxAUG26up1s1g18t1 loc=scaffold_1:173236-177401:+;type=CDS.dpx26mx19;nx=10;len=3585
ATGGAAGAACCCAAGGAAAGTCCCGAGAGTGTAATTGCATCCGTTGTGAA
TGAAAATGAGACCCCGCGAGTCTTGCCCAACTTTCAAATCAATCGTGATA
...

and the GFF file includes entries like:

scaffold_1      dpx26mx19       mRNA    173130  190600  816,1521/1899,3.612,3502,14,3585,0      +       .       ID=hxAUG26up1s1g18t1;(and some more attributes here)
...

It seems to me that GBrowse should be able to link them together and show the sequences correctly. Could someone tell me where the problem is?

Many thanks!

Albert



------------------------------------------------------------------------------

_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: "Got a sequence without letters" error

Scott Cain
Hi Albert,

The problem is with your GFF: the ID attribute is not for assigning names (identifiers) for use outside of the GFF file--it is only used for identifying features inside the GFF file to show what features are related to what other features (like via the Parent attribute).  You need the Name attribute of the GFF feature to match the first string after the ">" in the fasta file.  I'm guessing then in your fasta file, you'd want to call it "scaffold_1" (though I don't know for sure, because I don't know what the Name attribute of your example GFF feature is).
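To make the two identifiers being compared concrete, here is a minimal Python sketch. It is only an illustration, not part of GBrowse or BioPerl, and the helper names `parse_gff3_attributes` and `fasta_id` are made up for the example. It splits the ninth GFF column into its tag=value pairs and takes the first whitespace-delimited token after ">" from a FASTA header, using the lines from the original post (with the elided attributes left out).

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue  # skip empty or malformed fields
        tag, value = field.split("=", 1)
        # GFF3 escapes reserved characters with URL-style percent-encoding
        attrs[tag.strip()] = unquote(value.strip())
    return attrs


def fasta_id(header_line):
    """Return the first whitespace-delimited token after '>'."""
    return header_line[1:].split()[0]


# Feature line from the original post; only the ID attribute is shown there.
gff_line = (
    "scaffold_1\tdpx26mx19\tmRNA\t173130\t190600\t"
    "816,1521/1899,3.612,3502,14,3585,0\t+\t.\tID=hxAUG26up1s1g18t1"
)
fasta_header = (
    ">hxAUG26up1s1g18t1 loc=scaffold_1:173236-177401:+;"
    "type=CDS.dpx26mx19;nx=10;len=3585"
)

columns = gff_line.split("\t")
attrs = parse_gff3_attributes(columns[8])

print("reference (column 1):", columns[0])
print("ID attribute:", attrs.get("ID"))
print("Name attribute:", attrs.get("Name"))  # None: no Name in the line shown
print("FASTA identifier:", fasta_id(fasta_header))
```

Printing the values side by side shows which string appears where: the FASTA identifier here equals the feature's ID attribute, while the first column holds `scaffold_1`.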

Scott


On Fri, Aug 21, 2015 at 9:30 AM, Zhou Albert <[hidden email]> wrote:




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: "Got a sequence without letters" error

albert500
Hi Scott,

Thanks for the advice. I have been working on it over the last few days, and managed to convert all the Name attributes to match the FASTA tags. For instance, now I have:

scaffold_1      dpx26mx19       mRNA    1359700 1364615 106,1149/2469,2.111,2395,0,4050,0       +       .       ID=hxNCBI_GNO_546014;JGI=JGI_V11_220021;GNO=NCBI_GNO_546014;Name=hxNCBI_GNO_546014

in my GFF file, and 

>hxNCBI_GNO_546014 loc=scaffold_1:1359700-1364615:+;type=CDS.dpx26mx19;pro=157/535,pediculus_PHUM370280-PA;nx=13;len=4050
ATGGCTTCAAAAGAAACCGATCAACTAATAGAAGATGAACTTCAGGCTTT
GCATCAATCTATTGAACAATTGAACTCAGGAAATTCAGAAGTAAGCTTTC

in the FASTA file. However the problem remains (exactly the same as the original one).

Any idea what the problem is?

Thanks!
Albert



On Aug 21, 2015, at 3:20 PM, Scott Cain <[hidden email]> wrote:


Re: "Got a sequence without letters" error

Scott Cain
Hi Albert,

You've exchanged one problem with your GFF for another: the name of the feature in the first column has to be the same as the name of the feature in the ninth column when it's a reference feature like a chromosome or scaffold. And now that I'm looking more closely at what you're trying to do, I see that it won't work: GBrowse only knows how to work with FASTA sequence that is for the reference sequence, and you are trying to provide a sequence that is only for part of the reference.  What you really need is the FASTA for scaffold_1.
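One way to catch this kind of mismatch before loading is to compare the reference names in the first GFF column with the identifiers in the FASTA file. The sketch below is an illustration only: plain Python, not a GBrowse or BioPerl tool, with hypothetical function names and in-memory sample data taken from the lines quoted in this thread. It reports every reference name that has no FASTA record of the same name.

```python
import io


def gff_seqids(handle):
    """Collect the distinct values of GFF column 1 (reference sequence names)."""
    seqids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        seqids.add(line.split("\t", 1)[0])
    return seqids


def fasta_ids(handle):
    """Collect the first token after '>' from every FASTA header line."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_references(gff_handle, fasta_handle):
    """Return reference names used in the GFF that have no FASTA record."""
    return sorted(gff_seqids(gff_handle) - fasta_ids(fasta_handle))


gff_text = (
    "scaffold_1\tdpx26mx19\tmRNA\t1359700\t1364615\t"
    "106,1149/2469,2.111,2395,0,4050,0\t+\t.\t"
    "ID=hxNCBI_GNO_546014;JGI=JGI_V11_220021;"
    "GNO=NCBI_GNO_546014;Name=hxNCBI_GNO_546014\n"
)
# Per-gene FASTA as posted: the record is named after the gene, not the scaffold.
per_gene_fasta = (
    ">hxNCBI_GNO_546014 loc=scaffold_1:1359700-1364615:+;type=CDS.dpx26mx19;"
    "pro=157/535,pediculus_PHUM370280-PA;nx=13;len=4050\n"
    "ATGGCTTCAAAAGAAACCGATCAACTAATAGAAGATGAACTTCAGGCTTT\n"
)

missing = missing_references(io.StringIO(gff_text), io.StringIO(per_gene_fasta))
print("references without sequence:", missing)
```

With the per-gene FASTA from the thread, `scaffold_1` is reported as missing; a FASTA file whose record is named `scaffold_1` would produce an empty list.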

Scott


On Wed, Aug 26, 2015 at 4:44 AM, Zhou Albert <[hidden email]> wrote:

Re: "Got a sequence without letters" error

albert500
Hi Scott,

I have modified the FASTA files, and everything works fine now.

Many thanks!
Albert


On Aug 26, 2015, at 4:32 PM, Scott Cain <[hidden email]> wrote:
