Fwd: GFF3/Bio::DB::SeqFeature::Store headache....

Fwd: GFF3/Bio::DB::SeqFeature::Store headache....

girlwithglasses


---------- Forwarded message ----------
From: James Abbott <[hidden email]>
Date: Tue, Dec 11, 2012 at 3:00 AM
Subject: GFF3/Bio::DB::SeqFeature::Store headache....
To: [hidden email]


Hello,

I have what is probably an 'inability to see the wood for the trees' problem creating a Bio::DB::SeqFeature::Store-driven GBrowse instance from a GFF3 file. This is something I've done plenty of times before, but for some reason I just get 'landmark not found' errors when trying to access any contigs via GBrowse (1.70), and I can't for the life of me see what's wrong. I'm aware this normally results from a GFF format problem, so I've validated the GFF (which was fine).

The top of my gff looks like:

##gff-version 3
contig_000001   BluGen  contig  1       1312    .       .       . ID=contig_000001;Name=contig_000001
contig_000002   BluGen  contig  1       1067    .       .       . ID=contig_000002;Name=contig_000002
contig_000003   BluGen  contig  1       2044    .       .       . ID=contig_000003;Name=contig_000003
contig_000004   BluGen  contig  1       15746   .       .       . ID=contig_000004;Name=contig_000004

So ID and Name match for contigs. I've read that case sensitivity is an issue, so I've tried both 'name' and 'Name', with no difference.

The genes/mRNA/CDS features in the gff look like:

contig_000586   BluGen  gene    12441   13894   .       -       . ID=bgh00001;Name="Aquaporin 1";Ontology_term="GO:0006810","GO:0016020","GO:0005215"
contig_000586   BluGen  mRNA    12441   12791   .       -       . ID=bgh00001m1;Parent=bgh00001;Name=bgh00001m1
contig_000586   BluGen  mRNA    12845   13063   .       -       . ID=bgh00001m2;Parent=bgh00001;Name=bgh00001m2
contig_000586   BluGen  mRNA    13113   13413   .       -       . ID=bgh00001m3;Parent=bgh00001;Name=bgh00001m3
contig_000586   BluGen  mRNA    13460   13894   .       -       . ID=bgh00001m4;Parent=bgh00001;Name=bgh00001m4
contig_000586   BluGen  CDS     12745   12791   .       -       0 Parent=bgh00001m1
contig_000586   BluGen  CDS     12845   13063   .       -       0 Parent=bgh00001m2
contig_000586   BluGen  CDS     13113   13413   .       -       0 Parent=bgh00001m3
contig_000586   BluGen  CDS     13460   13768   .       -       0 Parent=bgh00001m4

Once the data is loaded, the following test script:

>     # $db is an already-open Bio::DB::SeqFeature::Store handle
>     my @ids = $db->seq_ids();
>     foreach my $id (@ids) {
>         print $id, "\n";
>     }
>     my @types = $db->types();
>     print "\nTypes: \n";
>     foreach my $type (@types) {
>         print $type, "\n";
>     }
>
>     # fetch the first 1000 bases of the first contig
>     my $contig = $db->fetch_sequence('contig_000001', 1, 1000);
>     print "\ncontig = $contig\n";

Produces the following:
==================================================================
<snip>
contig_015108
contig_015109
contig_015110
contig_015111

Types:
CDS:BluGen
contig:BluGen
gene:BluGen
mRNA:BluGen
tRNA:BluGen

contig = TTGTATCAAGCAACTAAGTTTCACTTGGCCACATTACATGGGAGCTAGGAAGGAATGTGAGACGGCGAAGTAGAATTGCTAAGTGAGAGAGTCAGCTAGATGGCAAAAATGACGACTGGCAGTGGCGGAGCAATATGTCATTCTCACCAACAGAGTACGTACTGGATGAGCTAGGAGGATGTACAAATAGTCATTACCCGTAGTTGTGGTACTTCTCTCTTCATATAGTTTAATCTTTCTAAAAGTACACTACAACCAGCTTGGTTTTGTCACAGAATGAAACAGCGCTTATCAACAGCTTTCCACCCACAAACAGACGGTGCCACCGAAAGAATGAATGAGGAAATCTTAGCGTACCTACGAGCATTTATTACTTATACACAGTTTGACTGGAAAGATTTGCTTCTGTGCGCAATGCTGGATTTAAATAATAGAACATCAGCAGCGTTAGGAATGAGCCCATTTTTTGCTGAACATGGTTATCATGCAGAGCCAATTCAACAAATTGAATATAGCAGCACCCCATTAAGGCCAGAGAAGAATGCGCAAAAGTTTGTTGAAAGACTAAGAGAAGCAGAAGTGTTACGACGTGCGCTAGGGGTACTGGAGATCGCCGTCTGCCGAAGGTAATGTATTAATAACTGTTCAGATCATAGTTGCTAACGAAGGGTACTCAGATATAATCCAACTGGCGAGGTGCCAGGTGGTCAAAGGTCGGAGAACTACAGGATAGTCGAGACACGAAGTCAAGAAGTCGAGTTGCCGAGCCAGCGATTAAGGCTGATAGATCAAGGTAAACGCAAGTACAAATAATGTAAATACTAAATGAAAGACTGAGGATATTGGACTGTGAGGATCTAAAATTATGATATTATATAAGTGCTAGATTTCTGTCACAGGCTCATACCTGGTTATCATTAGGCTGCTGCCTAACAAATAAGGCTTGAACTCTTCCAACCATGTTTAACACCATACATGTCATGAACCCCTCCACAACTC
A
==================================================================

So the contig IDs are known, we have the expected range of types, and sequences can be retrieved from the database to match the contig IDs.

Reference class is set to 'contig' in the conf file, while 'automatic classes' is set to 'contig gene'. However, searching for 'contig_000001' produces the 'landmark not found' error (as does searching for contig:contig_000001 or contig:BluGen:contig_000001). Likewise, searching for genes by their ID/Name fails.

I've tried the obvious things, like dropping the database to make sure GBrowse is pointing at the right thing, but nothing jumps out at me.

Is there any indication there of what I might be doing wrong?

Many thanks,
James
--
Dr. James Abbott
Lead Bioinformatician
Bioinformatics Support Service
Imperial College, London



--
Amelia Ireland
GMOD Community Support || http://gmod.org

------------------------------------------------------------------------------
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
_______________________________________________
Gmod-gbrowse mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: Fwd: GFF3/Bio::DB::SeqFeature::Store headache....

Fields, Christopher J
Do you need sequence-region directives? For all ref sequences in column 1:

##sequence-region contig_000001 1 1312
##sequence-region contig_000002 1 1067


These seem to be missing.  From the GFF3 specification:

"Lines beginning with ## are pragmas that provide meta-information
about the document.  Blank lines and lines beginning with a single #
are ignored.

Line 0 gives the GFF version using the ##gff-version pragma. Line 1
indicates the boundaries of the region being annotated (a 1,497,228 bp
region named "ctg123") using the ##sequence-region pragma."
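
If typing one directive per contig by hand is impractical, something along these lines could generate them from the existing contig features. This is only a minimal sketch in core Perl: the sub name sequence_region_lines is made up for illustration, it assumes the file is tab-delimited as GFF3 requires, and it takes the start and end of each 'contig' feature as the region boundaries.

```perl
#!/usr/bin/env perl
# Sketch: derive ##sequence-region pragmas from the reference features
# (here type 'contig') of a GFF3 file. Hypothetical helper, not part of BioPerl.
use strict;
use warnings;

# Return one "##sequence-region <seqid> <start> <end>" line for each
# reference sequence that has a feature of type $ref_type (column 3).
sub sequence_region_lines {
    my ($ref_type, @gff_lines) = @_;
    my (@pragmas, %seen);
    for my $line (@gff_lines) {
        chomp $line;
        next if $line =~ /^\s*#/ || $line !~ /\S/;   # skip pragmas, comments, blanks
        my ($seqid, $source, $type, $start, $end) = split /\t/, $line;
        next unless defined $end && $type eq $ref_type;
        next if $seen{$seqid}++;                     # one pragma per reference sequence
        push @pragmas, "##sequence-region $seqid $start $end";
    }
    return @pragmas;
}

# Demo on the first two contig lines quoted in the original message.
my @example = (
    "##gff-version 3",
    "contig_000001\tBluGen\tcontig\t1\t1312\t.\t.\t.\tID=contig_000001;Name=contig_000001",
    "contig_000002\tBluGen\tcontig\t1\t1067\t.\t.\t.\tID=contig_000002;Name=contig_000002",
);
print "$_\n" for sequence_region_lines('contig', @example);
```

The generated lines would go directly after the ##gff-version line before reloading the database. Whether that makes the 'landmark not found' error go away is not something this sketch can confirm.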

chris

On Dec 11, 2012, at 10:03 AM, Amelia Ireland <[hidden email]> wrote:

> [forwarded message from James Abbott quoted in full; snipped]

