[Gmod-ajax] genes at the end of circular genomes

[Gmod-ajax] genes at the end of circular genomes

David Breimann
Hello,

I have experienced some unexpected results with circular genomes. For
example, here is how you can reproduce the problem:
Take the NC_005707 GenBank file
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.gbk)
and FASTA file (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Bacillus_cereus_ATCC_10987/NC_005707.fna).
Make sure you have pulled the latest bioperl-live, then run
bp_genbank2gff3.pl to get a GFF3 file from the GenBank file. Run
prepare-refseqs.pl with the FASTA file, then add a "gene" track using
the GFF3 file.

* BCE_A0242 actually spans join(207497..208369,1..687) (see genbank)
* bp_genbank2gff3.pl converts this to two lines:
  NC_005707 GenBank gene 207497 208369 . + 1 ID=BCE_A0242;Dbxref=GeneID:2753198;Name=BCE_A0242;old_locus_tag=BCEA0242
  NC_005707 GenBank gene 1 687 . + 1 ID=BCE_A0242;Dbxref=GeneID:2753198;Name=BCE_A0242;old_locus_tag=BCEA0242
* JBrowse displays a single BCE_A0242 gene that spans the entire chromosome (!):
 trackData.json: 0,208369,1,"BCE_A0242","BCE_A0242","gene"

As mentioned here
(http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033915.html),
the way bp_genbank2gff3.pl converts this gene does not conform to the
GFF3 specification for circular genomes. I hope this will be fixed
soon by the great bioperl developers. Is this the reason for the
erroneous display by JBrowse?

Also, just out of curiosity, I noticed the following fact about the
trackData.json file:
... "featureNCList":[[0,208369,1,"BCE_A0242","BCE_A0242","gene",[[749,2804,1,"BCE_A0001","BCE_A0001","gene"],[2796,3177,1,"BCE_A0002","BCE_A0002","gene"],[3345,3444,-1,"BCE_A0003","BCE_A0003","gene"],[3505,4213,1,"BCE_A0004","BCE_A0004","gene"]
 ...

I don't understand the bracket "rules". The first feature (BCE_A0242)
has no closing bracket, the second feature (BCE_A0001) has two opening
brackets, and the remaining features look normal: a single opening `[`
and a single closing `]`. What's the rule here? (Again, just out of
curiosity... this is the first time I have looked at this file.)

Thank you
Dave

------------------------------------------------------------------------------
This SF.net email is sponsored by

Make an app they can't live without
Enter the BlackBerry Developer Challenge
http://p.sf.net/sfu/RIM-dev2dev 
_______________________________________________
Gmod-ajax mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-ajax

Re: genes at the end of circular genomes

Mitch Skinner
  On 08/18/2010 12:13 AM, David Breimann wrote:
> As mentioned here
> (http://lists.open-bio.org/pipermail/bioperl-l/2010-August/033915.html),
> the way bp_genbank2gff3.pl converts this gene does not conform to the
> GFF3 specification for circular genomes. I hope this will be fixed
> soon by the great bioperl developers. Is this the reason for the
> erroneous display by JBrowse?

When multiple GFF lines have the same ID, bioperl treats that as one
feature with multiple locations.  This can be used, for example, to
model coding-sequence features in organisms with mRNA splicing.  In my
experience, the more common approach is to model each CDS segment as a
separate subfeature of the transcript feature.  This latter approach,
using subfeatures, is the one that JBrowse currently supports.  At some
point we should probably support the multiple-location thing; I've
created ticket #62 for this:

http://jbrowse.lighthouseapp.com/projects/23792-jbrowse/tickets/62-support-the-bioperl-multiple-location-mechanism

JBrowse currently just uses the "start" and "end" properties of the
bioperl object.  In this case, the "start" of the multiple-location
feature is the earliest start position of any of the GFF lines, and the
"end" is the latest ending position of any of the GFF lines.  And that's
why you see something that covers the entire chromosome.
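
To make the arithmetic concrete, here is a minimal Python sketch of that collapse. It is an illustration only, not bioperl or JBrowse code: the function name `collapsed_span` is hypothetical, the attribute column is shortened, and the zero-based start is an assumption inferred from the trackData.json excerpt (GFF3 start 1 appears there as 0).

```python
# Hypothetical illustration; this is not bioperl or JBrowse code.
# The two GFF3 lines from the report, with the ninth column shortened.
gff_lines = [
    "NC_005707\tGenBank\tgene\t207497\t208369\t.\t+\t1\tID=BCE_A0242",
    "NC_005707\tGenBank\tgene\t1\t687\t.\t+\t1\tID=BCE_A0242",
]


def collapsed_span(lines):
    """Return the (start, end) pair that covers every location of one feature."""
    starts, ends = [], []
    for line in lines:
        cols = line.split("\t")
        starts.append(int(cols[3]))  # column 4: 1-based start
        ends.append(int(cols[4]))    # column 5: inclusive end
    # Assumption: the start is shifted to zero-based, as in the
    # trackData.json excerpt where GFF3 start 1 shows up as 0.
    return min(starts) - 1, max(ends)


print(collapsed_span(gff_lines))  # (0, 208369)
```

The earliest start comes from the second line and the latest end from the first, so the single merged feature runs from 0 to 208369.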

Once JBrowse handles the multiple-location thing better, then you should
see something closer to what you wanted to see.  However, there's a
complication: in the CDS use case, the feature is usually displayed as
having something (introns) in between the different locations.  But I'm
sure that's not what you want in this case.

There's also a ticket for circular genome support:

http://jbrowse.lighthouseapp.com/projects/23792/tickets/51-circular-genome-support

Ideally, it would be nice to be able to keep scrolling once you get to
the end of a circular chromosome.  But supporting that may be a little
way in the future.

As a temporary hack, you could get something closer to what you're
looking for, visually, if you give the two GFF lines different values
for the "ID=" bit in the ninth column.
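
One way to apply that workaround without editing the file by hand is sketched below in Python. The function name `uniquify_ids` and the `_1`, `_2` suffix scheme are assumptions made for illustration. The sketch only rewrites the `ID=` attribute, so any `Parent=` attributes that point at a renamed ID would need separate handling.

```python
import re
from collections import Counter

ID_PATTERN = re.compile(r"(^|;)ID=([^;]+)")


def feature_id(line):
    """Return the ID attribute of a GFF3 feature line, or None."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    match = ID_PATTERN.search(cols[8])
    return match.group(2) if match else None


def uniquify_ids(lines):
    """Append _1, _2, ... to every ID that occurs on more than one line."""
    counts = Counter(fid for fid in map(feature_id, lines) if fid is not None)
    seen = Counter()
    out = []
    for line in lines:
        fid = feature_id(line)
        if fid is not None and counts[fid] > 1:
            seen[fid] += 1
            newline = "\n" if line.endswith("\n") else ""
            cols = line.rstrip("\n").split("\t")
            new_id = f"{fid}_{seen[fid]}"
            # Rewrite only the ID attribute; Name and the rest stay as they are.
            cols[8] = ID_PATTERN.sub(
                lambda m: f"{m.group(1)}ID={new_id}", cols[8], count=1
            )
            line = "\t".join(cols) + newline
        out.append(line)
    return out


sample = [
    "##gff-version 3\n",
    "NC_005707\tGenBank\tgene\t207497\t208369\t.\t+\t1\tID=BCE_A0242;Name=BCE_A0242\n",
    "NC_005707\tGenBank\tgene\t1\t687\t.\t+\t1\tID=BCE_A0242;Name=BCE_A0242\n",
    "NC_005707\tGenBank\tgene\t750\t2804\t.\t+\t1\tID=BCE_A0001;Name=BCE_A0001\n",
]
fixed = uniquify_ids(sample)
```

With the sample above, the two BCE_A0242 lines become `ID=BCE_A0242_1` and `ID=BCE_A0242_2`, while the `Name=` attribute and the BCE_A0001 line are unchanged.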

> Also, just out of curiosity, I noticed the following fact about the
> trackData.json file:
> ... "featureNCList":[[0,208369,1,"BCE_A0242","BCE_A0242","gene",[[749,2804,1,"BCE_A0001","BCE_A0001","gene"],[2796,3177,1,"BCE_A0002","BCE_A0002","gene"],[3345,3444,-1,"BCE_A0003","BCE_A0003","gene"],[3505,4213,1,"BCE_A0004","BCE_A0004","gene"]
>   ...
>
> I don't understand the bracket "rules". The first feature (BCE_A0242)
> has no closing bracket, the second feature (BCE_A0001) has two opening
> brackets, and the remaining features look normal: a single opening `[`
> and a single closing `]`. What's the rule here? (Again, just out of
> curiosity... this is the first time I have looked at this file.)

The features are in a nested array structure called a nested containment
list:

http://bioinformatics.oxfordjournals.org/cgi/content/short/23/11/1386

You don't see a closing bracket where you were expecting to see one
because the first feature in the list above contains a "sublist" with
other features in it.  That sublist is the last element of the feature's
array, so the feature's own closing bracket only appears after the
sublist ends.

The nested containment list structure allows JBrowse to do fast interval
overlap queries.  Having that query mechanism allows JBrowse to draw
only the features you're looking at, and not have to process the entire
chromosome's worth each time it puts some features on screen.
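
The idea can be sketched in a few lines of Python. This is a simplified illustration, not JBrowse's implementation: the names `build_nclist` and `query` are hypothetical, each node is laid out as `[start, end, name, sublist]`, coordinates are assumed to be zero-based half-open, and each level is scanned linearly where a real nested containment list would use a binary search.

```python
def build_nclist(intervals):
    """Nest (start, end, name) tuples so contained features sit in sublists."""
    # Sort by start; on ties put the longer interval first so it becomes the container.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top = []
    stack = []  # chain of currently open containers, outermost first
    for start, end, name in ordered:
        node = [start, end, name, []]
        # Close containers that do not fully contain this interval.
        while stack and not (stack[-1][0] <= start and end <= stack[-1][1]):
            stack.pop()
        (stack[-1][3] if stack else top).append(node)
        stack.append(node)
    return top


def query(nclist, qstart, qend):
    """Return names of features overlapping the half-open range [qstart, qend)."""
    hits = []
    for start, end, name, sublist in nclist:
        if start >= qend:
            break  # each level is sorted by start, so nothing later can overlap
        if end > qstart:
            hits.append(name)
            # Children lie inside their parent, so descend only on a hit.
            hits.extend(query(sublist, qstart, qend))
    return hits


genes = [
    (0, 208369, "BCE_A0242"),
    (749, 2804, "BCE_A0001"),
    (2796, 3177, "BCE_A0002"),
    (3345, 3444, "BCE_A0003"),
    (3505, 4213, "BCE_A0004"),
]
nclist = build_nclist(genes)
print(query(nclist, 3000, 3400))  # ['BCE_A0242', 'BCE_A0002', 'BCE_A0003']
```

Because the merged BCE_A0242 feature spans the whole chromosome, every other gene ends up in its sublist, which is the same shape as the trackData.json excerpt quoted above.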

Mitch
