An alternate high-throughput sequencing chado representation


Karl O. Pinc
Hi,

Here at Babase we want to store high-throughput
sequencing SNV analysis in Chado.  We were
not happy with the existing approaches and
came up with our own.  We welcome any feedback.

This is a work in progress, but the broad outlines
of our storage model seem clear at this point.

We re-analyze the
same site in the same or different individuals
over time.  The central idea behind our approach
is to store each site ever analyzed as a single
feature, per individual analyzed.  The
information on the overall analysis is a row
in the ANALYSIS table.  The results relating
to the analysis of each site on each individual
are stored in ANALYSISFEATURE rows, and related
ANALYSISFEATUREPROP rows.
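
As a rough illustration of the layout described above, here is a
minimal sketch in Python using an in-memory SQLite database.  The
tables are heavily simplified stand-ins for the Chado ones (real Chado
runs on PostgreSQL and has many more columns).  The "individual" and
"site" columns, the text "type" column, and all the sample values are
assumptions made only for this example; they are not taken from the
Babase schema.

```python
import sqlite3

# Simplified stand-ins for Chado tables. Columns marked "hypothetical"
# are assumptions for this sketch, not part of Chado or Babase.
SCHEMA = """
CREATE TABLE analysis (
    analysis_id    INTEGER PRIMARY KEY,
    name           TEXT,
    program        TEXT NOT NULL,
    programversion TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    individual TEXT NOT NULL,  -- hypothetical
    site       TEXT NOT NULL   -- hypothetical
);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id  INTEGER NOT NULL REFERENCES feature (feature_id),
    analysis_id INTEGER NOT NULL REFERENCES analysis (analysis_id),
    UNIQUE (feature_id, analysis_id)
);
CREATE TABLE analysisfeatureprop (
    analysisfeatureprop_id INTEGER PRIMARY KEY,
    analysisfeature_id INTEGER NOT NULL
        REFERENCES analysisfeature (analysisfeature_id),
    type  TEXT NOT NULL,  -- hypothetical; Chado uses type_id -> cvterm
    value TEXT,
    rank  INTEGER NOT NULL DEFAULT 0
);
"""


def site_feature(conn, individual, site):
    """Return the feature_id for (individual, site), creating it once."""
    uniquename = f"{individual}:{site}"
    conn.execute(
        "INSERT OR IGNORE INTO feature (uniquename, individual, site) "
        "VALUES (?, ?, ?)",
        (uniquename, individual, site),
    )
    row = conn.execute(
        "SELECT feature_id FROM feature WHERE uniquename = ?", (uniquename,)
    ).fetchone()
    return row[0]


def record_result(conn, analysis_id, individual, site, props):
    """Store one analysis result for one site on one individual."""
    feature_id = site_feature(conn, individual, site)
    af_id = conn.execute(
        "INSERT INTO analysisfeature (feature_id, analysis_id) VALUES (?, ?)",
        (feature_id, analysis_id),
    ).lastrowid
    conn.executemany(
        "INSERT INTO analysisfeatureprop (analysisfeature_id, type, value) "
        "VALUES (?, ?, ?)",
        [(af_id, key, value) for key, value in props.items()],
    )
    return af_id


def new_analysis(conn, name):
    """Add one analysis row (program name and version are made up)."""
    return conn.execute(
        "INSERT INTO analysis (name, program, programversion) "
        "VALUES (?, ?, ?)",
        (name, "example_caller", "1.0"),
    ).lastrowid


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    run1 = new_analysis(conn, "first pass")
    run2 = new_analysis(conn, "re-analysis")
    # The same site on ind_A is analyzed twice but stored as one feature.
    record_result(conn, run1, "ind_A", "site_1", {"genotype": "0/1"})
    record_result(conn, run2, "ind_A", "site_1", {"genotype": "1/1"})
    record_result(conn, run2, "ind_B", "site_1", {"genotype": "0/0"})
    return conn
```

In this sketch the two analyses of site_1 on ind_A share a single
feature row, so relating results per individual or per site is a join
on feature_id rather than a comparison of coordinates.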

Among other advantages this lets us lift
old results onto a new reference genome
by adding relatively few FEATURELOC rows
and lets us re-analyze the same sites and
relate these analyses per-individual
as well as per-site.
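
To make the lift-over point concrete, here is a small sketch, again in
Python with SQLite and simplified stand-in tables.  It assumes that
each placement of a site feature on a reference is one FEATURELOC row
and that placements are told apart by locgroup.  That convention, the
feature names, and the coordinates are assumptions for illustration;
the message does not say how Babase distinguishes the rows.

```python
import sqlite3

# Simplified stand-ins for Chado's feature and featureloc tables.
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature (feature_id),
    srcfeature_id INTEGER REFERENCES feature (feature_id),
    fmin          INTEGER,
    fmax          INTEGER,
    locgroup      INTEGER NOT NULL DEFAULT 0,
    rank          INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, locgroup, rank)
);
"""


def add_feature(conn, uniquename):
    """Insert a feature row and return its id."""
    return conn.execute(
        "INSERT INTO feature (uniquename) VALUES (?)", (uniquename,)
    ).lastrowid


def place(conn, site_id, src_id, fmin, fmax):
    """Add one featureloc row locating a site feature on a reference.

    Uses the next unused locgroup for that feature (an assumed
    convention) and returns the locgroup chosen.
    """
    locgroup = conn.execute(
        "SELECT COALESCE(MAX(locgroup), -1) + 1 FROM featureloc "
        "WHERE feature_id = ?",
        (site_id,),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO featureloc "
        "(feature_id, srcfeature_id, fmin, fmax, locgroup) "
        "VALUES (?, ?, ?, ?, ?)",
        (site_id, src_id, fmin, fmax, locgroup),
    )
    return locgroup


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    old_chr = add_feature(conn, "oldref:chr1")  # made-up names
    new_chr = add_feature(conn, "newref:chr1")
    site = add_feature(conn, "ind_A:site_1")
    place(conn, site, old_chr, 999, 1000)    # original placement
    place(conn, site, new_chr, 1049, 1050)   # lifted placement
    return conn, site
```

Because results hang off the feature rather than off its coordinates,
lifting in this sketch touches only featureloc; no analysis result
rows would need to be rewritten.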

Note that we have made a number of extensions
to Chado's Analysis module.  It would be nice if
these made it back into Chado.

I'll digress here and talk about the extensions.
Most are entirely uncontroversial.  There is
a new column:

  ANALYSIS.CVTerm_Id

And there are the following new tables:

  ANALYSIS_DBXREF (Analysis to External Database Object
                   Cross-References)
  ANALYSIS_CVTERM (Analysis Typing/Tagging)
  ANALYSIS_RELATIONSHIP (Analysis Inter-Relationships)

There is also an

  ANALYSIS_ND_EXPERIMENT (Analysis to Natural Diversity
                          Experiment Result Set
                          Relationships)

table, which allows for relationships between
analyses and ND experiments.  This new table
lets us record the material that was fed
to the high-throughput sequencer and its preparation.


You can find a broad outline here:

http://papio.biology.duke.edu/babase_chado_html/

or here:

http://papio.biology.duke.edu/babase_chado.pdf


Some specifics of the storage model are here, particularly the part
before the option detail:

http://papio.biology.duke.edu/babase_chado_html/chado_load_vcf-ref.html

The ER diagrams should be helpful:

http://papio.biology.duke.edu/babase_chado_html/Entity-Relationship-Diagrams.html

For details on the extensions made to Chado see:

http://papio.biology.duke.edu/babase_chado_html/Babase-Chado-Extensions.html

Thanks for any feedback.

Regards,

Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema