An alternate high-throughput sequencing Chado representation
Here at Babase we want to store high-throughput
sequencing SNV analysis in Chado. We were
not happy with the existing approaches and
came up with our own. We welcome any feedback.
This is a work in progress, but the broad outlines
of our storage model seem clear at this point.
We re-analyze the
same site in the same or different individuals
over time. The central idea behind our approach
is to store each site ever analyzed as a single
feature, per individual analyzed. The
information on the overall analysis is a row
in the ANALYSIS table. The results relating
to the analysis of each site on each individual
are stored in ANALYSISFEATURE rows, and related
Among other advantages, this lets us lift
old results onto a new reference genome
by adding relatively few FEATURELOC rows,
and lets us re-analyze the same sites and
relate these analyses per individual
as well as per site.
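
As a rough illustration of the idea (not our actual schema), the
following Python sketch uses SQLite with heavily simplified stand-ins
for Chado's FEATURE, FEATURELOC, ANALYSIS and ANALYSISFEATURE tables.
The function names build_demo and results_on_reference, the feature
names, the coordinates and the scores are all made up for the example,
and real Chado tables carry more columns and constraints than shown.

```python
import sqlite3

# Simplified stand-ins for four Chado tables. Real Chado has more columns
# and constraints (type_id, organism_id, rank, locgroup, ...); this is
# only an assumption-laden sketch of the storage idea.
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL
);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id),
    feature_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    rawscore    REAL
);
"""


def build_demo():
    """Build an in-memory database holding one site analyzed twice."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)

    # Two reference assemblies, plus one feature for one site in one
    # individual (hypothetical names).
    db.executemany("INSERT INTO feature VALUES (?, ?)", [
        (1, "ref_v1_chr1"),
        (2, "ref_v2_chr1"),
        (3, "site_A_individual_X"),
    ])

    # The site is first located on the old reference.
    db.execute(
        "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax) "
        "VALUES (3, 1, 1000, 1001)")

    # Each analysis run is one ANALYSIS row; each per-site, per-individual
    # result is one ANALYSISFEATURE row pointing at the same feature.
    db.executemany("INSERT INTO analysis VALUES (?, ?)",
                   [(1, "run_2012"), (2, "run_2013")])
    db.executemany(
        "INSERT INTO analysisfeature (analysis_id, feature_id, rawscore) "
        "VALUES (?, ?, ?)",
        [(1, 3, 30.0), (2, 3, 45.0)])

    # Lifting onto a new reference adds one FEATURELOC row per site
    # feature; the existing ANALYSISFEATURE rows are left untouched.
    db.execute(
        "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax) "
        "VALUES (3, 2, 1200, 1201)")
    db.commit()
    return db


def results_on_reference(db, ref_name):
    """Return (analysis name, fmin, rawscore) rows located on ref_name."""
    return db.execute(
        """
        SELECT a.name, fl.fmin, af.rawscore
        FROM analysisfeature af
        JOIN analysis a    ON a.analysis_id = af.analysis_id
        JOIN featureloc fl ON fl.feature_id = af.feature_id
        JOIN feature src   ON src.feature_id = fl.srcfeature_id
        WHERE src.uniquename = ?
        ORDER BY a.analysis_id
        """,
        (ref_name,)).fetchall()
```

Calling results_on_reference(build_demo(), "ref_v2_chr1") returns both
the old and the new result at the lifted coordinate, even though only
a single FEATURELOC row was added for the new reference.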
Note that we have made a number of extensions
to Chado's Analysis module. It would be nice if
these made it back into Chado.
I'll digress here and talk about the extensions.
Most are entirely uncontroversial. There is
a new column: