microarray chip analysis data - how to model in Chado

Brian Repko-2
Chado experts:
 
We are modeling Affymetrix (and other) CDFs, probesets, probes and alignments in Chado. We are doing this by making use of some of the mage module tables (ARRAYDESIGN, ELEMENT, ELEMENT_RELATIONSHIP) in addition to the main sequence module tables.
 
Our alignments are your basic match / match_part type features (we are doing 100% so we don't make them analysis features).
 
However, based on the alignments we do some calculations for probeset / gene attributes - specificity and sensitivity.
These are calculated (R scripts) based on the alignments of the probes and the probeset to probe mapping in the CDF.
 
So now I have these scores for gene / probeset tuples and I'm not sure how to store those.
 
One thought is to model them as analysis_features with feature-relationships to the gene and probeset features.
This would match how alignments are - analysis_features (match) with feature-locations to the features that are matched.
 
The other thought is to make a feature-relationship between the gene and probeset features, with one feature-relationship property for the sensitivity score and another for the specificity score. This is easy to do but only works for properties that are determined by 2 features. If I ever had an analysis that was determined by 3 variables, I couldn't do this.
 
Any thoughts as to the "Chado way" of modeling these types of analysis results?
 
Thanks for any input,
Brian

------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: microarray chip analysis data - how to model in Chado

Scott Cain
Hi Brian,

Let me start by saying that I don't really know what the best way to do this is; I don't understand the nature of the data well enough to say with any conviction.

It feels to me that the right way to do it is to use analysisfeature in the way you've described (so the probe would be related to the gene via feature_relationship, and the computed attributes of the probe would be in analysisfeature). You don't mention a downside to this approach as you did for your second suggestion. Are you concerned about it in some way?

Scott


On Tue, Feb 19, 2013 at 11:01 AM, Brian Repko <[hidden email]> wrote:
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

Re: microarray chip analysis data - how to model in Chado

Scott Cain
Hi Brian,

Please keep this on the schema mailing list--since I'm not used to working with data in this way, we definitely want to keep as many eyes on it as possible.  I'll try to respond in more detail shortly.

Scott


On Wed, Feb 20, 2013 at 4:22 PM, Brian Repko <[hidden email]> wrote:
Scott,
 
Let me try to describe the data another way and then address the advantages / disadvantages of the two approaches.
 
For alignments, in Chado, one typically has
 
* Feature (ID = 1, type=match)
* FeatureLocation (feature = match, sourceFeature = chromosome, transcript, etc.)
* FeatureLocation (feature = match, sourceFeature = cDNA, EST, etc.)
* Analysis (ID = 1, the alignment algorithm used)
* AnalysisFeature (analysis ID = 1, feature ID = 1, alignment scores)
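A minimal sketch of the alignment pattern listed above, using an in-memory SQLite database. The table definitions are heavily simplified stand-ins for the Chado tables of the same names (a plain text `type` column replaces the usual `type_id` reference to a controlled vocabulary term), and all feature names, coordinates, and the score are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER REFERENCES feature(feature_id),
    srcfeature_id INTEGER REFERENCES feature(feature_id),
    fmin INTEGER, fmax INTEGER, rank INTEGER
);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    analysis_id INTEGER REFERENCES analysis(analysis_id),
    feature_id  INTEGER REFERENCES feature(feature_id),
    rawscore REAL
);
""")

# The match feature plus the two sequences it aligns (hypothetical names).
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type) VALUES (?, ?, ?)",
    [(1, "match-0001", "match"), (10, "chr1", "chromosome"), (11, "est-42", "EST")],
)
# One featureloc per aligned sequence; rank tells the two locations apart.
conn.executemany(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, rank) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 1000, 1025, 0), (1, 11, 0, 25, 1)],
)
conn.execute("INSERT INTO analysis (analysis_id, name) VALUES (1, 'alignment')")
conn.execute(
    "INSERT INTO analysisfeature (analysis_id, feature_id, rawscore) VALUES (?, ?, ?)",
    (1, 1, 25.0),
)

# Where does the match land, and what did the alignment analysis score it?
rows = conn.execute("""
    SELECT src.uniquename, fl.fmin, fl.fmax, af.rawscore
    FROM feature m
    JOIN featureloc fl      ON fl.feature_id = m.feature_id
    JOIN feature src        ON src.feature_id = fl.srcfeature_id
    JOIN analysisfeature af ON af.feature_id = m.feature_id
    WHERE m.uniquename = 'match-0001'
    ORDER BY fl.rank
""").fetchall()
print(rows)
```

The point of the sketch is only that the score hangs off the match feature through analysisfeature, while the aligned sequences are reached through featureloc.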

So for these calculations - I have a specificity "analysis" that for a given gene and probe returns a score
 
* Feature (ID = 2, type= "quality_value")
* FeatureRelationship (object = 2, subject = gene, type = "calculated-by")
* FeatureRelationship (object = 2, subject = probe, type = "calculated-by")
* Analysis (ID = 2, the specificity algorithm)
* AnalysisFeature (analysis ID = 2, feature ID = 2, specificity score)
 
and I would do the same for sensitivity.
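Option 1 could be sketched roughly as follows, again against simplified SQLite stand-ins for the Chado tables. The helper `store_score`, the feature names, and the score values are hypothetical; the "quality_value" and "calculated-by" type names are the ones proposed in the list above, stored here as plain text rather than as vocabulary term references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES feature(feature_id),
    object_id  INTEGER REFERENCES feature(feature_id),
    type TEXT
);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    analysis_id INTEGER REFERENCES analysis(analysis_id),
    feature_id  INTEGER REFERENCES feature(feature_id),
    rawscore REAL
);
""")


def store_score(conn, analysis_id, input_feature_ids, score, uniquename):
    """Create a placeholder feature for one score and link every input to it."""
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type) VALUES (?, 'quality_value')",
        (uniquename,),
    )
    score_feature_id = cur.lastrowid
    # Any number of inputs can point at the placeholder feature.
    conn.executemany(
        "INSERT INTO feature_relationship (subject_id, object_id, type) "
        "VALUES (?, ?, 'calculated-by')",
        [(fid, score_feature_id) for fid in input_feature_ids],
    )
    conn.execute(
        "INSERT INTO analysisfeature (analysis_id, feature_id, rawscore) VALUES (?, ?, ?)",
        (analysis_id, score_feature_id, score),
    )
    return score_feature_id


conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type) VALUES (?, ?, ?)",
    [(100, "gene-A", "gene"), (200, "probe-7", "probe")],
)
conn.executemany(
    "INSERT INTO analysis (analysis_id, name) VALUES (?, ?)",
    [(2, "specificity"), (3, "sensitivity")],
)
store_score(conn, 2, [100, 200], 0.92, "spec:gene-A:probe-7")
store_score(conn, 3, [100, 200], 0.81, "sens:gene-A:probe-7")

stored = conn.execute("""
    SELECT a.name, af.rawscore
    FROM analysisfeature af JOIN analysis a ON a.analysis_id = af.analysis_id
    ORDER BY a.analysis_id
""").fetchall()
print(stored)
```

Because the inputs are passed as a list, the same helper would accept three or more features, which is the flexibility this option is meant to provide.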
 
The other option is to create
 
* FeatureRelationship (object = probe, subject = gene, type = "analysis-relationship")
* FeatureRelationshipProperty (type = "sensitivity", value = sensitivity-score)
* FeatureRelationshipProperty (type = "specificity", value = specificity-score)
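For comparison, a sketch of Option 2 under the same assumptions: simplified SQLite tables named after their Chado counterparts, text `type` columns, and made-up features and scores. Property values are stored as text here and converted on the way out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES feature(feature_id),
    object_id  INTEGER REFERENCES feature(feature_id),
    type TEXT,
    UNIQUE (subject_id, object_id, type)
);
CREATE TABLE feature_relationshipprop (
    feature_relationshipprop_id INTEGER PRIMARY KEY,
    feature_relationship_id INTEGER
        REFERENCES feature_relationship(feature_relationship_id),
    type TEXT,
    value TEXT
);
""")

conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type) VALUES (?, ?, ?)",
    [(100, "gene-A", "gene"), (200, "probe-7", "probe")],
)
# One relationship row per gene / probe pair (subject = gene, object = probe).
cur = conn.execute(
    "INSERT INTO feature_relationship (subject_id, object_id, type) VALUES (?, ?, ?)",
    (100, 200, "analysis-relationship"),
)
rel_id = cur.lastrowid
# Each score becomes a property of that relationship.
conn.executemany(
    "INSERT INTO feature_relationshipprop (feature_relationship_id, type, value) "
    "VALUES (?, ?, ?)",
    [(rel_id, "sensitivity", "0.81"), (rel_id, "specificity", "0.92")],
)

rows = conn.execute("""
    SELECT p.type, p.value
    FROM feature_relationship fr
    JOIN feature_relationshipprop p
      ON p.feature_relationship_id = fr.feature_relationship_id
    WHERE fr.subject_id = ? AND fr.object_id = ?
""", (100, 200)).fetchall()
scores = {prop_type: float(value) for prop_type, value in rows}
print(scores)
```

Both scores come back from a single join, but the relationship row has exactly one subject and one object, so there is no slot for a third input.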
 
So one problem with both schemes is what to use for "types" - Option 1 is a bit better on that (but not much).
SO and SO-REL don't have great terms for this stuff.
 
Option 1 can handle functions that take more than 2 inputs.  Option 2 can't.
 
But looking up scores for Option 1 is tougher / slower - I have to know which analysis is the right one.
And Option 1 could be trouble for functions that take multiple inputs of the same type (if order matters), for example:
 
function(gene, gene) = interaction score??
 
Option 2 raises the question of which feature is the subject and which is the object, i.e. which feature OWNS this score.
Option 1 is easier for me to load (pretty simple GFF3).
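To make the lookup cost of Option 1 concrete, here is a sketch of fetching one score for a gene / probe pair. It assumes the same simplified SQLite stand-ins as before and hypothetical data; `lookup_score` is an illustrative helper, not an existing Chado or GMOD function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER, type TEXT);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysisfeature (analysis_id INTEGER, feature_id INTEGER, rawscore REAL);
""")
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    (100, "gene-A", "gene"),
    (200, "probe-7", "probe"),
    (201, "spec:gene-A:probe-7", "quality_value"),
    (202, "sens:gene-A:probe-7", "quality_value"),
])
conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, 'calculated-by')", [
    (100, 201), (200, 201),   # inputs of the specificity score
    (100, 202), (200, 202),   # inputs of the sensitivity score
])
conn.executemany("INSERT INTO analysis VALUES (?, ?)",
                 [(2, "specificity"), (3, "sensitivity")])
conn.executemany("INSERT INTO analysisfeature VALUES (?, ?, ?)",
                 [(2, 201, 0.92), (3, 202, 0.81)])


def lookup_score(conn, analysis_name, first_id, second_id):
    """Find the score whose placeholder feature is linked to both inputs."""
    row = conn.execute("""
        SELECT af.rawscore
        FROM analysisfeature af
        JOIN analysis a             ON a.analysis_id = af.analysis_id
        JOIN feature_relationship g ON g.object_id = af.feature_id
        JOIN feature_relationship p ON p.object_id = af.feature_id
        WHERE a.name = ? AND g.subject_id = ? AND p.subject_id = ?
    """, (analysis_name, first_id, second_id)).fetchone()
    return row[0] if row else None


specificity = lookup_score(conn, "specificity", 100, 200)
swapped = lookup_score(conn, "specificity", 200, 100)
print(specificity, swapped)
```

The caller has to name the right analysis and the query needs one extra join per input. Swapping the two arguments returns the same score, so nothing in this layout records input order; as far as I know Chado's feature_relationship table has a rank column that might serve for that, but that is an assumption to verify against the schema.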
 
Does that help explain?
Any thoughts on this?
 
Thanks for replying Scott,
Brian
 
----- Original message -----
From: Scott Cain <[hidden email]>
To: Brian Repko <[hidden email]>
Subject: Re: [Gmod-schema] microarray chip analysis data - how to model in Chado
Date: Wed, 20 Feb 2013 15:11:50 -0500
 

Re: microarray chip analysis data - how to model in Chado

Brian Repko-2
Sorry - I did not look at the reply-to.
 
----- Original message -----
From: Scott Cain <[hidden email]>
To: Brian Repko <[hidden email]>
Cc: "GMOD Schema/Chado List" <[hidden email]>
Subject: Re: [Gmod-schema] microarray chip analysis data - how to model in Chado
Date: Wed, 20 Feb 2013 17:13:52 -0500
 

Re: microarray chip analysis data - how to model in Chado

Stephen Ficklin-2
Hi Brian & Scott,

I'm butting in where I normally don't, but I need to store microarray-related data as well, so the subject caught my attention. Scott, I have a suggestion further down in the email (sorry for the length) that may help solve Brian's problem as well as one we encountered with Tripal earlier.

If I understand your option #1, you want to create a new record for the feature table for each sensitivity/specificity score you have.  This new feature can then be associated with other features (more than two) that were used to derive that score via the feature_relationship table.  So, if I'm reading between the lines correctly you are using all of the probes in a probeset to calculate the sensitivity/specificity for a gene (or at least the subset of probes that overlap with the gene) and you want to associate all of those probes with the sensitivity/specificity score you've generated?

So, if I'm understanding what you're after (correct me if I'm wrong), then I think creating a feature as a placeholder that you can link to for storing the score breaks the paradigm for the feature table. It certainly works, and I think it is the better of the two options. But the score is derived from the use of other features, and the end result is not really a new feature (or an alignment that can be represented by a feature). If you did go this route, could you localize your new feature to the landmark sequence, and would it make sense to someone if they saw it in GBrowse? I know not all features are intended to be localized to a sequence, but I'm assuming that anything in the feature table could potentially be aligned to a sequence.

Instead, what if we had a table named 'analysisresult' with the columns analysisresult_id, analysis_id, type_id, and value, and then a second table named 'analysisresult_feature' with the columns analysisresult_id and feature_id? This way you could store your sensitivity/specificity result in the analysisresult table, and then link up the features that were used in that analysis in the analysisresult_feature table.
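A minimal sketch of what these two proposed tables might look like, using SQLite in place of PostgreSQL. Only the table and column names come from the proposal; the column types, constraints, the stripped-down `feature`, `analysis`, and `cvterm` tables, and all data values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stripped-down stand-ins for existing tables.
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm   (cvterm_id   INTEGER PRIMARY KEY, name TEXT);

-- Proposed: one row per computed result.
CREATE TABLE analysisresult (
    analysisresult_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id),
    type_id     INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value       TEXT
);
-- Proposed: any number of features can be linked to one result.
CREATE TABLE analysisresult_feature (
    analysisresult_id INTEGER NOT NULL REFERENCES analysisresult(analysisresult_id),
    feature_id        INTEGER NOT NULL REFERENCES feature(feature_id),
    PRIMARY KEY (analysisresult_id, feature_id)
);
""")

conn.executemany("INSERT INTO feature VALUES (?, ?)", [
    (1, "gene-A"), (2, "probe-1"), (3, "probe-2"), (4, "probe-3"),
])
conn.execute("INSERT INTO analysis VALUES (1, 'probeset scoring')")
conn.execute("INSERT INTO cvterm VALUES (1, 'specificity')")

# Store one specificity result and link the gene and three probes to it.
conn.execute("INSERT INTO analysisresult VALUES (?, ?, ?, ?)", (1, 1, 1, "0.92"))
conn.executemany("INSERT INTO analysisresult_feature VALUES (?, ?)",
                 [(1, fid) for fid in (1, 2, 3, 4)])

linked = [name for (name,) in conn.execute("""
    SELECT f.uniquename
    FROM analysisresult_feature arf
    JOIN feature f ON f.feature_id = arf.feature_id
    WHERE arf.analysisresult_id = 1
    ORDER BY f.feature_id
""")]
print(linked)
```

No placeholder row is needed in the feature table, and a result can be tied to more than two features. The composite primary key chosen here means a feature can appear only once per result and no input order is recorded, so an analysis that needs ordered or repeated inputs would require a different key.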

Incidentally, I think this would resolve another problem we inadvertently introduced when we developed Tripal. We needed a way to record which features were used in certain analyses. For example, when we performed a BLAST analysis on a set of features, we wanted to know which features they were and then associate results with them. We did this by storing records in the analysisfeature table and BLAST results in the analysisfeatureprop table. But we learned later that this confused GBrowse, as it was only expecting to see entries in the analysisfeature table related to the source of the feature, not other analyses they were involved with. Scott had to make a code change in GBrowse to ignore the analysisfeature records if Tripal was being used, which is unfortunate.

So, I think the addition of these two tables would help solve your problem but also resolve the issue we had with Tripal and leave the analysisfeature table for what it was originally designed for... storing details about the source of features.  And, you would not be required to generate new features that may not really be features.

Stephen


On 2/20/2013 5:43 PM, Brian Repko wrote:

Re: microarray chip analysis data - how to model in Chado

Brian Repko-2
Stephen,
 
Thanks for this input.
 
Your restatement of Option 1 is correct.  Your reading between the lines is also correct - wish I could share the algorithm but I can't.
I agree with your thoughts 100% on this design not using "feature" well.  Which is why I'm asking.
 
I also looked at other existing tables - I thought about using the elementresult table as well - but that implies an acquisition and I don't have that - it also is hard to tie the score to a tuple.
 
Your suggestion of additional tables was something that crossed my mind.  We also looked at Allen Day's affymetrix tables for ideas as well - but that was more for cdf / probeset / probe modeling.

My understanding of the feature and analysisfeature tables is that when feature has the is_analysis flag set to true, then the feature will have an analysisfeature row (or many) with scores for various analyses - this is useful for alignments.  We also used it initially to keep track of where features came from - our loading routines were the "analysis" and we used analysisfeature to track which ones were added/updated by that load.  That was a mess and we now just use a "source" featureproperty.
 
We have our own web app on Chado (not GBrowse) so we've not hit unintended consequences in the browser.
 
That said - the idea is when looking at a particular gene and displaying the probeset / probe tracks, we can display sensitivity and specificity scores in a tooltip / hover over the track.  Likewise, when looking at a probeset and displaying the genes hit we can display the scores as well - we get at this data both ways though the first will probably be used more.  I could see the first as a potential use case for GBrowse.
 
We may just store this data as a non-chado custom table.
 
However, a generalized "companalysis" module would need to model inputs and outputs. Those can be ordered as well, and they are hard to define cleanly in a relational database (with foreign keys). Are inputs always features? What if one of the inputs is an organism? Are outputs features? Are they scores? Are they multiple columns of data? (Technically we have more than the scores, but we are limiting our use case.) In the end this becomes exceedingly difficult and is probably best modeled on a per-analysis basis.
 
As someone interested in contributing to the project - I'd be happy to be part of a discussion around a more generalized companalysis module (actually there are some great papers on RNA-seq and flybase that might help) but that should probably be moved off of this discussion and made a discussion on its own.
 
Brian
 
----- Original message -----
From: Stephen Ficklin <[hidden email]>
To: Brian Repko <[hidden email]>
Cc: Scott Cain <[hidden email]>, "GMOD Schema/Chado List" <[hidden email]>
Subject: Re: [Gmod-schema] microarray chip analysis data - how to model in Chado
Date: Thu, 21 Feb 2013 12:45:48 -0500
 
Hi Brian & Scott,
I'm butting in where I normally don't, but I have a need for storing data microarray related data as well so the subject caught my attention.    Scott, I have a suggestion further down in the email (sorry for the length) that may help solve Brian's problem as well as one we encountered with Tripal earlier.
If I understand your option #1, you want to create a new record for the feature table for each sensitivity/specificity score you have.  This new feature can then be associated with other features (more than two) that were used to derive that score via the feature_relationship table.  So, if I'm reading between the lines correctly you are using all of the probes in a probeset to calculate the sensitivity/specificity for a gene (or at least the subset of probes that overlap with the gene) and you want to associate all of those probes with the sensitivity/specificity score you've generated?
So, if I'm understanding what you're after (correct me if I'm wrong), then I think creating a feature as a place holder that you can link to for storing the score breaks the paradigm for the feature table.   It certainly works and I think the better of the two options.  But, the score is derived from the use of other features and the end result is not really a new feature (or an alignment that can be represented by a feature).  If you did go this route could you localize your new feature to the landmark sequence and would it make sense to someone if they saw it in GBrowse?  I know not all features are intended to be localized to a sequence, but I'm assuming that anything in the feature table could potentially be aligned to a sequence.
Instead, what if we had a table named 'analysisresult', with the columns: analysisresult_id, analysis_id, type_id, and value.  And then had a second table named 'analysisresult_feature' with columns: analysiresult_id, feature_id.   This way you could store your sensitivity/specificity result in the analysisresult table, and then link up features that were used in that analysis in the analysisresult_feature table.
Incidentally, I think this would resolve another problem we inadvertently introduced when we developed Tripal. We needed a way to record which features were used in certain analyses. For example, when we performed a blast analysis on a set of features, we wanted to know which features they were and then associate results with them. We did this by storing records in the analysisfeature table and blast results in the analysisfeatureprop table. But we learned later that this confused GBrowse, as it was only expecting to see entries in the analysisfeature table related to the source of the feature, not other analyses the feature was involved with. Scott had to make a code change in GBrowse to ignore the analysisfeature records if Tripal was being used, which is unfortunate.
So, I think the addition of these two tables would not only help solve your problem but also resolve the issue we had with Tripal, and leave the analysisfeature table for what it was originally designed for: storing details about the source of features. And you would not be required to generate new features that may not really be features.
Stephen
On 2/20/2013 5:43 PM, Brian Repko wrote:
Sorry - I did not look at the reply-to.
 
----- Original message -----
From: Scott Cain <[hidden email]>
To: Brian Repko <[hidden email]>
Cc: "GMOD Schema/Chado List" <[hidden email]>
Subject: Re: [Gmod-schema] microarray chip analysis data - how to model in Chado
Date: Wed, 20 Feb 2013 17:13:52 -0500
 
Hi Brian,
 
Please keep this on the schema mailing list--since I'm not used to working with data in this way, we definitely want to keep as many eyes on it as possible.  I'll try to respond in more detail shortly.
 
Scott
 
 
On Wed, Feb 20, 2013 at 4:22 PM, Brian Repko <[hidden email]> wrote:
Scott,
 
Let me try to describe the data another way and then address the adv / disadv of the two approaches.
 
For alignments, in Chado, one typically has
 
* Feature (ID = 1, type=match)
* FeatureLocation (feature = match, sourceFeature = chromosome, transcript, etc.)
* FeatureLocation (feature = match, sourceFeature = cDNA, EST, etc.)
* Analysis (ID = 1, the alignment algorithm used)
* AnalysisFeature (analysis ID = 1, feature ID = 1, alignment scores)
 
So for these calculations - I have a specificity "analysis" that for a given gene and probe returns a score
 
* Feature (ID = 2, type= "quality_value")
* FeatureRelationship (object = 2, subject = gene, type = "calculated-by")
* FeatureRelationship (object = 2, subject = probe, type = "calculated-by")
* Analysis (ID = 1, the specificity algorithm)
* AnalysisFeature (analysis ID = 1, feature ID = 2, specificity score)
 
and I would do the same for sensitivity.
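
For illustration, the Option 1 records above and the lookup they imply can be sketched as follows. This is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL. The tables are cut down to the columns needed, and type names are stored as plain text where Chado uses a type_id pointing at cvterm. The feature names, the IDs 10 and 11, and the score 0.92 are made-up placeholders, and option1_score is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stand-in; a real Chado instance runs on PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE feature_relationship (
    subject_id INTEGER, object_id INTEGER, type TEXT);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE analysisfeature (
    analysis_id INTEGER, feature_id INTEGER, rawscore REAL);
""")

# Feature 2 is the placeholder "quality_value" feature that carries the score.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    (10, "geneA", "gene"),
    (11, "probe1", "probe"),
    (2, "geneA/probe1 specificity", "quality_value"),
])
# Both inputs point at the placeholder feature (object = 2).
conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)", [
    (10, 2, "calculated-by"),
    (11, 2, "calculated-by"),
])
conn.execute("INSERT INTO analysis VALUES (1, 'specificity')")
conn.execute("INSERT INTO analysisfeature VALUES (1, 2, 0.92)")


def option1_score(conn, analysis_name, gene, probe):
    """Find the score whose placeholder feature is linked to both inputs."""
    row = conn.execute("""
        SELECT af.rawscore
        FROM analysisfeature af
        JOIN analysis a ON a.analysis_id = af.analysis_id
        JOIN feature_relationship g ON g.object_id = af.feature_id
        JOIN feature gf ON gf.feature_id = g.subject_id
        JOIN feature_relationship p ON p.object_id = af.feature_id
        JOIN feature pf ON pf.feature_id = p.subject_id
        WHERE a.name = ? AND gf.name = ? AND pf.name = ?
    """, (analysis_name, gene, probe)).fetchone()
    return row[0] if row else None


print(option1_score(conn, "specificity", "geneA", "probe1"))
```

The lookup needs one pair of joins per input feature plus the analysis name, which is the extra lookup cost this option carries.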
 
The other option is to create
 
FeatureRelationship (object = probe, subject = gene, type = "analysis-relationship")
FeatureRelationshipProperty (type = "sensitivity", value = sensitivity-score)
FeatureRelationshipProperty (type = "specificity", value = specificity-score)
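
For illustration, Option 2 can be sketched in the same way. This is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL. The tables are cut down to the columns needed, and types are stored as plain text where Chado uses a type_id pointing at cvterm. The feature names and both scores are made-up placeholders, and option2_scores is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite stand-in; a real Chado instance runs on PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER, object_id INTEGER, type TEXT);
CREATE TABLE feature_relationshipprop (
    feature_relationship_id INTEGER, type TEXT, value TEXT);
""")

conn.executemany("INSERT INTO feature VALUES (?, ?)",
                 [(10, "geneA"), (11, "probe1")])
# One relationship (subject = gene, object = probe) carries both scores.
conn.execute(
    "INSERT INTO feature_relationship VALUES (1, 10, 11, 'analysis-relationship')")
conn.executemany("INSERT INTO feature_relationshipprop VALUES (?, ?, ?)", [
    (1, "sensitivity", "0.85"),
    (1, "specificity", "0.92"),
])


def option2_scores(conn, subject_name, object_name):
    """Return {property type: value} for the subject -> object relationship."""
    rows = conn.execute("""
        SELECT p.type, p.value
        FROM feature_relationshipprop p
        JOIN feature_relationship r
             ON r.feature_relationship_id = p.feature_relationship_id
        JOIN feature s ON s.feature_id = r.subject_id
        JOIN feature o ON o.feature_id = r.object_id
        WHERE s.name = ? AND o.name = ?
    """, (subject_name, object_name)).fetchall()
    return dict(rows)


print(option2_scores(conn, "geneA", "probe1"))
```

Querying with the two features swapped returns nothing, so the loader and every reader have to agree on which feature is the subject and which is the object.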
 
So one problem with both schemes is what to use for "types" - Option 1 is a bit better on that (but not much).
SO and SO-REL don't have great terms for this stuff.
 
Option 1 can handle functions that take more than 2 inputs.  Option 2 can't.
 
But looking up scores for Option 1 is tougher / slower - I have to know which analysis is the right one.
And option 1 could be trouble for functions that take multiple inputs of the same type (if order matters)
 
function(gene, gene) = interaction score??
 
Option 2 suffers from - which is the subject and which is the object? - which feature OWNS this score.
Option 1 is easier for me to load (pretty simple GFF3).
 
Does that help explain?
Any thoughts on this?
 
Thanks for replying Scott,
Brian
 
----- Original message -----
From: Scott Cain <[hidden email]>
To: Brian Repko <[hidden email]>
Subject: Re: [Gmod-schema] microarray chip analysis data - how to model in Chado
Date: Wed, 20 Feb 2013 15:11:50 -0500
 
Hi Brian,
 
Let me start with I don't really know what the best way to do this is--I don't understand the nature of the data well enough to say with any conviction.
 
It feels to me that the right way to do it is to use analysisfeature in the way you've described (so the probe would be related to the gene via feature_relationship and the computed attributes of the probe would be in analysis feature).  You don't mention a down side to this approach like you did for your second suggestion.  Are you concerned about it in some way?
 
Scott
 
 
On Tue, Feb 19, 2013 at 11:01 AM, Brian Repko <[hidden email]> wrote:
 
Chado experts:
 
We are modeling Affymetrix (and other) CDFs, probesets, probes and alignments in Chado. We are doing this by making use of some of the mage module tables (ARRAYDESIGN, ELEMENT, ELEMENT_RELATIONSHIP) in addition to the main sequence module tables.
 
Our alignments are your basic match / match_part type features (we are doing 100% so we don't make them analysis features).
 
However, based on the alignments we do some calculations for probeset / gene attributes - specificity and sensitivity.
These are calculated (R scripts) based on the alignments of the probes and the probeset to probe mapping in the CDF.
 
So now I have these scores for gene / probeset tuples and I'm not sure how to store those.
 
One thought is to model them as analysis_features with feature-relationships to the gene and probeset features.
This would match how alignments are - analysis_features (match) with feature-locations to the features that are matched.
 
The other thought is to make a feature-relationship between the gene and probeset features with a feature-relationship property of the sensitivity score and another feature-relationship property for the specificity score.  This is easy to do but only works for properties that are determined by 2 features.  If I ever had an analysis that was determined by 3 variables, I couldn't do this.
 
Any thoughts as to the "Chado way" of modeling these types of analysis results?
 
Thanks for any input,
Brian
 
 
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
 
 
 
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema