chado bioinfo analysis results

Alexie Papanicolaou
Hi guys

While discussing with some developers how to store the results of
bioinformatic analyses in Chado (let's call them BLAST for brevity), I
came across Stephen's recent email:

"
Incidentally, I think this would resolve another problem we
inadvertently introduced when we developed Tripal.  We needed a way to
associate which features were used in certain analyses.  For example,
we performed a BLAST analysis on a set of features; we wanted to know
which features they were and then associate results with them.  We did
this by storing records in the analysisfeature table and BLAST results
in the analysisfeatureprop table.  But we learned later that this
confused GBrowse, as it was only expecting to see entries in the
analysisfeature table related to the source of the feature, not
other analyses they were involved with.  Scott had to make a code
change in GBrowse to ignore the analysisfeature records if Tripal was
being used, which is unfortunate.
"
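To make the GBrowse problem concrete, here is a minimal sketch using Python's built-in sqlite3 and a heavily simplified subset of the Chado tables. The program names and the `source_programs` helper are hypothetical, and the query is not GBrowse's actual adapter code; it only shows why a reader that treats every analysisfeature row as "where this feature came from" sees two candidates once Tripal-style BLAST rows are added.

```python
import sqlite3

# Simplified subset of the Chado tables involved; real Chado tables carry
# more columns and constraints than shown here.
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature,
    analysis_id INTEGER REFERENCES analysis
);
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO feature VALUES (1, 'gene0001')")
    conn.executemany("INSERT INTO analysis VALUES (?, ?)",
                     [(1, "gene_predictor"), (2, "blastp")])  # hypothetical programs
    conn.executemany(
        "INSERT INTO analysisfeature (feature_id, analysis_id) VALUES (?, ?)",
        [(1, 1),   # the analysis that produced the gene (its source)
         (1, 2)])  # Tripal-style row: the gene was used as a BLAST query
    return conn

def source_programs(conn, uniquename):
    """Programs a reader would report if it treats every analysisfeature
    row as 'the analysis this feature came from'."""
    rows = conn.execute(
        """SELECT a.program
           FROM feature f
           JOIN analysisfeature af ON af.feature_id = f.feature_id
           JOIN analysis a ON a.analysis_id = af.analysis_id
           WHERE f.uniquename = ?
           ORDER BY a.analysis_id""",
        (uniquename,))
    return [program for (program,) in rows]

conn = build_demo()
print(source_programs(conn, "gene0001"))  # ['gene_predictor', 'blastp']
```

The second row is exactly the ambiguity a source-oriented reader has to be taught to ignore.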

We have been using a method similar to Stephen's, except that we did
not use the database with GBrowse. However, the problem we have now is
that we're moving away from storing the BLAST results in the database (as
there are too many). We are actually moving towards indexed flat files and
writing plugins for genome browsers (not GBrowse, though) to retrieve the data.
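A minimal sketch of the "indexed flat file" idea, assuming a tab-delimited hit file whose first column is the query ID (as in BLAST tabular output). The function names, the reduced column layout and the in-memory dict index are all hypothetical; a real deployment would more likely rely on an existing indexing tool, but the principle (record byte offsets once, then seek instead of scanning) is the same.

```python
import io

def build_index(handle):
    """Map each query ID to the byte offsets of its lines in a tab-delimited hit file."""
    index = {}
    offset = handle.tell()
    line = handle.readline()
    while line:
        query_id = line.split(b"\t", 1)[0].decode()
        index.setdefault(query_id, []).append(offset)
        offset = handle.tell()
        line = handle.readline()
    return index

def fetch_hits(handle, index, query_id):
    """Seek straight to the stored offsets instead of scanning the whole file."""
    hits = []
    for offset in index.get(query_id, []):
        handle.seek(offset)
        hits.append(handle.readline().rstrip(b"\n").decode().split("\t"))
    return hits

# Hypothetical hit file: query ID, subject ID, e-value, bit score.
HITS = (b"gene0001\tsubjA\t1e-50\t200\n"
        b"gene0002\tsubjB\t1e-10\t80\n"
        b"gene0001\tsubjC\t1e-20\t120\n")

handle = io.BytesIO(HITS)  # stands in for open("hits.tsv", "rb")
index = build_index(handle)
for hit in fetch_hits(handle, index, "gene0001"):
    print(hit)
```

A browser plugin would only need the index and a file handle; nothing about individual hits has to live in Chado.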

However, we would still like to store the information that a particular
feature (say, a gene) is annotated with, or similar to, a dbxref (say, a
UniProt ID in the BLAST example). As the analysis table cannot be
used, I was thinking of using feature_dbxref and storing the
'analysis' metadata (score, etc.) in feature_dbxrefprop (which doesn't
exist yet, but I guess it would mimic dbxrefprop). This
metadata is quite limited (probably a few rows of a few bytes each),
and to be honest I'm not even sure we need it (maybe just the date,
score and reason for linking).
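A minimal sketch of the feature_dbxref plus feature_dbxrefprop idea, in SQLite via Python. The tables are simplified (for example, real cvterm rows also need a cv_id), feature_dbxrefprop is the hypothetical table from the message and is modelled on the shape of dbxrefprop (type_id, value, rank), and the accession, score and date values are placeholders.

```python
import sqlite3

# Simplified subset of Chado tables plus the hypothetical feature_dbxrefprop.
SCHEMA = """
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER REFERENCES db,
                     accession TEXT, UNIQUE (db_id, accession));
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature_dbxref (feature_dbxref_id INTEGER PRIMARY KEY,
                             feature_id INTEGER REFERENCES feature,
                             dbxref_id INTEGER REFERENCES dbxref);
-- Hypothetical table, shaped like dbxrefprop (type_id, value, rank).
CREATE TABLE feature_dbxrefprop (feature_dbxrefprop_id INTEGER PRIMARY KEY,
                                 feature_dbxref_id INTEGER REFERENCES feature_dbxref,
                                 type_id INTEGER REFERENCES cvterm,
                                 value TEXT, rank INTEGER DEFAULT 0);
"""

def get_or_create(cur, table, column, value):
    """Return the rowid for value in a lookup table with a UNIQUE column."""
    cur.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    return cur.execute(f"SELECT rowid FROM {table} WHERE {column} = ?", (value,)).fetchone()[0]

def link_feature_to_dbxref(conn, uniquename, db_name, accession, props):
    """Record 'feature X is similar to db:accession' plus small metadata props."""
    cur = conn.cursor()
    feature_id = get_or_create(cur, "feature", "uniquename", uniquename)
    db_id = get_or_create(cur, "db", "name", db_name)
    cur.execute("INSERT OR IGNORE INTO dbxref (db_id, accession) VALUES (?, ?)", (db_id, accession))
    dbxref_id = cur.execute("SELECT dbxref_id FROM dbxref WHERE db_id = ? AND accession = ?",
                            (db_id, accession)).fetchone()[0]
    cur.execute("INSERT INTO feature_dbxref (feature_id, dbxref_id) VALUES (?, ?)",
                (feature_id, dbxref_id))
    link_id = cur.lastrowid
    for prop_name, value in props.items():
        type_id = get_or_create(cur, "cvterm", "name", prop_name)
        cur.execute("INSERT INTO feature_dbxrefprop (feature_dbxref_id, type_id, value) "
                    "VALUES (?, ?, ?)", (link_id, type_id, str(value)))
    conn.commit()
    return link_id

def links_for(conn, uniquename):
    """List (db:accession, property, value) for every dbxref linked to a feature."""
    return conn.execute("""
        SELECT d.name || ':' || x.accession, t.name, p.value
        FROM feature f
        JOIN feature_dbxref fx ON fx.feature_id = f.feature_id
        JOIN dbxref x ON x.dbxref_id = fx.dbxref_id
        JOIN db d ON d.db_id = x.db_id
        JOIN feature_dbxrefprop p ON p.feature_dbxref_id = fx.feature_dbxref_id
        JOIN cvterm t ON t.cvterm_id = p.type_id
        WHERE f.uniquename = ?
        ORDER BY t.name""", (uniquename,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
link_feature_to_dbxref(conn, "gene0001", "UniProt", "EXAMPLE1",
                       {"score": 250, "date": "2013-03-06", "reason": "blastp hit"})
for row in links_for(conn, "gene0001"):
    print(row)
```

Storing every value as text keeps the prop table generic, at the cost of casting scores when querying them.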

This is not the cleanest way of doing it because, well, the gene
feature is not a cross-reference to the UniProt database, just similar
to it (in the above example). So clearly we would benefit from some
kind of standard approach to storing analysis results without
having to create features.
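For comparison, this is roughly what "having to create features" means: a sketch of storing one BLAST hit companalysis-style, as a match feature with two featureloc rows (rank 0 on the query, rank 1 on the subject, as we understand the convention) and an analysisfeature row carrying the scores. Tables are simplified, and the feature names, coordinates and scores are hypothetical.

```python
import sqlite3

# Simplified subset of the tables the companalysis approach touches.
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE, type TEXT);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature,     -- the match feature being located
    srcfeature_id INTEGER REFERENCES feature,  -- the sequence it is located on
    fmin INTEGER, fmax INTEGER,
    rank INTEGER                               -- 0 = query side, 1 = subject side
);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature,
    analysis_id INTEGER REFERENCES analysis,
    rawscore REAL, significance REAL, identity REAL
);
"""

def get_feature(cur, uniquename, ftype):
    """Get-or-create a feature row by uniquename."""
    cur.execute("INSERT OR IGNORE INTO feature (uniquename, type) VALUES (?, ?)",
                (uniquename, ftype))
    return cur.execute("SELECT feature_id FROM feature WHERE uniquename = ?",
                       (uniquename,)).fetchone()[0]

def store_hit(conn, analysis_id, query, subject, qspan, sspan, bitscore, evalue, pident):
    """Store one BLAST hit as a 'match' feature located on both sequences."""
    cur = conn.cursor()
    query_id = get_feature(cur, query, "gene")
    subject_id = get_feature(cur, subject, "polypeptide")  # the subject needs a feature too
    match_id = get_feature(cur, f"{query}_vs_{subject}", "match")
    for rank, (src_id, (fmin, fmax)) in enumerate([(query_id, qspan), (subject_id, sspan)]):
        cur.execute("INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, rank) "
                    "VALUES (?, ?, ?, ?, ?)", (match_id, src_id, fmin, fmax, rank))
    cur.execute("INSERT INTO analysisfeature (feature_id, analysis_id, rawscore, significance, "
                "identity) VALUES (?, ?, ?, ?, ?)", (match_id, analysis_id, bitscore, evalue, pident))
    conn.commit()

def row_counts(conn):
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in ("feature", "featureloc", "analysisfeature")}

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO analysis VALUES (1, 'blastp')")
store_hit(conn, 1, "gene0001", "subjA", (0, 300), (10, 310), 250.0, 1e-50, 87.5)
print(row_counts(conn))  # {'feature': 3, 'featureloc': 2, 'analysisfeature': 1}
```

Multiplied by millions of hits, the extra match and subject features are the overhead the message wants to avoid.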

I feel that Brian's idea for a generalized table is good, but I would
like to separate the deliverables into short-term and long-term ones. The
need to store the information that an analysis has been run on a feature
(plus some metadata) in Chado is an immediate, short-term one.

Storing the details of the analysis would be challenging and is a long-term
goal (personally, I feel that Chado is not the correct approach
for that).

What do you think WRT the short-term goal? Can we use, e.g., Tripal's
approach, or create an analysis table specifically suited to that
use case?
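For the second option, one possible shape for a table suited to this use case is sketched below in SQLite via Python. The `feature_analysis_hit` table and its columns are hypothetical, not part of Chado. The design choice is that analysisfeature keeps meaning only "the analysis this feature came from", so readers that rely on that meaning are unaffected, while "analysis A linked feature F to dbxref X" lives in its own table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, accession TEXT UNIQUE);
-- Left alone: still only says which analysis a feature came from.
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature,
    analysis_id INTEGER REFERENCES analysis
);
-- Hypothetical table: "analysis A linked feature F to dbxref X with score S".
CREATE TABLE feature_analysis_hit (
    feature_analysis_hit_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature,
    analysis_id INTEGER NOT NULL REFERENCES analysis,
    dbxref_id INTEGER NOT NULL REFERENCES dbxref,
    score REAL,
    UNIQUE (feature_id, analysis_id, dbxref_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO feature VALUES (1, 'gene0001')")
conn.executemany("INSERT INTO analysis VALUES (?, ?)", [(1, "gene_predictor"), (2, "blastp")])
conn.execute("INSERT INTO dbxref VALUES (1, 'EXAMPLE1')")
conn.execute("INSERT INTO analysisfeature (feature_id, analysis_id) VALUES (1, 1)")
conn.execute("INSERT INTO feature_analysis_hit (feature_id, analysis_id, dbxref_id, score) "
             "VALUES (1, 2, 1, 250.0)")

def source_programs(conn, uniquename):
    """What a reader of analysisfeature (e.g. a browser adapter) still sees."""
    return [p for (p,) in conn.execute(
        """SELECT a.program FROM feature f
           JOIN analysisfeature af ON af.feature_id = f.feature_id
           JOIN analysis a ON a.analysis_id = af.analysis_id
           WHERE f.uniquename = ?""", (uniquename,))]

def hits(conn, uniquename):
    """Which analyses linked this feature to which dbxrefs, and with what score."""
    return conn.execute(
        """SELECT a.program, x.accession, h.score FROM feature f
           JOIN feature_analysis_hit h ON h.feature_id = f.feature_id
           JOIN analysis a ON a.analysis_id = h.analysis_id
           JOIN dbxref x ON x.dbxref_id = h.dbxref_id
           WHERE f.uniquename = ?""", (uniquename,)).fetchall()

print(source_programs(conn, "gene0001"))  # ['gene_predictor']
print(hits(conn, "gene0001"))             # [('blastp', 'EXAMPLE1', 250.0)]
```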

a

--
Dr. Alexie Papanicolaou

------------------------------------------------------------------------------
Symantec Endpoint Protection 12 positioned as A LEADER in The Forrester  
Wave(TM): Endpoint Security, Q1 2013 and "remains a good choice" in the  
endpoint security space. For insight on selecting the right partner to
tackle endpoint security challenges, access the full report.
http://p.sf.net/sfu/symantec-dev2dev
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: chado bioinfo analysis results

Siddhartha Basu
Hi Alex,


On Wed, 06 Mar 2013, Alexie Papanicolaou wrote:

> However, we would still like to store the information that a particular
> feature (say, a gene) is annotated with, or similar to, a dbxref (say, a
> UniProt ID in the BLAST example). As the analysis table cannot be
> used, I was thinking of using feature_dbxref and storing the
> 'analysis' metadata (score, etc.) in feature_dbxrefprop (which doesn't
> exist yet, but I guess it would mimic dbxrefprop). This
> metadata is quite limited (probably a few rows of a few bytes each),
> and to be honest I'm not even sure we need it (maybe just the date,
> score and reason for linking).
I am not sure about the conflict with the analysis table here.
Is it because the GBrowse Chado adapter cannot read from there?
What's your Chado model for storing analyses, anyway? Is it more or less
similar to the Companalysis module (http://gmod.org/wiki/Chado_Companalysis_Module)?
What are the sore points of Chado's companalysis storage model with regard
to your analyzed dataset? What metadata from your analysis can you not
store in Chado, or for which does Chado lack any provision?
Could you please illustrate your point with a small example?

thanks,
-siddhartha

Re: chado bioinfo analysis results

Brian Repko
In reply to this post by Alexie Papanicolaou
Alexie,

What we ended up doing was taking the idea of "knowledge management" from the GMOD Pathways project and basically creating a table for "RDF-like data": subject, predicate, object.  For us, the subject was a tuple (gene and probeset).  The predicates come from a new CV/CVTERM, and we have 4: has-gene, has-probeset, sensitivity-score-of and specificity-score-of; the objects are the gene feature ID, the probeset feature ID, and 2 scores.

We haven't designed the table yet, but we are thinking that subject and object columns will be created for various data types (string, number, feature_ID, etc.).
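The triple idea might look roughly like the following SQLite sketch. Only the four predicate names come from the message; the table names `km_subject` and `km_triple`, the typed object columns, the feature names and the scores are hypothetical (the table had not been designed yet). The CHECK constraint relies on SQLite evaluating `IS NOT NULL` to 0 or 1; PostgreSQL would need an explicit cast to integer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
-- Hypothetical subject table: one row per (gene, probeset) tuple.
CREATE TABLE km_subject (km_subject_id INTEGER PRIMARY KEY, label TEXT);
-- Hypothetical triple table with one typed object column per data type.
CREATE TABLE km_triple (
    km_triple_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES km_subject,
    predicate_id INTEGER NOT NULL REFERENCES cvterm,
    object_feature_id INTEGER REFERENCES feature,
    object_number REAL,
    object_string TEXT,
    -- exactly one object column must be filled in
    CHECK ((object_feature_id IS NOT NULL) + (object_number IS NOT NULL)
           + (object_string IS NOT NULL) = 1)
);
"""

PREDICATES = ["has-gene", "has-probeset", "sensitivity-score-of", "specificity-score-of"]

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm (name) VALUES (?)", [(p,) for p in PREDICATES])
    conn.executemany("INSERT INTO feature (uniquename) VALUES (?)", [("gene0001",), ("PS_0001",)])
    conn.execute("INSERT INTO km_subject (label) VALUES ('gene0001/PS_0001')")  # subject id 1
    term = dict(conn.execute("SELECT name, cvterm_id FROM cvterm"))
    feat = dict(conn.execute("SELECT uniquename, feature_id FROM feature"))
    rows = [
        (1, term["has-gene"], feat["gene0001"], None),
        (1, term["has-probeset"], feat["PS_0001"], None),
        (1, term["sensitivity-score-of"], None, 0.9),  # made-up score
        (1, term["specificity-score-of"], None, 0.8),  # made-up score
    ]
    conn.executemany("INSERT INTO km_triple (subject_id, predicate_id, object_feature_id, "
                     "object_number) VALUES (?, ?, ?, ?)", rows)
    return conn

def describe(conn, subject_id):
    """Collapse a subject's triples into {predicate: object value}."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(f.uniquename, t.object_number, t.object_string)
        FROM km_triple t
        JOIN cvterm c ON c.cvterm_id = t.predicate_id
        LEFT JOIN feature f ON f.feature_id = t.object_feature_id
        WHERE t.subject_id = ?""", (subject_id,))
    return dict(rows)

conn = build_demo()
print(describe(conn, 1))
```

New kinds of result then need only new predicate terms, not new tables, which is the appeal of the generalized approach.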

Not sure if that helps, but that is our short-term solution.  A long-term solution would be to create a "knowledge management module" for Chado that is based on the GMOD Pathways schema (a frame-based knowledge management system) but works with other modules (module entities as frames in the KM system).

Brian

----- Original message -----
From: Alexie Papanicolaou <[hidden email]>
To: [hidden email]
Cc: [hidden email], stephen ficklin <[hidden email]>, scott <[hidden email]>
Subject: chado bioinfo analysis results
Date: Wed, 6 Mar 2013 16:15:03 +1100