Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules


seth redmond
The protocol table changes (not null on the name, etc.) were already agreed, weren't they? We've been running into problems with these ourselves, so I'd back having this done as soon as possible.

Could you explain your reasoning behind adding a unit_id field to project prop? I can see why phenotype is treated as a special case, but I'm not sure I see any reason to do the same to project prop. 

And didn't we already discuss adding the experiment_protocolprop / experiment_phenotypeprop tables back in December and decide that they were unnecessary if you were using one protocol per nd_experiment? Has anything changed in the meantime? I'm not sure I see much value in changing the schema in this way now that we're all developing on it and there's a paper underway.

  

On 20 Apr 2011, at 02:57, Yuri Bendana wrote:

I have some proposed changes which Chado may want to adopt:
  • Adding units_id (or unit_id) to projectprop.
  • Adding the property tables environmentprop, phenstatementprop.  These are useful when creating phenstatements.
  • Removal of NOT NULL from nd_protocol.name.   Addition of nd_protocol.type_id.  This allows you to access a protocol by type_id instead of name.
  • Removing NOT NULL from nd_experimentprop.value.  This makes it consistent with other property tables.
  • Adding property tables nd_experiment_protocolprop and nd_experiment_phenotypeprop.  I use these to store protocol values and phenotype observations specific to an nd_experiment.
The diffs to each module are below.

yuri

===================================================================
--- project.sql (revision 24826)
+++ project.sql (working copy)

-- projectprop
@@ -33,6 +33,8 @@
  type_id integer NOT NULL,
  FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE,
  value text,
+ units_id integer REFERENCES cvterm (cvterm_id) ON DELETE SET NULL,
  rank integer not null default 0,
  CONSTRAINT projectprop_c1 UNIQUE (project_id, type_id, rank)
 );

===================================================================
--- genetic.sql (revision 24826)
+++ genetic.sql (working copy)
@@ -94,6 +94,18 @@
 
+-- ================================================
+-- TABLE: environmentprop
+-- ================================================
+CREATE TABLE environmentprop (
+    environmentprop_id serial PRIMARY KEY NOT NULL,
+    environment_id integer NOT NULL REFERENCES environment (environment_id) ON DELETE CASCADE INITIALLY DEFERRED,
+    type_id integer NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED ,
+    value text,
+    units_id integer REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED ,
+    rank integer NOT NULL DEFAULT 0,
+    constraint environmentprop_c1 UNIQUE (environment_id,type_id,rank)
+);
 

 -- ================================================
+-- TABLE: phenstatementprop
+-- ================================================
+CREATE TABLE phenstatementprop (
+    phenstatementprop_id serial PRIMARY KEY NOT NULL,
+    phenstatement_id integer NOT NULL REFERENCES phenstatement (phenstatement_id) ON DELETE CASCADE INITIALLY DEFERRED,
+    type_id integer NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED ,
+    value text,
+    units_id integer REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED ,
+    rank integer NOT NULL DEFAULT 0,
+    constraint phenstatementprop_c1 UNIQUE (phenstatement_id,type_id,rank)
+);

===================================================================
--- natural_diversity.sql (revision 24826)
+++ natural_diversity.sql (working copy)

@@ -63,7 +62,7 @@
     nd_experimentprop_id serial PRIMARY KEY NOT NULL,
     nd_experiment_id integer NOT NULL references nd_experiment (nd_experiment_id) on delete cascade INITIALLY DEFERRED,
     type_id integer NOT NULL references cvterm (cvterm_id) on delete cascade INITIALLY DEFERRED ,
-    value character varying(255) NOT NULL,
+    value character varying(255),
     rank integer NOT NULL default 0,
     constraint nd_experimentprop_c1 unique (nd_experiment_id,type_id,rank)
 );

 CREATE TABLE nd_protocol (
     nd_protocol_id serial PRIMARY KEY  NOT NULL,
-    name character varying(255) NOT NULL unique
+    name character varying(255) unique,
+    type_id integer NOT NULL references cvterm (cvterm_id) on delete cascade INITIALLY DEFERRED
 );
 
+CREATE TABLE nd_experiment_protocolprop (
+    nd_experiment_protocolprop_id serial PRIMARY KEY,
+    nd_experiment_protocol_id integer NOT NULL REFERENCES nd_experiment_protocol ON DELETE CASCADE INITIALLY DEFERRED,
+    type_id integer NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
+    value text,
+    units_id integer REFERENCES cvterm (cvterm_id) ON DELETE SET NULL,      
+    rank integer DEFAULT 0 NOT NULL,
+    CONSTRAINT nd_experiment_protocolprop_c1 UNIQUE (nd_experiment_protocol_id,type_id,rank)
+);

+CREATE TABLE nd_experiment_phenotypeprop (
+    nd_experiment_phenotypeprop_id serial PRIMARY KEY,
+    nd_experiment_phenotype_id integer NOT NULL REFERENCES nd_experiment_phenotype ON DELETE CASCADE INITIALLY DEFERRED,
+    type_id integer NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
+    value text,
+    units_id integer REFERENCES cvterm (cvterm_id) ON DELETE SET NULL,    
+    cvalue_id integer REFERENCES cvterm (cvterm_id) ON DELETE SET NULL,   
+    rank integer DEFAULT 0 NOT NULL,
+    CONSTRAINT nd_experiment_phenotypeprop_c1 UNIQUE (nd_experiment_phenotype_id,type_id,rank)
+);
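The nd_protocol change above (nullable name, mandatory type_id) can be illustrated with a minimal sketch. It assumes SQLite as a stand-in for PostgreSQL, a cvterm table cut down to an id and a name, and hypothetical protocol type terms 'watering' and 'fertilizer'; the helper protocols_of_type is likewise illustrative and not part of Chado. It shows protocols being looked up by type rather than by name, and that a UNIQUE name column still accepts several unnamed (NULL) protocols, which holds by default in both SQLite and PostgreSQL.

```python
import sqlite3

# Minimal sketch: SQLite stands in for PostgreSQL, and only the columns
# needed for the example are created (cvterm is heavily simplified).
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- nd_protocol as proposed: name is nullable (but still unique when set),
-- and type_id is mandatory.
CREATE TABLE nd_protocol (
    nd_protocol_id INTEGER PRIMARY KEY,
    name           VARCHAR(255) UNIQUE,
    type_id        INTEGER NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE
);
"""


def protocols_of_type(conn, type_name):
    """Hypothetical helper: return nd_protocol_ids typed by the named cvterm."""
    rows = conn.execute(
        """SELECT p.nd_protocol_id
             FROM nd_protocol p
             JOIN cvterm t ON t.cvterm_id = p.type_id
            WHERE t.name = ?
            ORDER BY p.nd_protocol_id""",
        (type_name,),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
# Hypothetical type terms.
conn.executemany("INSERT INTO cvterm (cvterm_id, name) VALUES (?, ?)",
                 [(1, "watering"), (2, "fertilizer")])
# Two unnamed protocols: UNIQUE treats each NULL name as distinct.
conn.executemany("INSERT INTO nd_protocol (name, type_id) VALUES (?, ?)",
                 [(None, 1), (None, 1), ("NPK 10-10-10", 2)])
print(protocols_of_type(conn, "watering"))  # [1, 2]
```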


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


_______________________________________________
Gmod-phendiver mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-phendiver

Yuri Bendana-3
On Wed, Apr 20, 2011 at 1:41 AM, seth redmond <[hidden email]> wrote:
The protocol table changes (not null on the name, etc.) were already agreed, weren't they? We've been running into problems with these ourselves, so I'd back having this done as soon as possible.
 
 I don't recall any of my previous proposals being formally agreed to, so Naama suggested I send them out again.


Could you explain your reasoning behind adding a unit_id field to project prop? I can see why phenotype is treated as a special case, but I'm not sure I see any reason to do the same to project prop. 

I do remember previously giving an example of a project property we have that includes a unit: "watering amount = 100 ml".
 
And didn't we already discuss adding the experiment_protocolprop / experiment_phenotypeprop tables back in December and decide that they were unnecessary if you were using one protocol per nd_experiment? Has anything changed in the meantime? I'm not sure I see much value in changing the schema in this way now that we're all developing on it and there's a paper underway.

I find these property tables useful because they let me store the experiment-specific values of a protocol or phenotype, while the nd_protocol and phenotype tables store only the descriptions. This is a different way of using the schema from the one it is currently designed for, where the nd_protocol and phenotype tables store both descriptions and values. I prefer to keep them separate. We have discussed this before, but I'm restating it only because Naama requested it. Even if the property tables were added to the schema, they shouldn't impact your work if you don't use them. Future schema users may prefer my approach, or not, but with the minor addition of a few property tables, Chado can give them that choice.

yuri

Naama Menda
answering inline

On Wed, Apr 20, 2011 at 3:02 PM, Yuri Bendana <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 1:41 AM, seth redmond <[hidden email]> wrote:
The protocol table changes (not null on the name, etc.) were already agreed, weren't they? We've been running into problems with these ourselves, so I'd back having this done as soon as possible.
 
 I don't recall any of my previous proposals being formally agreed to, so Naama suggested I send them out again.


Could you explain your reasoning behind adding a unit_id field to project prop? I can see why phenotype is treated as a special case, but I'm not sure I see any reason to do the same to project prop. 

I do remember previously giving an example of a project property we have that includes a unit: "watering amount = 100 ml".

I think in such cases the value '100 ml' should go as is into the value field. Such props could go into projectprop or nd_experimentprop, depending on how you store your experiments, and it is very helpful to keep all the prop tables with a similar schema.
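To make the trade-off concrete, here is a rough sketch contrasting the two projectprop layouts under discussion: the unit kept inside the free-text value ('100 ml'), versus a bare value plus the proposed units_id. It assumes SQLite in place of PostgreSQL; the table names projectprop_a and projectprop_b and the 'ml' cvterm are hypothetical, and the string split used for layout A only works for well-formed "<number> <unit>" strings.

```python
import sqlite3

# Sketch comparing two projectprop layouts. SQLite is used in place of
# PostgreSQL; table names are hypothetical and trimmed to relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Layout A: current projectprop, unit embedded in the free-text value.
CREATE TABLE projectprop_a (
    project_id INTEGER NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value      TEXT
);
-- Layout B: proposed projectprop with a separate units_id column.
CREATE TABLE projectprop_b (
    project_id INTEGER NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value      TEXT,
    units_id   INTEGER REFERENCES cvterm (cvterm_id) ON DELETE SET NULL
);
INSERT INTO cvterm VALUES (1, 'watering amount'), (2, 'ml');
INSERT INTO projectprop_a VALUES (10, 1, '100 ml');
INSERT INTO projectprop_b VALUES (10, 1, '100', 2);
""")

# Layout A: the unit must be parsed back out of the string by the application
# (assumes the value is exactly "<number> <unit>").
(text,) = conn.execute(
    "SELECT value FROM projectprop_a WHERE project_id = 10").fetchone()
amount_a, unit_a = text.split(" ", 1)

# Layout B: the unit is a join away, and the value can be cast for numeric work.
amount_b, unit_b = conn.execute("""
    SELECT CAST(p.value AS REAL), u.name
      FROM projectprop_b p
      JOIN cvterm u ON u.cvterm_id = p.units_id
     WHERE p.project_id = 10
""").fetchone()

print(float(amount_a), unit_a)  # 100.0 ml
print(amount_b, unit_b)         # 100.0 ml
```

Layout A keeps every prop table structurally identical, which is the consistency argument; layout B moves unit handling out of application code at the cost of an extra column.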
 
 
And didn't we already discuss adding the experiment_protocolprop / experiment_phenotypeprop tables back in December and decide that they were unnecessary if you were using one protocol per nd_experiment? Has anything changed in the meantime? I'm not sure I see much value in changing the schema in this way now that we're all developing on it and there's a paper underway.

I find these property tables useful because they let me store the experiment-specific values of a protocol or phenotype, while the nd_protocol and phenotype tables store only the descriptions. This is a different way of using the schema from the one it is currently designed for, where the nd_protocol and phenotype tables store both descriptions and values. I prefer to keep them separate. We have discussed this before, but I'm restating it only because Naama requested it. Even if the property tables were added to the schema, they shouldn't impact your work if you don't use them. Future schema users may prefer my approach, or not, but with the minor addition of a few property tables, Chado can give them that choice.

Adding more prop tables doesn't have a very significant effect on the schema; however, we need to keep in mind that the natural diversity schema is already pretty complex and not intuitive for data storage. We have to think about best practices, which is why we are focusing on writing use cases in the paper and in the wiki. I think we should use fewer tables if possible, unless there is a good reason.
One question is whether these are properties of the measured phenotype or of the experiment_phenotype. Same for protocol/experiment_protocol.
Also, what kind of properties are you trying to store?
One issue I've encountered when working with properties, and properties of properties, is that database performance suffers and queries become very complicated.
 
-Naama

Yuri Bendana-3


On Thu, Apr 21, 2011 at 7:51 AM, Naama Menda <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 3:02 PM, Yuri Bendana <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 1:41 AM, seth redmond <[hidden email]> wrote:
Could you explain your reasoning behind adding a unit_id field to project prop? I can see why phenotype is treated as a special case, but I'm not sure I see any reason to do the same to project prop. 

I do remember previously giving an example of a project property we have that includes a unit: "watering amount = 100 ml".

I think in such cases the value '100 ml' should go as is into the value field. Such props could go into projectprop or nd_experimentprop, depending on how you store your experiments, and it is very helpful to keep all the prop tables with a similar schema.

Do you mean all property tables in Chado or in NatDiv? As for all Chado property tables having the same design, I don't think that matters much as long as there is some consistency within NatDiv. We've discussed the generality of unit_id before, and I view it as quite generic. I would support adding a unit_id to every NatDiv-related property table that could conceivably store quantitative data. cvalue_id is an even more generic attribute and would be a good candidate for all Chado property tables. If unit_id is too specific, perhaps adding just cvalue_id will be enough: it can do double duty, storing the unit for quantitative data or the cvalue for qualitative data.
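A minimal sketch of the cvalue_id "double duty" idea follows. It assumes a hypothetical exampleprop table (not an existing Chado table), SQLite instead of PostgreSQL, and invented terms. A quantitative property stores a number in value with cvalue_id pointing at a unit; a qualitative property leaves value empty and uses cvalue_id as the value itself.

```python
import sqlite3

# Sketch of a generic property row whose cvalue_id holds either a unit term
# (quantitative value) or a categorical term (qualitative value).
# exampleprop and all terms are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE exampleprop (
    exampleprop_id INTEGER PRIMARY KEY,
    type_id   INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value     TEXT,
    cvalue_id INTEGER REFERENCES cvterm (cvterm_id) ON DELETE SET NULL
);
INSERT INTO cvterm VALUES
    (1, 'plant height'), (2, 'cm'),       -- quantitative: value + unit
    (3, 'flower color'), (4, 'purple');   -- qualitative: term only
INSERT INTO exampleprop (type_id, value, cvalue_id) VALUES
    (1, '42', 2),
    (3, NULL, 4);
""")


def describe(conn):
    """Render each property; the role of cvalue_id is inferred from value."""
    rows = conn.execute("""
        SELECT t.name, p.value, c.name
          FROM exampleprop p
          JOIN cvterm t ON t.cvterm_id = p.type_id
          LEFT JOIN cvterm c ON c.cvterm_id = p.cvalue_id
         ORDER BY p.exampleprop_id
    """).fetchall()
    out = []
    for type_name, value, cvalue in rows:
        if value is not None:   # quantitative: cvalue_id acts as the unit
            out.append(f"{type_name} = {value} {cvalue}")
        else:                   # qualitative: cvalue_id is the value itself
            out.append(f"{type_name} = {cvalue}")
    return out


print(describe(conn))  # ['plant height = 42 cm', 'flower color = purple']
```

In this layout the reader has to infer what cvalue_id means from whether value is set; a dedicated unit_id column would make that role explicit, which is the generality trade-off being debated.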
 
 
 
And didn't we already discuss adding the experiment_protocolprop / experiment_phenotypeprop tables back in December and decide that they were unnecessary if you were using one protocol per nd_experiment? Has anything changed in the meantime? I'm not sure I see much value in changing the schema in this way now that we're all developing on it and there's a paper underway.

I find these property tables useful because they let me store the experiment-specific values of a protocol or phenotype, while the nd_protocol and phenotype tables store only the descriptions. This is a different way of using the schema from the one it is currently designed for, where the nd_protocol and phenotype tables store both descriptions and values. I prefer to keep them separate. We have discussed this before, but I'm restating it only because Naama requested it. Even if the property tables were added to the schema, they shouldn't impact your work if you don't use them. Future schema users may prefer my approach, or not, but with the minor addition of a few property tables, Chado can give them that choice.

Adding more prop tables doesn't have a very significant effect on the schema; however, we need to keep in mind that the natural diversity schema is already pretty complex and not intuitive for data storage. We have to think about best practices, which is why we are focusing on writing use cases in the paper and in the wiki. I think we should use fewer tables if possible, unless there is a good reason.
One question is whether these are properties of the measured phenotype or of the experiment_phenotype. Same for protocol/experiment_protocol.
Also, what kind of properties are you trying to store?
One issue I've encountered when working with properties, and properties of properties, is that database performance suffers and queries become very complicated.
  
In nd_experiment_phenotypeprop I store properties of type 'observation'.  This is the value of the observation of that phenotype in that experiment.  In nd_experiment_protocolprop, I store properties of type 'treatment amount'.  This is the value of that protocol in that experiment.  I'm not currently storing any properties of properties, but that does sound costly to query. 
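The split described here (description in phenotype, per-experiment observation in nd_experiment_phenotypeprop) can be sketched as follows. It assumes SQLite in place of PostgreSQL, tables trimmed to the columns needed, and invented data: one 'plant height' phenotype observed in two hypothetical experiments, 101 and 102. The nd_experiment_phenotypeprop definition follows the proposed diff except that INITIALLY DEFERRED is dropped, since SQLite only accepts it after DEFERRABLE.

```python
import sqlite3

# Sketch: the phenotype row holds only the description, and each experiment's
# observation lives in nd_experiment_phenotypeprop. SQLite stands in for
# PostgreSQL; columns are trimmed and all data values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_phenotype (
    nd_experiment_phenotype_id INTEGER PRIMARY KEY,
    nd_experiment_id INTEGER NOT NULL REFERENCES nd_experiment ON DELETE CASCADE,
    phenotype_id     INTEGER NOT NULL REFERENCES phenotype ON DELETE CASCADE
);
CREATE TABLE nd_experiment_phenotypeprop (
    nd_experiment_phenotypeprop_id INTEGER PRIMARY KEY,
    nd_experiment_phenotype_id INTEGER NOT NULL
        REFERENCES nd_experiment_phenotype ON DELETE CASCADE,
    type_id  INTEGER NOT NULL REFERENCES cvterm (cvterm_id) ON DELETE CASCADE,
    value    TEXT,
    units_id INTEGER REFERENCES cvterm (cvterm_id) ON DELETE SET NULL,
    rank     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (nd_experiment_phenotype_id, type_id, rank)
);
INSERT INTO cvterm VALUES (1, 'observation'), (2, 'cm');
INSERT INTO phenotype VALUES (1, 'plant height');   -- one shared description
INSERT INTO nd_experiment VALUES (101), (102);
INSERT INTO nd_experiment_phenotype VALUES (1, 101, 1), (2, 102, 1);
INSERT INTO nd_experiment_phenotypeprop
    (nd_experiment_phenotype_id, type_id, value, units_id)
    VALUES (1, 1, '35', 2), (2, 1, '41', 2);
""")

# Collect each experiment's 'observation' value for every phenotype.
rows = conn.execute("""
    SELECT ep.nd_experiment_id, p.uniquename, pp.value, u.name
      FROM nd_experiment_phenotypeprop pp
      JOIN cvterm t ON t.cvterm_id = pp.type_id AND t.name = 'observation'
      JOIN nd_experiment_phenotype ep
        ON ep.nd_experiment_phenotype_id = pp.nd_experiment_phenotype_id
      JOIN phenotype p ON p.phenotype_id = ep.phenotype_id
      LEFT JOIN cvterm u ON u.cvterm_id = pp.units_id
     ORDER BY ep.nd_experiment_id
""").fetchall()
print(rows)  # [(101, 'plant height', '35', 'cm'), (102, 'plant height', '41', 'cm')]
```

The same pattern would apply to 'treatment amount' values in nd_experiment_protocolprop; the extra join through the linking table is the query cost weighed against keeping descriptions and values apart.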


Naama Menda
On Thu, Apr 21, 2011 at 7:32 PM, Yuri Bendana <[hidden email]> wrote:


On Thu, Apr 21, 2011 at 7:51 AM, Naama Menda <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 3:02 PM, Yuri Bendana <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 1:41 AM, seth redmond <[hidden email]> wrote:
Could you explain your reasoning behind adding a unit_id field to project prop? I can see why phenotype is treated as a special case, but I'm not sure I see any reason to do the same to project prop. 

I do remember previously giving an example of a project property we have that includes a unit: "watering amount = 100 ml".

I think in such cases the value '100 ml' should go as is into the value field. Such props could go into projectprop or nd_experimentprop, depending on how you store your experiments, and it is very helpful to keep all the prop tables with a similar schema.

Do you mean all property tables in Chado or in NatDiv? As for all Chado property tables having the same design, I don't think that matters much as long as there is some consistency within NatDiv. We've discussed the generality of unit_id before, and I view it as quite generic. I would support adding a unit_id to every NatDiv-related property table that could conceivably store quantitative data. cvalue_id is an even more generic attribute and would be a good candidate for all Chado property tables. If unit_id is too specific, perhaps adding just cvalue_id will be enough: it can do double duty, storing the unit for quantitative data or the cvalue for qualitative data.
 

Since properties are meant to be generic, I'm not sure unit_id is generic enough; many props will not have units.
 
 
 
And didn't we already discuss adding the experiment_protocolprop / experiment_phenotypeprop tables back in December and decide that they were unnecessary if you were using one protocol per nd_experiment? Has anything changed in the meantime? I'm not sure I see much value in changing the schema in this way now that we're all developing on it and there's a paper underway.

I find these property tables useful because they let me store the experiment-specific values of a protocol or phenotype, while the nd_protocol and phenotype tables store only the descriptions. This is a different way of using the schema from the one it is currently designed for, where the nd_protocol and phenotype tables store both descriptions and values. I prefer to keep them separate. We have discussed this before, but I'm restating it only because Naama requested it. Even if the property tables were added to the schema, they shouldn't impact your work if you don't use them. Future schema users may prefer my approach, or not, but with the minor addition of a few property tables, Chado can give them that choice.

Adding more prop tables doesn't have a very significant effect on the schema; however, we need to keep in mind that the natural diversity schema is already pretty complex and not intuitive for data storage. We have to think about best practices, which is why we are focusing on writing use cases in the paper and in the wiki. I think we should use fewer tables if possible, unless there is a good reason.
One question is whether these are properties of the measured phenotype or of the experiment_phenotype. Same for protocol/experiment_protocol.
Also, what kind of properties are you trying to store?
One issue I've encountered when working with properties, and properties of properties, is that database performance suffers and queries become very complicated.
  
In nd_experiment_phenotypeprop I store properties of type 'observation'.  This is the value of the observation of that phenotype in that experiment.  In nd_experiment_protocolprop, I store properties of type 'treatment amount'.  This is the value of that protocol in that experiment.  I'm not currently storing any properties of properties, but that does sound costly to query. 

Are you reusing phenotypes? If not, then the property may just as well be a phenotypeprop.

How about for now we merge the schema changes we have agreement on, and then deal with the less agreed-upon ones?

-Naama


------------------------------------------------------------------------------
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network
management toolset available today.  Delivers lowest initial
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
_______________________________________________
Gmod-phendiver mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-phendiver

Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Yuri Bendana-3
On Mon, Apr 25, 2011 at 12:41 PM, Naama Menda <[hidden email]> wrote:
On Thu, Apr 21, 2011 at 7:32 PM, Yuri Bendana <[hidden email]> wrote:


On Thu, Apr 21, 2011 at 7:51 AM, Naama Menda <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 3:02 PM, Yuri Bendana <[hidden email]> wrote:
On Wed, Apr 20, 2011 at 1:41 AM, seth redmond <[hidden email]> wrote:
Could you explain your reasoning behind adding a unit_id field to project prop? I can see why phenotype is treated as a special case, but I'm not sure I see any reason to do the same to project prop. 

I do remember previously giving an example of a project property we have that includes a unit: "watering amount = 100 ml".

I think in such cases the value '100 ml' should go as is into the value field. Such props could go into projectprop or nd_experimentprop, depending on how you store your experiments, and it is very helpful to keep all the prop tables with a similar schema.
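To make the trade-off concrete, here is a minimal sketch of storing '100 ml' as-is in the value field, assuming SQLite as a stand-in for PostgreSQL and a simplified projectprop table. Keeping the prop tables uniform means any numeric comparison has to parse the text in application code; the regex, the TO_ML unit table and the rows are hypothetical.

```python
import re
import sqlite3

# Simplified projectprop (real Chado runs on PostgreSQL and has more columns).
# The value column holds free text such as '100 ml'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE projectprop (
    projectprop_id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm,
    value TEXT,
    rank INTEGER NOT NULL DEFAULT 0
);
""")
conn.execute("INSERT INTO cvterm VALUES (1, 'watering amount')")
conn.executemany(
    "INSERT INTO projectprop (project_id, type_id, value) VALUES (?, 1, ?)",
    [(1, "100 ml"), (2, "0.5 l"), (3, "250 ml")],  # made-up rows
)

QUANTITY = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(\w+)\s*$")
TO_ML = {"ml": 1.0, "l": 1000.0}  # hypothetical unit table for this sketch


def watering_ml(value):
    """Parse a free-text amount such as '100 ml' into millilitres."""
    m = QUANTITY.match(value)
    if m is None or m.group(2) not in TO_ML:
        raise ValueError(f"cannot interpret {value!r}")
    return float(m.group(1)) * TO_ML[m.group(2)]


rows = conn.execute(
    "SELECT project_id, value FROM projectprop ORDER BY project_id"
).fetchall()
# The filter runs in Python after parsing, not in SQL.
over_200 = [pid for pid, v in rows if watering_ml(v) > 200]
print(over_200)  # [2, 3]
```

With a separate numeric value and unit the same filter could be expressed in SQL; with free text it cannot, but no schema change is needed.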

Do you mean all property tables in Chado or in NatDiv?  As for all Chado property tables having the same design, I don't think that matters as much, as long as there is some consistency in NatDiv.  We've discussed the generality of unit_id before.  I view it as quite generic.  I would support giving a unit_id to all NatDiv-related property tables that could conceivably store quantitative data.  cvalue_id is an even more generic attribute and would be a good candidate to have in all Chado property tables.  If unit_id is too specific, perhaps just adding cvalue_id will be enough: it can do double duty, storing the unit for quantitative data or the cvalue for qualitative data.
 

Since properties are meant to be generic, I'm not sure unit_id is generic enough; many props will not have units.

How about cvalue_id?
 
 
 
 
And didn't we already discuss adding the experiment_protocolprop / experiment_phenotypeprop back in December and decide that if you were using one protocol per nd_experiment it was unnecessary? has anything changed in the meantime? I'm not sure I see much value in changing the schema in this way now that we're all developing on it and there's a paper underway.

I find these property tables useful because I can store the experiment-specific values of a protocol or phenotype while the nd_protocol and phenotype tables store only the descriptions.  This is a different way of using the schema from how it's currently designed, where the nd_protocol and phenotype tables store both descriptions and values.  I prefer to keep them separate.  We have discussed this before, but I'm restating it since Naama requested it.  Even if the property tables were added to the schema, they shouldn't impact your work if you don't use them.  Future schema users may prefer my approach, or not.  But with the minor addition of some property tables, Chado can give them that choice.

Adding more prop tables doesn't have a very significant effect on the schema; however, we need to keep in mind that the natural diversity schema is already pretty complex, and not intuitive for data storage. We have to think about best practices, which is why we are focusing on writing use cases in the paper and in the wiki. I think we should use fewer tables if possible, unless there is a good reason.
One question: are those properties of the measured phenotype, or of the experiment_phenotype? Same for protocol/experiment_protocol.
Also, what kind of properties are you trying to store?
One issue I've encountered working with properties and properties of properties is that database performance becomes an issue, and queries become very complicated.
  
In nd_experiment_phenotypeprop I store properties of type 'observation'.  This is the value of the observation of that phenotype in that experiment.  In nd_experiment_protocolprop, I store properties of type 'treatment amount'.  This is the value of that protocol in that experiment.  I'm not currently storing any properties of properties, but that does sound costly to query. 

Are you reusing phenotypes? If not, then the property may just as well be a phenotypeprop.

Yes I'm reusing phenotype and protocol descriptions.
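A minimal sketch of the separation described above, assuming SQLite as a stand-in for PostgreSQL and heavily simplified tables. One phenotype description (a made-up 'plant height') is reused by two experiments, and each experiment's observed value lives in the proposed nd_experiment_phenotypeprop table, typed with an 'observation' cvterm. Column names follow the thread; the rows and the observations() helper are hypothetical.

```python
import sqlite3

# Simplified Chado tables; nd_experiment_phenotypeprop is the *proposed* table.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_phenotype (
    nd_experiment_phenotype_id INTEGER PRIMARY KEY,
    nd_experiment_id INTEGER NOT NULL REFERENCES nd_experiment,
    phenotype_id INTEGER NOT NULL REFERENCES phenotype,
    UNIQUE (nd_experiment_id, phenotype_id)
);
CREATE TABLE nd_experiment_phenotypeprop (
    nd_experiment_phenotypeprop_id INTEGER PRIMARY KEY,
    nd_experiment_phenotype_id INTEGER NOT NULL REFERENCES nd_experiment_phenotype,
    type_id INTEGER NOT NULL REFERENCES cvterm,
    value TEXT,
    rank INTEGER NOT NULL DEFAULT 0,
    UNIQUE (nd_experiment_phenotype_id, type_id, rank)
);
"""


def observations(conn, phenotype_uniquename):
    """Return (nd_experiment_id, observed value) pairs for one reused phenotype."""
    return conn.execute(
        """
        SELECT ep.nd_experiment_id, p.value
        FROM phenotype ph
        JOIN nd_experiment_phenotype ep ON ep.phenotype_id = ph.phenotype_id
        JOIN nd_experiment_phenotypeprop p
          ON p.nd_experiment_phenotype_id = ep.nd_experiment_phenotype_id
        JOIN cvterm t ON t.cvterm_id = p.type_id AND t.name = 'observation'
        WHERE ph.uniquename = ?
        ORDER BY ep.nd_experiment_id
        """,
        (phenotype_uniquename,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO cvterm VALUES (1, 'observation')")
conn.execute("INSERT INTO phenotype VALUES (1, 'plant height')")  # one shared description
conn.executemany("INSERT INTO nd_experiment VALUES (?)", [(10,), (11,)])
conn.executemany("INSERT INTO nd_experiment_phenotype VALUES (?, ?, 1)", [(100, 10), (101, 11)])
conn.executemany(
    "INSERT INTO nd_experiment_phenotypeprop (nd_experiment_phenotype_id, type_id, value) "
    "VALUES (?, 1, ?)",
    [(100, "42"), (101, "57")],  # made-up observations
)
print(observations(conn, "plant height"))  # [(10, '42'), (11, '57')]
```

The cost is one extra join per lookup compared with reading phenotype.value directly, which is the complexity concern raised earlier in the thread.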
 

How about for now we merge the schema changes we have agreement on, and then deal with the less agreed-upon ones?

Sounds good.
 


Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Bob MacCallum
On Mon, Apr 25, 2011 at 9:29 PM, Yuri Bendana <[hidden email]> wrote:

> On Mon, Apr 25, 2011 at 12:41 PM, Naama Menda <[hidden email]> wrote:
>>
>> On Thu, Apr 21, 2011 at 7:32 PM, Yuri Bendana <[hidden email]> wrote:
>>>
>>>
>>> On Thu, Apr 21, 2011 at 7:51 AM, Naama Menda <[hidden email]> wrote:
>>>>
>>>> On Wed, Apr 20, 2011 at 3:02 PM, Yuri Bendana <[hidden email]>
>>>> wrote:
>>>>>>
>>>>>> On Wed, Apr 20, 2011 at 1:41 AM, seth
>>>>>> redmond <[hidden email]> wrote:
>>>>>> Could you explain your reasoning behind adding a unit_id field to
>>>>>> project prop? I can see why phenotype is treated as a special case, but I'm
>>>>>> not sure I see any reason to do the same to project prop.
>>>>>
>>>>> I do remember previously giving an example of a project property we
>>>>> have that includes a unit: "watering amount = 100 ml".
>>>>
>>>> I think in such cases the value '100 ml' should go as is into the value
>>>> field. Such props could go into projectprop or nd_experimentprop, depending
>>>> on how you store your experiments, and it is very helpful to keep all the
>>>> prop tables with a similar schema.
>>>
>>> Do you mean all property tables in Chado or in NatDiv?  As for all
>>> Chado property tables having the same design, I don't think that matters as
>>> much, as long as there is some consistency in NatDiv.  We've discussed the
>>> generality of unit_id before.  I view it as quite generic.  I would
>>> support giving a unit_id to all NatDiv-related property tables that could
>>> conceivably store quantitative data.  cvalue_id is an even more generic
>>> attribute and would be a good candidate to have in all Chado
>>> property tables.  If unit_id is too specific, perhaps just adding cvalue_id
>>> will be enough: it can do double duty, storing the unit for quantitative
>>> data or the cvalue for qualitative data.
>>>
>>
>> Since properties are meant to be generic, I'm not sure unit_id is generic
>> enough; many props will not have units.
>
> How about cvalue_id?
>

I'm pretty sure there are many props where a cvterm value would be good for
us too in the *prop tables.
And I agree it could have a double use as a unit_id for a quantitative
value.  I would vote for cvalue in all the natdiv prop tables, and also
projectprop and stockprop if possible.

It would be good to wrap up these loose ends in the schema so that we
don't have to rebuild BCS more than necessary.  How many more issues were there?

* nd_protocol.name drop unique constraint
* more props tables?

To be honest, I've lost track a bit!
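What the first open item means in practice can be sketched as follows, assuming SQLite as a stand-in for PostgreSQL and a simplified nd_protocol that keeps only name plus the type_id column proposed earlier in the thread. With the unique constraint, two protocols cannot share a name. Without it they can, and lookups then have to use type_id (or another column) to tell them apart. The table builder, helper and rows are hypothetical.

```python
import sqlite3


def make_nd_protocol(conn, unique_name):
    """Create a simplified nd_protocol, with or without UNIQUE on name."""
    # type_id is the column proposed in this thread; other columns are omitted.
    constraint = "UNIQUE" if unique_name else ""
    conn.execute(f"""
        CREATE TABLE nd_protocol (
            nd_protocol_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL {constraint},
            type_id INTEGER
        )
    """)


def same_name_allowed(unique_name):
    """Try to store two protocols that share a name but differ in type_id."""
    conn = sqlite3.connect(":memory:")
    make_nd_protocol(conn, unique_name)
    try:
        conn.executemany(
            "INSERT INTO nd_protocol (name, type_id) VALUES (?, ?)",
            [("watering", 1), ("watering", 2)],  # hypothetical rows
        )
    except sqlite3.IntegrityError:
        # The second row violates UNIQUE(name).
        return False
    return True


print(same_name_allowed(True), same_name_allowed(False))  # False True
```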

cheers,
Bob.



Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Naama Menda
>>
>> Since properties are meant to be generic, I'm not sure unit_id is generic
>> enough; many props will not have units.
>
> How about cvalue_id?
>

I'm pretty sure there are many props where a cvterm value would be good for
us too in the *prop tables.
And I agree it could have a double use as a unit_id for a quantitative
value.  I would vote for cvalue in all the natdiv prop tables, and also
projectprop and stockprop if possible.

cvalue is also not generic for properties of an object. Adding more cvterm fields to some prop tables will result in the same problem we are seeing now with the phenotype table:
it has 4 cvterm FKs, and using it is not intuitive and sometimes contradictory. It would have been better to have a phenotype_cvterm with a type_id; then there would be no constraints on the number and types of cvterms you may use.
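A minimal sketch of the phenotype_cvterm-with-type_id alternative, assuming SQLite as a stand-in for PostgreSQL and simplified tables. The type_id column on phenotype_cvterm is the suggested addition, not an existing column. Each link row says which role its term plays (observable, attribute, unit, method), so a new role needs a new cvterm rather than a new FK column. All terms and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    value TEXT
);
-- type_id records the role of each linked term (the suggested addition).
CREATE TABLE phenotype_cvterm (
    phenotype_cvterm_id INTEGER PRIMARY KEY,
    phenotype_id INTEGER NOT NULL REFERENCES phenotype,
    cvterm_id INTEGER NOT NULL REFERENCES cvterm,
    type_id INTEGER NOT NULL REFERENCES cvterm,
    rank INTEGER NOT NULL DEFAULT 0,
    UNIQUE (phenotype_id, cvterm_id, type_id, rank)
);
""")
terms = [(1, "observable"), (2, "attribute"), (3, "unit"), (4, "method"),
         (10, "leaf"), (11, "length"), (12, "centimetre"), (13, "ruler")]
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", terms)
conn.execute("INSERT INTO phenotype VALUES (1, 'leaf length obs 1', '12.5')")
conn.executemany(
    "INSERT INTO phenotype_cvterm (phenotype_id, cvterm_id, type_id) VALUES (1, ?, ?)",
    [(10, 1), (11, 2), (12, 3), (13, 4)],  # (term, role) pairs
)


def roles(conn, phenotype_id):
    """Map role name -> term name for one phenotype."""
    rows = conn.execute(
        """
        SELECT role.name, term.name
        FROM phenotype_cvterm pc
        JOIN cvterm term ON term.cvterm_id = pc.cvterm_id
        JOIN cvterm role ON role.cvterm_id = pc.type_id
        WHERE pc.phenotype_id = ?
        """,
        (phenotype_id,),
    ).fetchall()
    return dict(rows)


print(roles(conn, 1))
```

The design choice is the usual one for Chado linking tables: meaning moves from column names into typed rows, at the cost of a join per role.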
 
It would be good to wrap up these loose ends in the schema so that we
don't have to rebuild BCS more than necessary.  How many more issues were there?

BCS patches are fine, unless Rob thinks they're getting too frequent.
 
* nd_protocol.name drop unique constraint
* more props tables?

To be honest, I've lost track a bit!

I'm adding the changes with consensus. Dropping the name unique constraint is probably not one of them right now.
There are quite a few changes we have agreement on, so I'd like to commit those first, and then we can keep arguing about constraints and properties :-)
 
cheers,
Bob.


-Naama

 
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema



Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Yuri Bendana-3


On Wed, Apr 27, 2011 at 8:29 AM, Naama Menda <[hidden email]> wrote:
>>
>> Since properties are meant to be generic, I'm not sure unit_id is generic
>> enough; many props will not have units.
>
> How about cvalue_id?
>

I'm pretty sure there are many props where a cvterm value would be good for
us too in the *prop tables.
And I agree it could have a double use as a unit_id for a quantitative
value.  I would vote for cvalue in all the natdiv prop tables, and also
projectprop and stockprop if possible.

cvalue is also not generic for properties of an object. Adding more cvterm fields to some prop tables will result in the same problem we are seeing now with the phenotype table:
it has 4 cvterm FKs, and using it is not intuitive and sometimes contradictory. It would have been better to have a phenotype_cvterm with a type_id; then there would be no constraints on the number and types of cvterms you may use.
  

I really don't understand how cvalue_id could not be considered generic enough. It's almost equivalent to 'value' but with the value coming from a cv.  Since Chado is designed around cv's, this field is at the core of its usage.  I think it's an oversight that Chado doesn't have cvalue_id in all its property tables, but at least the NatDiv related ones can have it.
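A minimal sketch of the "double duty" cvalue_id, assuming SQLite as a stand-in for PostgreSQL and a simplified nd_experimentprop with the proposed cvalue_id column (not part of the schema at this point). For a quantitative prop, cvalue_id is read as the unit; for a qualitative one, it is the value itself. The describe() rule is a hypothetical convention that lives in application code. That implicit convention is the kind of ambiguity Naama raises about the phenotype table's multiple cvterm FKs. Terms and rows are made up.

```python
import sqlite3

# Simplified nd_experimentprop with the *proposed* cvalue_id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nd_experimentprop (
    nd_experimentprop_id INTEGER PRIMARY KEY,
    nd_experiment_id INTEGER NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm,
    value TEXT,
    cvalue_id INTEGER REFERENCES cvterm,
    rank INTEGER NOT NULL DEFAULT 0
);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "watering amount"), (2, "millilitre"), (3, "soil type"), (4, "clay")])
conn.executemany(
    "INSERT INTO nd_experimentprop (nd_experiment_id, type_id, value, cvalue_id) "
    "VALUES (?, ?, ?, ?)",
    [(10, 1, "100", 2),   # quantitative: cvalue_id acts as the unit
     (10, 3, None, 4)],   # qualitative: cvalue_id is the value itself
)

rows = conn.execute("""
    SELECT t.name, p.value, c.name
    FROM nd_experimentprop p
    JOIN cvterm t ON t.cvterm_id = p.type_id
    LEFT JOIN cvterm c ON c.cvterm_id = p.cvalue_id
    WHERE p.nd_experiment_id = ?
    ORDER BY p.nd_experimentprop_id
""", (10,)).fetchall()


def describe(prop, value, cvalue):
    """Read cvalue_id as a unit when value is set, otherwise as the value."""
    if value is not None:
        return f"{prop} = {value} {cvalue}" if cvalue else f"{prop} = {value}"
    return f"{prop} = {cvalue}"


print([describe(*r) for r in rows])
# ['watering amount = 100 millilitre', 'soil type = clay']
```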

Anyway, could we vote on this?  So far it's at +1 (Bob and I give +2, and Naama gives -1).

yuri



Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Jonathan "Duke" Leto
In reply to this post by Bob MacCallum
Howdy,

> It would be good to wrap up these loose ends in the schema so that we
> don't have to rebuild BCS more than necessary.  How many more issues were there?

I wouldn't worry too much about rebuilding/releasing Bio::Chado::Schema; the
process is automated. One concern that I still have is that we do not have
any concept of a "schema version" for each module, stored in
the database.

Storing a "schema version" for all of Chado and a version for
each module would allow tools to be smarter, notice incompatibilities and
many other spiffy features which are impossible now.

Doing this embraces the inevitability of the schema changing over time, which
will be natural as Chado grows.
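One way a tool could use such versions is sketched below, assuming SQLite as a stand-in for PostgreSQL. The module_version table, its module names, the version numbers and the incompatibilities() helper are all hypothetical: the thread proposes the idea, not a design. A tool declares the minimum version it needs per module and reports any module that is missing or older.

```python
import sqlite3

# Hypothetical table: one row for the whole schema plus one per module.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE module_version (
    module  TEXT PRIMARY KEY,
    version TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO module_version VALUES (?, ?)",
                 [("chado", "1.2"), ("natural_diversity", "1.1"), ("project", "1.2")])


def parse(v):
    """Turn a dotted version string like '1.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def incompatibilities(conn, required):
    """Return (module, stored, required) for modules missing or older than required."""
    stored = dict(conn.execute("SELECT module, version FROM module_version"))
    problems = []
    for module, minimum in sorted(required.items()):
        have = stored.get(module)
        if have is None or parse(have) < parse(minimum):
            problems.append((module, have, minimum))
    return problems


print(incompatibilities(conn, {"natural_diversity": "1.2", "project": "1.2", "phenotype": "1.0"}))
# [('natural_diversity', '1.1', '1.2'), ('phenotype', None, '1.0')]
```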

Duke

--
Jonathan "Duke" Leto <[hidden email]>
209.691.DUKE // http://leto.net
NOTE: Personal email is only checked twice a day at 10am/2pm PST,
please call/text for time-sensitive matters.


Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Naama Menda
In reply to this post by Bob MacCallum
hi Bob,

Could you explain the need for a unit_id for project and stock properties?
Also for the ND tables, for that matter.

Are phenotypeprops always, or most of the time, quantitative? I think phenotypeprop may be the only case where cvalue_id might be needed,
although as I see it, the problem originates from storing the phenotype values in the property table rather than in the phenotype table.
If the value is stored in phenotype, then the unit could be a phenotype_cvterm (which shouldn't be deprecated, even if we add phenotypeprop).

Regardless of what we decide, the phenotype module needs revisiting.

-Naama




Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Sook Jung
Hi,
I agree with Naama that the problem will be solved if we just store
phenotype values in the phenotype table. The phenotype table clearly has
value and cvalue_id fields for quantitative and qualitative values.
The phenotype table also has observable_id and attr_id to link to the
cvterm table, where the phenotype descriptors are meant to be stored.
If the phenotype descriptors have more associated data, the CV module
already has the tables needed to deal with those.
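A minimal sketch of this approach, assuming SQLite as a stand-in for PostgreSQL and a phenotype table trimmed to the columns named above (observable_id, attr_id, value, cvalue_id) plus uniquename. Each observation is its own phenotype row. A quantitative result goes in value and a qualitative one in cvalue_id, so no prop table is needed to read it back. The trade-off against the reuse approach discussed earlier is one phenotype row per observation. Terms and rows are made up.

```python
import sqlite3

# Simplified phenotype table (real Chado runs on PostgreSQL and has more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    observable_id INTEGER REFERENCES cvterm,
    attr_id INTEGER REFERENCES cvterm,
    value TEXT,
    cvalue_id INTEGER REFERENCES cvterm
);
""")
conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                 [(1, "plant"), (2, "height"), (3, "flower"), (4, "color"), (5, "red")])
conn.executemany(
    "INSERT INTO phenotype VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "exp10 plant height", 1, 2, "42", None),  # quantitative: value
        (2, "exp10 flower color", 3, 4, None, 5),     # qualitative: cvalue_id
    ],
)

# Read back observable, attribute and whichever kind of value is present.
rows = conn.execute("""
    SELECT o.name, a.name, COALESCE(p.value, c.name)
    FROM phenotype p
    JOIN cvterm o ON o.cvterm_id = p.observable_id
    JOIN cvterm a ON a.cvterm_id = p.attr_id
    LEFT JOIN cvterm c ON c.cvterm_id = p.cvalue_id
    ORDER BY p.phenotype_id
""").fetchall()
print(rows)  # [('plant', 'height', '42'), ('flower', 'color', 'red')]
```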
Sook

On Wed, Apr 27, 2011 at 12:07 PM, Naama Menda <[hidden email]> wrote:

>
> hi Bob,
>
> could you explain the need for a unit_id for project and stock properties?
> Also for the ND tables, for that matter.
>
> Are phenotypeprops always, or most of the time, quantitative? I think phenotypeprop may be the only case where cvalue_id might be needed,
> although as I see it, the problem originates from storing the phenotype values in the property table rather than in the phenotype table.
> If the value is stored in phenotype, then the unit could be a phenotype_cvterm (which shouldn't be deprecated, even if we add phenotypeprop).
>
> Regardless of what we decide, the phenotype module needs revisiting.
>
> -Naama
>
>
>
> I'm pretty sure there are many where a cvterm value would be good for
>>
>> us too in *prop tables.
>> And I agree it could have a double use as a unit_id for a quantitative
>> value.  I would vote for cvalue
>> in all the natdiv prop tables and also projectprop and stockprop if possible.
>>
>> It would be good to wrap up these loose ends in the schema so that we
>> don't have to
>> rebuild BCS more than necessary.  How many more issues were there?
>>
>> * nd_protocol.name drop unique constraint
>> * more props tables?
>>
>> To be honest, I've lost track a bit!
>>
>> cheers,
>> Bob.
>>
>>
>> >>
>> >>
>> >>>>
>> >>>>
>> >>>>>
>> >>>>>
>> >>>>>>
>> >>>>>> And didn't we already discuss adding the experiment_protocolprop /
>> >>>>>> experiment_phenotypeprop back in December and decide that if you were using
>> >>>>>> one protocol per nd_experiment it was unnecessary? has anything changed in
>> >>>>>> the meantime? I'm not sure I see much value in changing the schema in this
>> >>>>>> way now that we're all developing on it and there's a paper underway.
>> >>>>>
>> >>>>> I find these property tables useful because I can store the experiment
>> >>>>> specific values of a protocol or phenotype while the nd_protocol and
>> >>>>> phenotype tables store only the descriptions.   This is a different way of
>> >>>>> using the schema than currently it's designed where the nd_protocol and
>> >>>>> phenotype tables store both description and values.  I prefer to keep them
>> >>>>> separate.  We have discussed this before, but I'm only restating it since
>> >>>>> Naama requested.  Even if the property tables were added to the schema, it
>> >>>>> shouldn't impact your work if you don't use them.  Future schema users may
>> >>>>> prefer my approach, or not.  But with a minor addition of some property
>> >>>>> tables, Chado can given them that choice.
>> >>>>
>> >>>> Adding more prop tables doesn't have very significant effect on the
>> >>>> schema, however we need to keep in mind that the natural diversity schema is
>> >>>> already pretty complex, and not intuitive for data storage. We have to think
>> >>>> about best practices, which is why we are focusing on writing use cases in
>> >>>> the paper, and in the wiki. I think we should use less tables if possible,
>> >>>> unless there is a good reason.
>> >>>> One question is are those properties of the measured phenotype, or of
>> >>>> the experiment_phenotype ? Same for protocol/experiment_protocol.
>> >>>> Also, what kind of properties are you trying to store?
>> >>>> One issue I've encountered working with properties and properties of
>> >>>> properties, is that database performance becomes an issue, and queries
>> >>>> become very complicated.
>> >>>>
>> >>>
>> >>> In nd_experiment_phenotypeprop I store properties of type 'observation'.
>> >>>  This is the value of the observation of that phenotype in that experiment.
>> >>>  In nd_experiment_protocolprop, I store properties of type 'treatment
>> >>> amount'.  This is the value of that protocol in that experiment.  I'm not
>> >>> currently storing any properties of properties, but that does sound costly
>> >>> to query.
>> >>
>> >> Are you re-using phenotypes ? if not, than the property may just as well
>> >> be a phenotypeprop
>> >
>> > Yes I'm reusing phenotype and protocol descriptions.
>> >
>> >>
>> >> How about for now we merge the schema changes we have agreement on, and
>> >> then deal with the less agreed upon ones ?
>> >
>> > Sounds good.
>> >
>>
>> ------------------------------------------------------------------------------
>> WhatsUp Gold - Download Free Network Management Software
>> The most intuitive, comprehensive, and cost-effective network
>> management toolset available today.  Delivers lowest initial
>> acquisition cost and overall TCO of any competing solution.
>> http://p.sf.net/sfu/whatsupgold-sd
>> _______________________________________________
>> Gmod-schema mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
>
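A minimal in-memory sketch of the distinction discussed above, using hypothetical Ruby structs named after the tables in this thread (they are illustrations, not a Chado API). Because one phenotype row is shared by several nd_experiments, a per-experiment observation stored in phenotypeprop would be shared too; attaching it to the nd_experiment_phenotype link row, as the proposed nd_experiment_phenotypeprop does, keeps one value per experiment.

```ruby
# Hypothetical in-memory stand-ins for the tables named in this thread.
Phenotype = Struct.new(:phenotype_id, :uniquename)
NdExperimentPhenotype = Struct.new(:nd_experiment_phenotype_id, :nd_experiment_id, :phenotype_id)
NdExperimentPhenotypeprop = Struct.new(:nd_experiment_phenotype_id, :type, :value)

# One phenotype description, reused by two experiments (ids are illustrative).
height = Phenotype.new(1, "plant height")
links = [
  NdExperimentPhenotype.new(10, 100, height.phenotype_id),
  NdExperimentPhenotype.new(11, 101, height.phenotype_id)
]

# Per-experiment observations hang off the link rows, not off the shared phenotype.
props = [
  NdExperimentPhenotypeprop.new(10, "observation", "42"),
  NdExperimentPhenotypeprop.new(11, "observation", "37")
]

# Return the observed value of a phenotype in one experiment, or nil if absent.
def observation(links, props, nd_experiment_id, phenotype_id)
  link = links.find { |l| l.nd_experiment_id == nd_experiment_id && l.phenotype_id == phenotype_id }
  return nil unless link

  prop = props.find do |p|
    p.nd_experiment_phenotype_id == link.nd_experiment_phenotype_id && p.type == "observation"
  end
  prop&.value
end

puts observation(links, props, 100, 1) # prints 42
puts observation(links, props, 101, 1) # prints 37
```

If each experiment got its own phenotype row instead, the same value could sit in phenotypeprop, which is why the question of reuse matters.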

------------------------------------------------------------------------------
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network
management toolset available today.  Delivers lowest initial
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
_______________________________________________
Gmod-phendiver mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-phendiver

Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Robert Buels
Wow.  Truly, these schema design discussions are exceptional.  OK, I'm
going to try to weigh in as tersely as possible.

We all need to put a high priority on both making decisions, and making
compromises, in order to move forward quickly.  The more it drags out,
the more we lose focus on what we're talking about.

Yuri, you seem to be going to great lengths to eliminate any redundancy
in the way your particular data is being stored, even going so far as to
make end-runs around the design of the existing modules, like
shoehorning things into phenotypeprop that were not meant to go there
(and then saying the phenotypeprop table needs more columns to model
them better), or wanting to add linking tables to go directly between
the ND tables and phenotypeprop.  These are big, pain-in-the-butt
changes, and as far as I've been able to see, the only benefit that
you're actually getting from them is being able to save on row count.
And by the time you tot up the additional rows (and columns) used in
phenotypeprop (and in extra linking tables if you're still using those),
you are probably not even saving there.  Maybe I'm wrong.  I dunno.  It
all sounds like premature optimization to me.

Anyway, let's think for a second about the priorities we have for this
module. As I see it, the priority list goes something like:

1. to write a nice paper before the sun becomes a red giant
2. to make a Chado module that will be very useful to both *us*, and to
a wide variety of other people who need to store (and query!) ND data.
     a.) should be as easy as possible to learn to use:
           - as few tables as possible
           - as few different ways to do things as possible. (There's
             More Than One Way To Do It is the main reason so many
             people hate Perl)
     b.) ND should be *able* to store any kind of data we can think of
     c.) released in a timely fashion (it has already taken too long,
         really)


Using these as a guide, I now propose the following algorithm for how to
proceed.

1. We organize the schema design issues in a very granular fashion,
rather than have them all mixed up in email:  In an issue tracker
somewhere, everybody enters each remaining imperfection they perceive in
the schema *as it stands right now in GMOD svn*.

2. We have a Skype meeting to classify each issue according to when it
will be worked on: before the first release, after the first release, or
never.

3. We work on issues that we have, in committee, decided must be
addressed before the first public release, we tag it as the first
release, and we finish writing the paper and submit it.  The paper will
contain only things that refer to the official released version.

4. We see that a schema migration system is implemented in Chado, so
that it's possible to make more frequent releases, and users can migrate
data stored in the old version of the schema.

5. We continue using the issue tracker that we set up in step 1, and
continue working on other issues.  At this point, with the migration
system in place, we will have a lot more freedom to make changes to both
the ND module as it was first released, and to other parts of Chado as
well (like the phenotype module).
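Step 4 can be sketched as an ordered list of versioned migrations, each applied once to a database recorded at an older version. This is a minimal Ruby sketch under stated assumptions: the `Migration` struct, the `migrate` helper and the in-memory table-to-columns map are hypothetical, and the two example steps are borrowed from changes proposed in this thread purely for illustration, not taken from any released Chado version.

```ruby
# Hypothetical sketch of a versioned migration runner.
# A real tool would run SQL against PostgreSQL; here each migration only
# edits an in-memory map of table name => column list (columns abbreviated).
Migration = Struct.new(:version, :description, :change)

# Example steps borrowed from changes proposed in this thread, for illustration only.
MIGRATIONS = [
  Migration.new(1, "add nd_protocol.type_id",
                ->(schema) { schema["nd_protocol"] << "type_id" }),
  Migration.new(2, "add phenotypeprop.cvalue_id",
                ->(schema) { schema["phenotypeprop"] << "cvalue_id" })
].freeze

# Apply, in version order, every migration newer than current_version and
# return the version the schema ends up at.
def migrate(schema, current_version)
  pending = MIGRATIONS.select { |m| m.version > current_version }.sort_by(&:version)
  pending.each { |m| m.change.call(schema) }
  pending.empty? ? current_version : pending.last.version
end

# A database created at version 1 already has nd_protocol.type_id,
# so only migration 2 is applied.
schema = {
  "nd_protocol" => %w[nd_protocol_id name type_id],
  "phenotypeprop" => %w[phenotypeprop_id phenotype_id type_id value rank]
}
version = migrate(schema, 1)
puts version # prints 2
```

A real implementation would also store the current version inside the database and run each step in a transaction; PostgreSQL supports transactional DDL, so a failed step can be rolled back cleanly.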

Thoughts on this plan?  Does everybody know how to use an issue tracker?
  Anybody have a preferred public issue tracker they like (I was
thinking of using the recently upgraded Github Issues, which I heard
were Scott's favorite)?

Rob



Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Matt
Sorry for the duplicate post Rob- didn't hit reply-all,

Rather than (or perhaps in conjunction with) tickets why not use a
test framework for solving these ongoing issues?  I.e. actually code
tests that pass/fail. When all the proposed tests pass- the schema is
ready for prime time.  By definition the tests will act as use-case
proxies: if you can't phrase your requirements in a test, you likely
have little hope of defining a schema.  Anyone could add tests that
they feel are necessary- and people can offer different (coded)
solutions for having them pass.

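require "minitest/autorun"

class NatDivSchemaTest < Minitest::Test
  def test_that_i_can_return_parameter_foo_from_protocol_bar
    # do stuff: look up protocol bar and assert that parameter foo comes back
  end

  def test_that_a_phenotype_can_have_properties_bar_and_blorf
    # do other stuff: attach bar and blorf to a phenotype and read them back
  end
end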

Back to lurkin

On Thu, Apr 28, 2011 at 7:39 PM, Robert Buels <[hidden email]> wrote:

> [...]


Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Yuri Bendana-3
In reply to this post by Robert Buels
On Thu, Apr 28, 2011 at 4:39 PM, Robert Buels <[hidden email]> wrote:
> Wow.  Truly, these schema design discussions are exceptional.  OK, I'm
> going to try to weigh in as tersely as possible.
>
> We all need to put a high priority on both making decisions, and making
> compromises, in order to move forward quickly.  The more it drags out,
> the more we lose focus on what we're talking about.

I agree.
 

> Yuri, you seem to be going to great lengths to eliminate any redundancy
> in the way your particular data is being stored, even going so far as to
> make end-runs around the design of the existing modules, like
> shoehorning things into phenotypeprop that were not meant to go there
> (and then saying the phenotypeprop table needs more columns to model
> them better), or wanting to add linking tables to go directly between
> the ND tables and phenotypeprop.  These are big, pain-in-the-butt
> changes, and as far as I've been able to see, the only benefit that
> you're actually getting from them is being able to save on row count.
> And by the time you tot up the additional rows (and columns) used in
> phenotypeprop (and in extra linking tables if you're still using those),
> you are probably not even saving there.  Maybe I'm wrong.  I dunno.  It
> all sounds like premature optimization to me.

Rob, you're misrepresenting what I've been saying.  You weren't a part of the phenotype hackathon discussions so in part I can understand your confusion.  The phenotypeprop changes were agreed to at the hackathon in November, but they weren't merged into the Chado svn trunk (I still don't know why).  I am proposing nothing new for phenotypeprop.  Seth can attest to this. I think it was actually his idea to include cvalue_id in phenotypeprop.  I like this idea and that's why I now propose to include cvalue_id in all NatDiv property tables.  I originally liked the idea of having units_id in some property tables but I can understand if some people think it's too specific.  Naama also brought up the issue of wanting to have uniform property tables.  cvalue_id is a more generic field and will be able to stand in for a units_id if needed.  This is a simple change.
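A minimal sketch of what a uniform prop row with an optional cvalue_id could look like, written as hypothetical Ruby structs rather than DDL. The term ids and names are illustrative, not real cvterm rows, and the row with no value assumes the value column may be NULL, as was also proposed for nd_experimentprop at the start of this thread. It shows the same column holding a unit in one row and a categorical value in another.

```ruby
# Hypothetical sketch of a uniform prop row carrying an optional cvalue_id.
# The term ids and names below are illustrative, not real cvterm rows.
CvTerm = Struct.new(:cvterm_id, :name)
Prop = Struct.new(:type_id, :value, :cvalue_id)

TERMS = [
  CvTerm.new(1, "treatment amount"),
  CvTerm.new(2, "milligram"),
  CvTerm.new(3, "growth condition"),
  CvTerm.new(4, "drought")
].map { |t| [t.cvterm_id, t] }.to_h.freeze

# Render a prop row as text. cvalue_id is optional: it holds a unit in one
# row and a categorical value in another.
def describe(prop)
  type_name = TERMS.fetch(prop.type_id).name
  cvalue_name = prop.cvalue_id && TERMS.fetch(prop.cvalue_id).name
  [type_name, prop.value, cvalue_name].compact.join(" ")
end

amount = Prop.new(1, "5", 2)    # numeric value plus a unit term
condition = Prop.new(3, nil, 4) # value given entirely by a term
plain = Prop.new(1, "5", nil)   # no cvalue at all

puts describe(amount)    # prints treatment amount 5 milligram
puts describe(condition) # prints growth condition drought
puts describe(plain)     # prints treatment amount 5
```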

I haven't proposed any new 'linking tables to go directly between the ND tables and phenotypeprop'.  Those linking tables already exist.  I just proposed adding a couple of property tables.  Again, a simple change.  But if people don't want them, fine.

Maybe you're thinking about the collection tables I mentioned to Scott and Naama in separate emails.  But those are for my use and I haven't proposed them for use by NatDiv.  Maybe Chado will have a Collection module, but that's a separate issue.  Refer to my first email on this thread for the DDL of what I proposed, with the only change being that instead of 'units_id' the field would be called 'cvalue_id'.

Regarding whether I'm prematurely optimizing and/or abusing the Chado schema, that's a separate issue and you're sounding judgemental.  There are parts I like about the design, others I don't as much.  But since it is so flexible, with a few tweaks it works for me.  I only propose changes that I think could be useful for more people than just myself.  Some of the changes I'm proposing are just repeats of what I proposed in December.  Since no decision was made, at Naama's request, I repeated them.  Let me remind you that these are only proposals, not mandates.  I will say that the tone of your response doesn't encourage me to submit any more proposals, which is a shame because I think open source projects should welcome contributions in code or ideas by their users.
 
> Anyway, let's think for a second about the priorities we have for this
> module. As I see it, the priority list goes something like:
>
> 1. to write a nice paper before the sun becomes a red giant
> 2. to make a Chado module that will be very useful to both *us*, and to
> a wide variety of other people who need to store (and query!) ND data.
>     a.) should be as easy as possible to learn to use:
>           - as few tables as possible
>           - as few different ways to do things as possible. (There's
>             More Than One Way To Do It is the main reason so many
>             people hate Perl)
>     b.) ND should be *able* to store any kind of data we can think of
>     c.) released in a timely fashion (it has already taken too long,
>         really)
>
>
> Using these as a guide, I now propose the following algorithm for how to
> proceed.
>
> 1. We organize the schema design issues in a very granular fashion,
> rather than have them all mixed up in email:  In an issue tracker
> somewhere, everybody enters each remaining imperfection they perceive in
> the schema *as it stands right now in GMOD svn*.
>
> 2. We have a Skype meeting to classify each issue according to when it
> will be worked on: before the first release, after the first release, or
> never.
>
> 3. We work on issues that we have, in committee, decided must be
> addressed before the first public release, we tag it as the first
> release, and we finish writing the paper and submit it.  The paper will
> contain only things that refer to the official released version.
>
> 4. We see that a schema migration system is implemented in Chado, so
> that it's possible to make more frequent releases, and users can migrate
> data stored in the old version of the schema.
>
> 5. We continue using the issue tracker that we set up in step 1, and
> continue working on other issues.  At this point, with the migration
> system in place, we will have a lot more freedom to make changes to both
> the ND module as it was first released, and to other parts of Chado as
> well (like the phenotype module).
>
> Thoughts on this plan?  Does everybody know how to use an issue tracker?
>  Anybody have a preferred public issue tracker they like (I was
> thinking of using the recently upgraded Github Issues, which I heard
> were Scott's favorite)?
>
> Rob

Using an issue tracker is the best format, because it's easy to forget complex details written in emails months ago.
 



Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Scott Cain
Hi All,

I meant to write on this thread last week (in fact, I really thought I
did, but so it goes), so sorry for my silence.  I was mostly hoping
this would shake out "on its own."  I think Rob's suggestion of an
issue tracker is key.  I would be fine if we want to use the bug
tracker at sourceforge: it's the one I'm familiar with, so I can have
it set up quickly.  I also don't mind if we use a different one if
anyone feels strongly about it.

I'm hoping to do a schema freeze for the 1.2 release of Chado in the
near future, so sorting out what needs to happen for a first release
of the NatDiv schema would be great.

Also, Rob, I've started work on a schema migration tool, which should
also be part of the 1.2 release.

Scott


On Tue, May 3, 2011 at 6:19 PM, Yuri Bendana <[hidden email]> wrote:

> [...]



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

seth redmond
In reply to this post by Yuri Bendana-3
Just to be clear: the changes to the phenotype/phenotypeprop tables (i.e. non-unique uniquename and cvalue in the prop table) were proposed at the hackathon and I'd agree with them being rolled into the core. They should all be backwards compatible with FlyBase and I don't see why they would cause any problem. 

What I do not agree with is extending these to every prop table across chado. Phenotypes are by their nature a special case; the only table I know of where we are *obliged* to use a two-part definition (e.g. ANATOMY_ONTOLOGY:'wing' + PATO:'increased size', etc). The logic of including cvalue in phenotypeprop was to allow people to use the same E-Q descriptors to describe other aspects of the same phenotype. 
In most (all?) other cases where we use a CV term in a prop table, what we are describing should be defined by the CVterm itself. To use an example we have just been discussing at VB: we have CVterms for strains of mosquito. We do not need to define this as 'strain' + 'a.gambiae M' because the fact that 'a.gambiae M' is a strain is implicit in the CVterm. If this is not possible, we look to modify the ontology and not the database. I have yet to hear a convincing reason why this is not possible with other prop tables.
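A rough sketch of the distinction drawn above, using illustrative Ruby structures rather than Chado tables. An E-Q phenotype needs two terms because neither the entity nor the quality alone identifies it, whereas a term such as 'a.gambiae M' already implies its category through the ontology, so no second column is needed to say it is a strain. The `IS_A` map is a hypothetical stand-in for the ontology's own relationships.

```ruby
# Illustrative only: term strings are taken from the message above; the
# structures are hypothetical stand-ins rather than Chado tables.

# A phenotype is a two-part (entity + quality) description.
EqPhenotype = Struct.new(:entity, :quality) do
  def label
    "#{entity} + #{quality}"
  end
end

# For a single-term property, the category lives in the ontology (here a
# tiny is_a map), so the database row needs only the term itself.
IS_A = { "a.gambiae M" => "strain" }.freeze

def category_of(term)
  IS_A.fetch(term)
end

wing = EqPhenotype.new("ANATOMY_ONTOLOGY:wing", "PATO:increased size")
puts wing.label                 # prints ANATOMY_ONTOLOGY:wing + PATO:increased size
puts category_of("a.gambiae M") # prints strain
```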

I'd largely agree with Rob's strategy for ironing out these last few issues, though we might be able to sort this out more quickly with a brief conference call, with any remaining schema changes proposed on a wiki beforehand (i.e. clearly written on the agenda, with SQL and a justification for each change). If we can't agree on a change, then we default to whatever has fewer tables or the fewest changes, at the expense of redundancy if necessary. If people still desperately want to add or modify other tables, they are of course free to fork the project and strike out on their own, but it will then be up to them to ensure their code is compatible with the core. 



-- 
Seth Redmond
  Scientific Programmer, VectorBase
  Kafatos / Christophides Groups
  Div. Cell and Molecular Biology
  Imperial College, London
[hidden email]
--

On 3 May 2011, at 23:19, Yuri Bendana wrote:

On Thu, Apr 28, 2011 at 4:39 PM, Robert Buels <[hidden email]> wrote:
Wow.  Truly, these schema design discussions are exceptional.  OK, I'm
going to try to weigh in as tersely as possible.

We all need to put a high priority on both making decisions, and making
compromises, in order to move forward quickly.  The more it drags out,
the more we lose focus on what we're talking about.

I agree.
 

Yuri, you seem to be going to great lengths to eliminate any redundancy
in the way your particular data is being stored, even going so far as to
make end-runs around the design of the existing modules, like
shoehorning things into phenotypeprop that were not meant to go there
(and then saying the phenotypeprop table needs more columns to model
them better), or wanting to add linking tables to go directly between
the ND tables and phenotypeprop.  These are big, pain-in-the-butt
changes, and as far as I've been able to see, the only benefit that
you're actually getting from them is being able to save on row count.
And by the time you tot up the additional rows (and columns) used in
phenotypeprop (and in extra linking tables if you're still using those),
you are probably not even saving there.  Maybe I'm wrong.  I dunno.  It
all sounds like premature optimization to me.

Rob, you're misrepresenting what I've been saying.  You weren't a part of the phenotype hackathon discussions so in part I can understand your confusion.  The phenotypeprop changes were agreed to at the hackathon in November, but they weren't merged into the Chado svn trunk (I still don't know why).  I am proposing nothing new for phenotypeprop.  Seth can attest to this. I think it was actually his idea to include cvalue_id in phenotypeprop.  I like this idea and that's why I now propose to include cvalue_id in all NatDiv property tables.  I originally liked the idea of having units_id in some property tables but I can understand if some people think it's too specific.  Naama also brought up the issue of wanting to have uniform property tables.  cvalue_id is a more generic field and will be able to stand in for a units_id if needed.  This is a simple change.

I haven't proposed any new 'linking tables to go directly between the ND tables and phenotypeprop'.  Those linking tables already exist.  I just proposed adding a couple of property tables.  Again, a simple change.  But if people don't want them, fine.

Maybe you're thinking about the collection tables I mentioned to Scott and Naama in separate emails.  But those are for my use and I haven't proposed them for use by NatDiv.  Maybe Chado will have a Collection module, but that's a separate issue.  Refer to my first email on this thread for the DDL of what I proposed, with the only change being that instead of 'units_id' the field would be called 'cvalue_id'.

Regarding whether I'm prematurely optimizing and/or abusing the Chado schema, that's a separate issue, and you're sounding judgemental.  There are parts I like about the design, others I don't as much.  But since it is so flexible, with a few tweaks it works for me.  I only propose changes that I think could be useful for more people than just myself.  Some of the changes I'm proposing are just repeats of what I proposed in December.  Since no decision was made, at Naama's request, I repeated them.  Let me remind you that these are only proposals, not mandates.  I will say that the tone of your response doesn't encourage me to submit any more proposals, which is a shame, because I think open source projects should welcome contributions of code or ideas from their users.
 
> Anyway, let's think for a second about the priorities we have for this
> module. As I see it, the priority list goes something like:
>
> 1. to write a nice paper before the sun becomes a red giant
> 2. to make a Chado module that will be very useful to both *us* and to
> a wide variety of other people who need to store (and query!) ND data.
>     a.) should be as easy as possible to learn to use:
>           - as few tables as possible
>           - as few different ways to do things as possible. (There's
>             More Than One Way To Do It is the main reason so many
>             people hate Perl)
>     b.) ND should be *able* to store any kind of data we can think of
>     c.) released in a timely fashion (it has already taken too long,
>         really)
>
>
> Using these as a guide, I now propose the following algorithm for how to
> proceed.
>
> 1. We organize the schema design issues in a very granular fashion,
> rather than have them all mixed up in email:  In an issue tracker
> somewhere, everybody enters each remaining imperfection they perceive in
> the schema *as it stands right now in GMOD svn*.
>
> 2. We have a Skype meeting to classify each issue according to when it
> will be worked on: before the first release, after the first release, or
> never.
>
> 3. We work on the issues that we have decided, in committee, must be
> addressed before the first public release; we tag that as the first
> release; and we finish writing the paper and submit it.  The paper will
> contain only things that refer to the official released version.
>
> 4. We see that a schema migration system is implemented in Chado, so
> that it's possible to make more frequent releases, and users can migrate
> data stored in the old version of the schema.
>
> 5. We continue using the issue tracker that we set up in step 1, and
> continue working on other issues.  At this point, with the migration
> system in place, we will have a lot more freedom to make changes to both
> the ND module as it was first released, and to other parts of Chado as
> well (like the phenotype module).
>
> Thoughts on this plan?  Does everybody know how to use an issue tracker?
> Anybody have a preferred public issue tracker they like (I was
> thinking of using the recently upgraded GitHub Issues, which I heard
> were Scott's favorite)?
>
> Rob

Using an issue tracker is the best format, because it's easy to forget complex details written in emails months ago.
 


------------------------------------------------------------------------------
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network
management toolset available today.  Delivers lowest initial
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
_______________________________________________
Gmod-phendiver mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-phendiver


Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Yuri Bendana-3
On Wed, May 4, 2011 at 4:06 AM, seth redmond <[hidden email]> wrote:

> What I do not agree with is extending these to every prop table across chado. Phenotypes are by their nature a special case; the only table I know of where we are *obliged* to use a two-part definition (e.g. ANATOMY_ONTOLOGY:'wing' + PATO:'increased size', etc). The logic of including cvalue in phenotypeprop was to allow people to use the same E-Q descriptors to describe other aspects of the same phenotype. 
> In most (all?) other cases where we use a CV term in a prop table, what we are describing should be defined by the CVterm itself. To use an example we have just been discussing at VB we have CVterms for strains of mosquito - we do not need to define this as 'strain' + 'a.gambiae M' because the fact that 'a.gambiae M' is a strain is implicit in the CVterm. If this is not possible we look to modify the ontology and not the database. I am still to hear a convincing reason why this is not possible with other prop tables.

I think it's reasonable for the value of a property to be drawn from a cvterm.  It doesn't have to be just a special case for phenotypes.  For example, if I had an experimental property of type_id='bucket color', I could set cvalue_id='red'.  That way I'm not forced to include 'red bucket', etc., in my ontology for every possible color, and it's better than using the free-text value field.


> I'd largely agree with Rob's strategy for ironing out these last few issues, though we might be able to sort this more quickly with a brief conference call, one where any remaining changes to the schema are proposed on a wiki somewhere (i.e. clearly written on the agenda with SQL and a justification for the changes). If we can't agree on a change, then we default to whatever has fewer tables or the fewest changes, at the expense of redundancy if necessary. If people still desperately want to add or modify other tables, then they are of course free to fork the project and strike out on their own, but it will then be up to them to ensure their code is compatible with the core.


This is fine.  Chado changes very slowly, so it's not a problem keeping up with it.  It would be nice if Chado were on something like GitHub, because it would be easier to fork.



Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Robert Buels
In reply to this post by Scott Cain
On 05/03/2011 07:23 PM, Scott Cain wrote:
> I meant to write on this thread last week (in fact, I really thought I
> did, but so it goes), so sorry for my silence.  I was mostly hoping
> this would shake out "on its own."  I think Rob's suggestion of an
> issue tracker is key.  I would be fine if we want to use the bug
> tracker at sourceforge: it's the one I'm familiar with, so I can have
> it set up quickly.  I also don't mind if we use a different one if
> anyone feels strongly about it.

Oooo, oooo, I have an idea!  How about I convert the Chado repo to git
(just /gmod/schema/trunk/chado), put it at
http://github.com/gmod/chado, and then we use the GitHub issues for that
repository for organizing the natdiv work!  Win, win, win!  All in
favor?  Aye!

> Also, Rob, I've started work on a schema migration tool, which should
> also be part of the 1.2 release.

Sweet!  You rock Scott!

Rob


Re: [Gmod-phendiver] [Gmod-schema] Proposed changes to NatDiv related modules

Yuri Bendana-3
In reply to this post by Yuri Bendana-3
On Wed, May 4, 2011 at 11:02 AM, Yuri Bendana <[hidden email]> wrote:
> On Wed, May 4, 2011 at 4:06 AM, seth redmond <[hidden email]> wrote:
>
>> What I do not agree with is extending these to every prop table across chado. Phenotypes are by their nature a special case; the only table I know of where we are *obliged* to use a two-part definition (e.g. ANATOMY_ONTOLOGY:'wing' + PATO:'increased size', etc). The logic of including cvalue in phenotypeprop was to allow people to use the same E-Q descriptors to describe other aspects of the same phenotype. 
>> In most (all?) other cases where we use a CV term in a prop table, what we are describing should be defined by the CVterm itself. To use an example we have just been discussing at VB we have CVterms for strains of mosquito - we do not need to define this as 'strain' + 'a.gambiae M' because the fact that 'a.gambiae M' is a strain is implicit in the CVterm. If this is not possible we look to modify the ontology and not the database. I am still to hear a convincing reason why this is not possible with other prop tables.
>
> I think it's reasonable for the value of a property to be drawn from a cvterm.  It doesn't have to be just a special case for phenotypes.  For example, if I had an experimental property of type_id='bucket color', I could set cvalue_id='red'.  That way I'm not forced to include 'red bucket', etc., in my ontology for every possible color, and it's better than using the free-text value field.


Now that I think about it, what you're saying is that properties should be precomposed in the ontology, while I'm saying that adding cvalue_id allows you to postcompose properties.  At the cost of an extra column, this added flexibility is worth it for me.
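To make the precomposed/postcomposed distinction concrete, the sketch below stores the 'bucket color' example both ways and queries each. Assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, the tables are trimmed to the columns that matter, and the cvterm rows, experiment ids, and the two query helpers are hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical toy data: SQLite stands in for PostgreSQL, tables are trimmed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE nd_experimentprop (
    nd_experiment_id INTEGER NOT NULL,
    type_id          INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value            TEXT,
    cvalue_id        INTEGER REFERENCES cvterm (cvterm_id)
);
INSERT INTO cvterm (cvterm_id, name) VALUES
    (1, 'bucket color'), (2, 'red'), (3, 'blue'),  -- terms for postcomposition
    (4, 'red bucket');                             -- a precomposed term
-- Postcomposed: type_id names the property, cvalue_id names its value.
INSERT INTO nd_experimentprop VALUES (10, 1, NULL, 2), (11, 1, NULL, 3);
-- Precomposed: a single term carries both property and value.
INSERT INTO nd_experimentprop VALUES (12, 4, NULL, NULL);
""")


def experiments_postcomposed(prop: str, val: str) -> list:
    """Experiments whose property `prop` has the CV value `val`."""
    sql = """
        SELECT p.nd_experiment_id
        FROM nd_experimentprop p
        JOIN cvterm t ON t.cvterm_id = p.type_id
        JOIN cvterm v ON v.cvterm_id = p.cvalue_id
        WHERE t.name = ? AND v.name = ?
        ORDER BY p.nd_experiment_id
    """
    return [row[0] for row in conn.execute(sql, (prop, val))]


def experiments_precomposed(term: str) -> list:
    """Experiments tagged with a single precomposed property term."""
    sql = """
        SELECT p.nd_experiment_id
        FROM nd_experimentprop p
        JOIN cvterm t ON t.cvterm_id = p.type_id
        WHERE t.name = ?
        ORDER BY p.nd_experiment_id
    """
    return [row[0] for row in conn.execute(sql, (term,))]


print(experiments_postcomposed("bucket color", "red"))  # [10]
print(experiments_precomposed("red bucket"))            # [12]
```

The postcomposed query needs one extra join, but adding a new color means adding one value term; the precomposed form needs a new term such as 'red bucket' for every combination. Allowing both in the same table is the "more than one way to do it" cost Rob raised earlier in the thread.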
