Chado Group Module

Re: Chado Group Module

Mara Kim-2
@Stephen:  Woah, I did not even notice the problem with grpmember_cvterm.  I have gone ahead and changed the style of the linker tables to object_grpmember because I feel that two conflicts is probably a good indication that we're doing something wrong.  If anyone has a good reason to do it some other way, perhaps we can reconsider it, but for now I think this should solve the issue.

@Andy:  I'm not sure if you noticed, but the putative implementation adds the rank and unique constraint.  Is that enough to make this module compatible with chado-xml?

I am getting the feeling that any solution to the multiple grpmember_id reference problem will be non-trivial and very hacky.  I've been looking for solutions and I haven't found anything that seems performant, let alone elegant.  Perhaps the solution is something clever with shared sequences? (http://www.neilconway.org/docs/sequences/)  I will need to think over this some more.


On Mon, Feb 10, 2014 at 4:27 PM, Andy Schroeder <[hidden email]> wrote:
Hi Stephen,

I too like the rank column in the grpmember table to help order the membership.  However, it may not make sense to rank members of some groups, so I'm not sure it should be part of a unique constraint.

In order to have a unique constraint, rank needs to be part of it, given the way the table is currently set up.
The ordering just comes as a potential side benefit.  It would be up to the user to decide if it conveyed order information or not.

I think it may be better to name our linker tables as object-grpmember because we have the same problem with the grpmember_cvterm table.  It appears at first glance to be used for grouping cvterms. 

I think it makes sense that a type_id exists for stocks and features because they always must have at least one type.  If we have a type_id in the grpmember table then we are implying that membership in a group inherently requires a membership type.  I do like that requirement.  But if the consensus is that we don't want to require a type for all group members then we could remove it from the grpmember table and just use the grpmember_cvterm table for specifying membership types.

Also, I think ChadoXML may have problems with the nd_experiment table, as it also doesn't have a unique constraint.
That table is already set in the Chado v1.2 release.  Is it possible for ChadoXML to adapt to tables without unique constraints?

We do not currently use the nd_ module, and because key tables in the module lack such constraints we are quite unlikely to do so.  Lack of unique constraints is problematic for valid chadoXML (at least the sort dumped and consumed by XORT).  There are in fact a few tables in the original schema that also do not have usable unique constraints (e.g. eimage), and we don't use them either.  If we were to do so, we would modify them accordingly.  That said, groups have certainly developed mechanisms that do not rely on chadoXML and the unique constraints.

However,  we are definitely interested in using this grp module so I would hope that the unique constraint can be added.

Regarding the table naming and Mara's concern about linking multiple object types to a single grpmember_id: that deserves more discussion and thought.  One could potentially set up triggers to manage it, but I am not sure that is the best solution.
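
A trigger-based guard of the kind mentioned here might look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that it is self-contained and runnable; Chado itself targets PostgreSQL, where the trigger syntax differs. The column lists, trigger names, and the link_organism helper are assumptions made for illustration and are not taken from the putative schema on the wiki.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the tables under discussion.
CREATE TABLE grpmember (grpmember_id INTEGER PRIMARY KEY);
CREATE TABLE feature_grpmember (
    feature_grpmember_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL,
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id)
);
CREATE TABLE organism_grpmember (
    organism_grpmember_id INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL,
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id)
);

-- Reject a feature link if the grpmember row is already linked to an organism.
CREATE TRIGGER feature_grpmember_exclusive
BEFORE INSERT ON feature_grpmember
WHEN EXISTS (SELECT 1 FROM organism_grpmember AS o
             WHERE o.grpmember_id = NEW.grpmember_id)
BEGIN
    SELECT RAISE(ABORT, 'grpmember_id already linked to an organism');
END;

-- Reject an organism link if the grpmember row is already linked to a feature.
CREATE TRIGGER organism_grpmember_exclusive
BEFORE INSERT ON organism_grpmember
WHEN EXISTS (SELECT 1 FROM feature_grpmember AS f
             WHERE f.grpmember_id = NEW.grpmember_id)
BEGIN
    SELECT RAISE(ABORT, 'grpmember_id already linked to a feature');
END;
""")

conn.execute("INSERT INTO grpmember (grpmember_id) VALUES (1)")
conn.execute(
    "INSERT INTO feature_grpmember (feature_id, grpmember_id) VALUES (10, 1)"
)


def link_organism(organism_id, grpmember_id):
    """Try to link an organism to a grpmember row; report whether it worked."""
    try:
        conn.execute(
            "INSERT INTO organism_grpmember (organism_id, grpmember_id) "
            "VALUES (?, ?)",
            (organism_id, grpmember_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


second_link_accepted = link_organism(20, 1)
organism_links = conn.execute(
    "SELECT COUNT(*) FROM organism_grpmember"
).fetchone()[0]
```

Note that this needs one trigger per pair of linker tables and only covers INSERT (an UPDATE of grpmember_id would need its own triggers), which illustrates why the approach may not scale well as more object types are added.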

cheers,
Andy

 

Thanks!
Stephen



On 2/10/2014 3:14 PM, Mara Kim wrote:
Hello all,

I have updated the schematic and putative SQL implementation on the wiki to use grpmember (as opposed to grpmbr) and added the proposed pub provenance tables suggested by Andy (not shown in schematic).  Also, the use of rank in grp was accidental, so I went ahead and removed it.  If anyone has a use case though, it could go back in.

I had to break from the standard pub table linker style by reversing the name for grpmember (ie. pub_grpmember as opposed to grpmember_pub).  This *could* be fixed by changing the style of the grpmember linker tables to object-grpmember instead of the other way around.  Personally I prefer the object-groupmember style, but I realize others had reasons for the current style (grpmember-object).  Do others have a preference?

I haven't added the rank column to the grpmember table, yet.  I kind of like it, but I also have some misgivings.  It still doesn't solve the problem of an organism and feature linking to the same grpmember row.  Of course, no reason why things *can't* be ordered in a group.  Definitely a point for more discussion.

@Andy:  The type_id column of grpmember was to be used to indicate different kinds of members to a group.  For example, labeling certain proteins as being "well connected" in a metabolic pathway.  However, I do think there is a strong argument for *not* having a type_id column as this kind of breaks the whole idea of groups being aggregates of similar things.  Another point for discussion.


On Mon, Feb 10, 2014 at 8:46 AM, Andy Schroeder <[hidden email]> wrote:
Hi again Mara,

Thinking on the grpmbr table a bit more, I think a better way to have a unique key on the table would be to add a rank column rather than name or uniquename as I initially suggested.  That way, if one wanted, one could use rank as a mechanism to order members of a group.  The unique key could then be group_id and rank, or group_id, type_id, rank.  Which brings up another question.  What do you see as the purpose/meaning of type_id in the grpmbr table?  It could be used to enforce only identical types of things being grouped, but it could also potentially be interpreted other ways, so I am curious as to the proposed usage.
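
As a rough illustration of the first option (a unique key on the group and rank), here is a minimal sketch using Python's built-in sqlite3 module. The column name grp_id (the message writes group_id), the reduced column lists, and the try_insert helper are assumptions for the example, not the schema on the wiki.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, stripped-down versions of grp and grpmbr.
CREATE TABLE grp (
    grp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE grpmbr (
    grpmbr_id INTEGER PRIMARY KEY,
    grp_id    INTEGER NOT NULL REFERENCES grp (grp_id),
    rank      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (grp_id, rank)  -- the composite unique key under discussion
);
""")

conn.execute("INSERT INTO grp (grp_id, name) VALUES (1, 'example group')")
conn.execute("INSERT INTO grpmbr (grp_id, rank) VALUES (1, 0)")
conn.execute("INSERT INTO grpmbr (grp_id, rank) VALUES (1, 1)")


def try_insert(grp_id, rank):
    """Insert a member row; return False if the unique key rejects it."""
    try:
        conn.execute(
            "INSERT INTO grpmbr (grp_id, rank) VALUES (?, ?)", (grp_id, rank)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# A second member with the same (grp_id, rank) violates the unique key.
duplicate_accepted = try_insert(1, 1)

# rank doubles as an ordering of members within the group.
ordered_members = [
    row[0]
    for row in conn.execute(
        "SELECT grpmbr_id FROM grpmbr WHERE grp_id = 1 ORDER BY rank"
    )
]
```

With such a key every member of a group must carry a distinct rank, whether or not the order is meaningful for that group.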

And while mentioning rank, I am not sure that there needs to be a rank column in the grp table.  rank is usually used as part of a composite unique key (e.g. all the prop tables) and can also be used to order things with respect to another relation (e.g. feature_relationship.rank).  I don't see that need in the grp table, but I am likely missing something.

cheers,
Andy


On Fri, Feb 7, 2014 at 3:18 PM, Andy Schroeder <[hidden email]> wrote:
Hi Mara,

The missing tables for attribution that we would normally be likely to add would be:

grp_relationship_pub
grpprop_pub
grpmbrprop_pub

and the potentially problematic (based on Stephen's previously made point):

grpmbr_pub 

(for attribution of why a thing was added to a group) - could reverse it to pub_grpmbr I suppose.

What is needed for chado-xml representation would be something like a name or uniquename column in grpmbr that had a unique constraint on it.

cheers,
Andy


On Fri, Feb 7, 2014 at 2:10 PM, Mara Kim <[hidden email]> wrote:
Hi Andy,

We can definitely add more tables to the module, especially for pub provenance.  Do you have a list of tables that you would need for your purposes?  Also, what changes would make this module compatible with chado-xml?

Hi Stephen,

I don't think your last email was sent to the listserv.


On Fri, Feb 7, 2014 at 12:54 PM, Andy Schroeder <[hidden email]> wrote:

I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there was no grpmbr table then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, grp_organismprop,  and those tables get repeatedly created for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects, it would require 27 new tables in Chado. For every data type that can be grouped we have to add an additional 3 tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, grpmbrprop, plus one linker table for each data type so for the example set above would require 12 new tables. 
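
The counts in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch; the function and variable names are hypothetical, and the list of data types is the one given in the message.

```python
# Data types named in the message as candidates for grouping.
DATA_TYPES = [
    "feature", "organism", "stock", "library", "analysis",
    "pub", "study", "assay", "project",
]


def tables_without_grpmbr(n_types: int) -> int:
    """Each data type needs its own linker, cvterm, and prop table."""
    return 3 * n_types


def tables_with_grpmbr(n_types: int) -> int:
    """grpmbr, grpmbr_cvterm, grpmbrprop are shared; add one linker per type."""
    return 3 + n_types


n = len(DATA_TYPES)
print(tables_without_grpmbr(n))  # 27
print(tables_with_grpmbr(n))     # 12
```

The difference grows with every additional data type: three tables per type without grpmbr, one per type with it.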

I can see  the utility of this approach if I'm understanding correctly.

However, as currently specified there is no unique key on  this table aside from the primary key.  That is not compatible with chado-xml or at least the tools we use with it and would prevent FlyBase, at least,  from using the module, which would  be a bummer.

The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group such that it describes an analysis that was performed using the group members, then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses; the linker table would be grp_analysis.  So the two table names contain the same words in reverse order, and that might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  By having the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

At least at FlyBase we tend to attribute most things by linking to a pub so in order to attribute membership in a grp to a pub we would run into confusing table names with or without grpmbr.  

But on the other hand, because we do attribute almost everything, using the grpmbr approach would mean we would only need to add a single grpmbrprop_pub table and the like.

And I agree with others that spelling out the names as much as possible is desirable.   

cheers,
Andy
Stephen



On 2/6/2014 4:59 PM, Andy Schroeder wrote:
Hi Mara et al.,

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

The only reason that I can see having the grpmbr table at all is to do just that - i.e. group things of fundamentally different types.  I am not a fan of that idea but there may be use cases that require this?  Am I missing other reasons to have a grpmbr table?  And can someone provide some use cases that would require this?

cheers,
Andy

On Thu, Feb 6, 2014 at 4:16 PM, Mara Kim <[hidden email]> wrote:
Hello gmoders!

This is a continuation of the Chado Comparative Module discussion (http://generic-model-organism-system-database.450254.n5.nabble.com/Chado-Comparative-Module-tp5712078.html), now renamed the Group Module.

The group table has now been renamed to the grp table to avoid SQL keyword conflicts. The last discussion moved the design into a more generic model with an intermediate grpmbr table that linked the table specific linker tables to a grpmbrprop and analysisgrpmbr table.  Module specific linker tables will be supplied by submodules developed as a subset of the group module.

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

You can see the updated schematic (note that not all tables are pictured) and the putative SQL schema on the wiki (http://gmod.org/wiki/Chado_Group_Module)

--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema


Re: Chado Group Module

adf_ncgr
In reply to this post by Andy Schroeder
Hi all-
a couple of comments with respect to some of the issues on the table.

1) use of rank in grpmember for ordering purposes:

this seems convenient, but potentially ambiguous; in the use case that Kris had mentioned ("top X genes"),
I found myself wondering if this was being defined with respect to significance values, or fold changes or
something else. I suppose one could take the position that the group type would implicitly define what was
meant by the order within the group?

2) possible linking of different types of grpmember linker table to the same grpmember_id:

it seems to me that grpmember is being introduced as a sort of inheritance construct; ie,
grpmember is like a superclass introduced to provide a common base that subsumes the
need to make subclass-specific tables for grpmemberprop, etc. while the <subtype>_grpmember
tables are introduced because of the need to make subtype-specific referential integrity.
If people agree with this basic assessment, then I think it implies that the relationship between
grpmember and <subtype>_grpmember should be 1:1. At a minimum, I think this would mean that
grpmember_id within the <subtype>_grpmember should be unique on its own (ie even without adding
the linked-to object within the UNIQUE constraint); it could even serve as the PK of the table (I think
some approximations of inheritance within relational schemas do use this FK as PK trick). Unfortunately,
this does not address the possibility of having the same grpmember_id referenced in two different
<subtype>_grpmember tables, which I agree is troublesome; although, I also personally think that allowing groups
to be composed of heterogeneous types of objects (features and organisms) is probably OK, but I do think they
should all have different grpmember rows, just as objects of the same type should.

I guess postgres has some facilities for doing inheritance, but I'm not sure if they would necessarily address
the issue or if using very postgres-specific features is considered acceptable chado design? Just saw Mara's
note about shared sequences; maybe that would work (but probably also very postgres-specific)...

3) type_ids for grpmembers:
I still think this is a good idea- not indicating the type of object linked, but the nature of its membership in the set.
Even if the set is composed of all features, those features may take on different roles within the group, or the same
feature might have multiple roles within the same group ("not only the president, also a client", for those of you
old enough to remember those terrible infomercials!)
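
The "FK as PK" idea from point 2 can be sketched as follows. The example uses Python's built-in sqlite3 module purely so it can be run as-is; Chado targets PostgreSQL, and the reduced column lists and the link_feature helper are assumptions for illustration rather than the putative schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
-- Hypothetical, stripped-down tables.
CREATE TABLE grpmember (grpmember_id INTEGER PRIMARY KEY);

-- grpmember_id is both the primary key and a foreign key, so each
-- grpmember row can appear at most once in this linker table (1:1).
CREATE TABLE feature_grpmember (
    grpmember_id INTEGER PRIMARY KEY REFERENCES grpmember (grpmember_id),
    feature_id   INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO grpmember (grpmember_id) VALUES (1)")
conn.execute(
    "INSERT INTO feature_grpmember (grpmember_id, feature_id) VALUES (1, 10)"
)


def link_feature(grpmember_id, feature_id):
    """Try to link a feature to a grpmember row; report whether it worked."""
    try:
        conn.execute(
            "INSERT INTO feature_grpmember (grpmember_id, feature_id) "
            "VALUES (?, ?)",
            (grpmember_id, feature_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# A second feature on the same grpmember row violates the primary key.
second_feature_accepted = link_feature(1, 11)

# A link to a grpmember row that does not exist violates the foreign key.
dangling_link_accepted = link_feature(99, 12)
```

As noted in point 2, this only guarantees 1:1 within a single linker table; nothing here stops an organism_grpmember table built the same way from reusing grpmember_id 1.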

hope that was of some use in the ongoing discussion...
Andrew






Re: Chado Group Module

Andy Schroeder
In reply to this post by Mara Kim-2
@Andy:  I'm not sure if you noticed, but the putative implementation adds the rank and unique constraint.  Is that enough to make this module compatible with chado-xml?

Yes it is.

cheers,
Andy

On Mon, Feb 10, 2014 at 7:22 PM, Mara Kim <[hidden email]> wrote:
@Stephen:  Woah, I did not even notice the problem with grpmember_cvterm.  I have gone ahead and changed the style of the linker tables to object_grpmember because I feel that two conflicts is probably a good indication that we're doing something wrong.  If anyone has a good reason to do it some other way, perhaps we can reconsider it, but for now I think this should solve the issue.

@Andy:  I'm not sure if you noticed, but the putative implementation adds the rank and unique constraint.  Is that enough to make this module compatible with chado-xml?

I am getting the feeling that any solution to the multiple grpmember_id reference problem will be non-trivial and very hacky.  I've been looking for solutions and I haven't found anything that seems performant, let alone elegant.  Perhaps the solution is something clever with shared sequences? (http://www.neilconway.org/docs/sequences/)  I will need to think over this some more.


On Mon, Feb 10, 2014 at 4:27 PM, Andy Schroeder <[hidden email]> wrote:
Hi Stephen,

I too like the rank column in the grpmember table to help order the membership.  Although, It may not make sense to rank members of some groups so I'm not sure it should be part of a unique constraint.

In order to have a unique constraint rank needs to be part of it the way the table is currently set up
.
The ordering just comes as a potential side benefit.  It would be up to the user to decide if it conveyed order information or not.

I think it may be better to name our linker tables as object-grpmember because we have the same problem with the grpmember_cvterm table.  It appears at first glance to be used for grouping cvterms. 

I think it makes sense that a type_id exists for stocks and features because they always must have at least one type.  If we have a type_id in the grpmember table then we are implying that membership in a group inherently requires a membership type.  I do like that requirement.  But if the consensus is that we don't want to require a type for all group members then we could remove it from the grpmember table and just use the grpmember_cvterm table for specifying membership types.

Also, I think ChadoXML may also have problems with the nd_experiment table as it also doesn't have a unique constraint.  
That table is already set in the Chado v1.2 release.  Is it possible for ChadoXML to adapt to tables without unique contraints?

We do not currently use the nd_ module and because key tables in the module lack such constraints then we are quite unlikely to do so.  Lack of unique constraints are problematic for valid chadoXML (at least the sort dumped and consumed by XORT).  There are in fact a few tables in the original schema that also do not have usable unique constraints (eg. eimage) and we don't use them either. If we were to do so we would modify them accordingly.   That said groups have certainly developed mechanisms that do not rely on chadoXML and the unique constraints.

However,  we are definitely interested in using this grp module so I would hope that the unique constraint can be added.

Regarding the table naming and Mara's concern about linking multiple object types to a single grpmember_id that deserves more discussion and thought.  One could potentially set up triggers to manage it but not sure that is the best solution. 

cheers,
Andy

 

Thanks!
Stephen



On 2/10/2014 3:14 PM, Mara Kim wrote:
Hello all,

I have updated the schematic and putative SQL implementation on the wiki to use grpmember (as opposed to grpmbr) and added the proposed pub provenance tables suggested by Andy (not shown in schematic).  Also, the use of rank in grp was accidental, so I went ahead and removed it.  If anyone has a use case though, it could go back in.

I had to break from the standard pub table linker style by reversing the name for grpmember (i.e. pub_grpmember as opposed to grpmember_pub).  This *could* be fixed by changing the style of the grpmember linker tables to object-grpmember instead of the other way around.  Personally I prefer the object-grpmember style, but I realize others had reasons for the current style (grpmember-object).  Do others have a preference?

I haven't added the rank column to the grpmember table, yet.  I kind of like it, but I also have some misgivings.  It still doesn't solve the problem of an organism and feature linking to the same grpmember row.  Of course, no reason why things *can't* be ordered in a group.  Definitely a point for more discussion.

@Andy:  The type_id column of grpmember was to be used to indicate different kinds of members of a group.  For example, labeling certain proteins as being "well connected" in a metabolic pathway.  However, I do think there is a strong argument for *not* having a type_id column, as this kind of breaks the whole idea of groups being aggregates of similar things.  Another point for discussion.


On Mon, Feb 10, 2014 at 8:46 AM, Andy Schroeder <[hidden email]> wrote:
Hi again Mara,

Thinking on the grpmbr table a bit more, I think a better way to have a unique key on the table would be to add a rank column rather than name or uniquename as I initially suggested.  That way, if one wanted, rank could be used as a mechanism to order members of a group.  The unique key could then be group_id and rank, or group_id, type_id, rank.  Which brings up another question.  What do you see as the purpose/meaning of type_id in the grpmbr table?  It could be used to enforce that only identical types of things are grouped, but it could also potentially be interpreted other ways, so I am curious about the proposed usage.
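As a rough illustration of the rank-based unique key described above, the sketch below builds a cut-down grpmember table with a composite unique constraint and shows that rank can double as an ordering. The assumptions are mine: SQLite via Python's sqlite3 stands in for PostgreSQL, the foreign key column is spelled grp_id (the message says group_id), and all other columns are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grp (
    grp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE grpmember (
    grpmember_id INTEGER PRIMARY KEY,
    grp_id       INTEGER NOT NULL REFERENCES grp (grp_id),
    rank         INTEGER NOT NULL DEFAULT 0,
    UNIQUE (grp_id, rank)          -- the composite unique key
);
""")

conn.execute("INSERT INTO grp (grp_id, name) VALUES (1, 'example group')")
conn.execute("INSERT INTO grpmember (grp_id, rank) VALUES (1, 0)")
conn.execute("INSERT INTO grpmember (grp_id, rank) VALUES (1, 1)")


def try_insert(grp_id, rank):
    """Return True if the row was inserted, False if the unique key rejected it."""
    try:
        conn.execute(
            "INSERT INTO grpmember (grp_id, rank) VALUES (?, ?)", (grp_id, rank)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# A second member with the same (grp_id, rank) pair violates the unique key.
duplicate_ok = try_insert(1, 1)

# rank can optionally be read as an ordering of the members.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT rank FROM grpmember WHERE grp_id = 1 ORDER BY rank"
    )
]
print(duplicate_ok, ordered)
```

Whether the ranks carry any meaning beyond uniqueness would be up to the user; the constraint itself only requires that they differ within a group.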

And while mentioning rank, I am not sure that there needs to be a rank column in the grp table.  rank is usually used as part of a composite unique key (e.g. all the prop tables) and can also be used to order things with respect to another relation (e.g. feature_relationship.rank).  I don't see that need in the grp table, but I am likely missing something.

cheers,
Andy


On Fri, Feb 7, 2014 at 3:18 PM, Andy Schroeder <[hidden email]> wrote:
Hi Mara,

The missing tables for attribution that we would normally be likely to add would be:

grp_relationship_pub
grpprop_pub
grpmbrprop_pub

and the potentially problematic (based on Stephen's previously made point):

grpmbr_pub 

(for attribution of why a thing was added to a group) - could reverse it to pub_grpmbr I suppose.

What is needed for chado-xml representation would be something like a name or uniquename column in grpmbr that had a unique constraint on it.

cheers,
Andy


On Fri, Feb 7, 2014 at 2:10 PM, Mara Kim <[hidden email]> wrote:
Hi Andy,

We can definitely add more tables to the module, especially for pub provenance.  Do you have a list of tables that you would need for your purposes?  Also, what changes would make this module compatible with chado-xml?

Hi Stephen,

I don't think your last email was sent to the listserv.


On Fri, Feb 7, 2014 at 12:54 PM, Andy Schroeder <[hidden email]> wrote:

I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there was no grpmbr table then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, grp_organismprop,  and those tables get repeatedly created for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects, it would require 27 new tables in Chado. For every data type that can be grouped we have to add an additional 3 tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, grpmbrprop, plus one linker table for each data type so for the example set above would require 12 new tables. 
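The table counts above can be checked with a short sketch. The list of data types is taken from the paragraph; the function names are just illustrative.

```python
# Data types named in the message as candidates for grouping.
DATA_TYPES = [
    "feature", "organism", "stock", "library", "analysis",
    "pub", "study", "assay", "project",
]


def tables_without_grpmbr(data_types):
    # Each type needs its own linker, cvterm and prop table,
    # e.g. grp_organism, grp_organism_cvterm, grp_organismprop.
    return 3 * len(data_types)


def tables_with_grpmbr(data_types):
    # grpmbr, grpmbr_cvterm and grpmbrprop are shared,
    # plus one linker table per data type.
    return 3 + len(data_types)


print(tables_without_grpmbr(DATA_TYPES), tables_with_grpmbr(DATA_TYPES))
```

Every additional groupable data type costs three tables in the first design but only one in the second.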

I can see  the utility of this approach if I'm understanding correctly.

However, as currently specified there is no unique key on  this table aside from the primary key.  That is not compatible with chado-xml or at least the tools we use with it and would prevent FlyBase, at least,  from using the module, which would  be a bummer.

The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group such that it describes an analysis that was performed using the group members, then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses: the linker table would be grp_analysis.  So the two table names use the same words in reverse order, which might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  By having the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

At least at FlyBase we tend to attribute most things by linking to a pub so in order to attribute membership in a grp to a pub we would run into confusing table names with or without grpmbr.  

But on the other hand, because we do attribute most everything, using the grpmbr approach would mean we would only need to add a single grpmbrprop_pub table and the like.

And I agree with others that spelling out the names as much as possible is desirable.   

cheers,
Andy
Stephen



On 2/6/2014 4:59 PM, Andy Schroeder wrote:
Hi Mara et al.,

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

The only reason that I can see having the grpmbr table at all is to do just that - i.e. group things of fundamentally different types.  I am not a fan of that idea but there may be use cases that require this?  Am I missing other reasons to have a grpmbr table?  And can someone provide some use cases that would require this?

cheers,
Andy

On Thu, Feb 6, 2014 at 4:16 PM, Mara Kim <[hidden email]> wrote:
Hello gmoders!

This is a continuation of the Chado Comparative Module discussion (http://generic-model-organism-system-database.450254.n5.nabble.com/Chado-Comparative-Module-tp5712078.html), now renamed the Group Module.

The group table has now been renamed to the grp table to avoid SQL keyword conflicts. The last discussion moved the design into a more generic model with an intermediate grpmbr table that linked the table specific linker tables to a grpmbrprop and analysisgrpmbr table.  Module specific linker tables will be supplied by submodules developed as a subset of the group module.

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?
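To illustrate the concern, here is a minimal sketch showing that plain foreign keys do nothing to stop two different object types from sharing one grpmbr row. It assumes SQLite via Python's sqlite3 in place of PostgreSQL, and the linker table names grpmbr_organism and grpmbr_feature are hypothetical; all non-key columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY);
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY);
CREATE TABLE grpmbr   (grpmbr_id   INTEGER PRIMARY KEY);

-- Hypothetical per-type linker tables, each pointing at grpmbr.
CREATE TABLE grpmbr_organism (
    grpmbr_id   INTEGER NOT NULL REFERENCES grpmbr (grpmbr_id),
    organism_id INTEGER NOT NULL REFERENCES organism (organism_id)
);
CREATE TABLE grpmbr_feature (
    grpmbr_id  INTEGER NOT NULL REFERENCES grpmbr (grpmbr_id),
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id)
);
""")

conn.execute("INSERT INTO organism (organism_id) VALUES (1)")
conn.execute("INSERT INTO feature (feature_id) VALUES (1)")
conn.execute("INSERT INTO grpmbr (grpmbr_id) VALUES (1)")

# Both inserts succeed: nothing ties a grpmbr row to a single object type.
conn.execute("INSERT INTO grpmbr_organism (grpmbr_id, organism_id) VALUES (1, 1)")
conn.execute("INSERT INTO grpmbr_feature (grpmbr_id, feature_id) VALUES (1, 1)")

links = conn.execute("""
    SELECT (SELECT COUNT(*) FROM grpmbr_organism WHERE grpmbr_id = 1)
         + (SELECT COUNT(*) FROM grpmbr_feature  WHERE grpmbr_id = 1)
""").fetchone()[0]
print(links)
```

The single grpmbr row ends up referenced from both linker tables, which is exactly the situation the question is about.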

You can see the updated schematic (note that not all tables are pictured) and the putative SQL schema on the wiki (http://gmod.org/wiki/Chado_Group_Module)

--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema











Re: Chado Group Module

Mara Kim-2
Hello all,

Discussion on this thread seems to have died down.  I thought I would move this forward by making this module available via git at:

I've gone ahead and called this Release Candidate 1.  You can find the SQL code as well as schema diagrams in the repository.  Testing and feedback would be greatly appreciated.


On Tue, Feb 11, 2014 at 2:21 PM, Andy Schroeder <[hidden email]> wrote:
@Andy:  I'm not sure if you noticed, but the putative implementation adds the rank and unique constraint.  Is that enough to make this module compatible with chado-xml?

Yes it is.

cheers,
Andy

