Chado Group Module


Chado Group Module

Mara Kim-2
Hello gmoders!

This is a continuation of the Chado Comparative Module discussion (http://generic-model-organism-system-database.450254.n5.nabble.com/Chado-Comparative-Module-tp5712078.html), now renamed the Group Module.

The group table has been renamed to grp to avoid SQL keyword conflicts. The last discussion moved the design toward a more generic model with an intermediate grpmbr table that links the table-specific linker tables to the grpmbrprop and analysisgrpmbr tables.  Module-specific linker tables will be supplied by submodules developed as subsets of the group module.

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?
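
To make the concern concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column lists and the linker table names grpmbr_organism and grpmbr_feature are simplified assumptions based on the names used in this thread, not the putative schema on the wiki, and the sample rows are made up.

```python
import sqlite3

# Simplified, assumed versions of the tables named in this thread;
# the real definitions are on the wiki and may differ.
SCHEMA = """
CREATE TABLE grp      (grp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, species TEXT NOT NULL);
CREATE TABLE feature  (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE grpmbr   (grpmbr_id INTEGER PRIMARY KEY,
                       grp_id INTEGER NOT NULL REFERENCES grp (grp_id));
-- One linker table per data type (hypothetical names), each pointing at grpmbr.
CREATE TABLE grpmbr_organism (organism_id INTEGER NOT NULL REFERENCES organism (organism_id),
                              grpmbr_id   INTEGER NOT NULL REFERENCES grpmbr (grpmbr_id));
CREATE TABLE grpmbr_feature  (feature_id  INTEGER NOT NULL REFERENCES feature (feature_id),
                              grpmbr_id   INTEGER NOT NULL REFERENCES grpmbr (grpmbr_id));
"""


def mixed_members(conn):
    """Return grpmbr_ids that are linked to both an organism and a feature."""
    rows = conn.execute(
        "SELECT DISTINCT o.grpmbr_id FROM grpmbr_organism o "
        "JOIN grpmbr_feature f ON f.grpmbr_id = o.grpmbr_id "
        "ORDER BY o.grpmbr_id"
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

conn.execute("INSERT INTO grp VALUES (1, 'example group')")
conn.execute("INSERT INTO organism VALUES (1, 'example species')")
conn.execute("INSERT INTO feature VALUES (1, 'example-gene')")
conn.execute("INSERT INTO grpmbr VALUES (1, 1)")
# Nothing in this schema stops both linker tables from using grpmbr_id 1.
conn.execute("INSERT INTO grpmbr_organism VALUES (1, 1)")
conn.execute("INSERT INTO grpmbr_feature VALUES (1, 1)")

print(mixed_members(conn))  # [1]
```

Both inserts succeed because each linker table only checks that the grpmbr_id exists; no constraint in this sketch ties a grpmbr row to a single data type.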

You can see the updated schematic (note that not all tables are pictured) and the putative SQL schema on the wiki (http://gmod.org/wiki/Chado_Group_Module).

--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

------------------------------------------------------------------------------
Managing the Performance of Cloud-Based Applications
Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
Read the Whitepaper.
http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Chado Group Module

Andy Schroeder
Hi Mara et al.,

> One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

The only reason I can see for having the grpmbr table at all is to do just that, i.e. to group things of fundamentally different types.  I am not a fan of that idea, but there may be use cases that require it.  Am I missing other reasons to have a grpmbr table?  And can someone provide some use cases that would require this?

cheers,
Andy


Re: Chado Group Module

Lacey-Anne Sanderson-3
In reply to this post by Mara Kim-2
Is everyone happy with grpmbr as a table name? It seems quite unreadable to me... I get condensing group to grp as a necessary evil, but couldn’t we still have a grp_member table so it’s at least partially recognizable as English?

On another note, I can see a lot of use for this module with my germplasm diversity data… It would be great if you also added:
- grp_member_project as a way to say “these groups resulted from this project”
- grp_member_stock as a way to say “these stocks are all French green lentils (i.e.: part of a market class)”.

Other than that, it looks great! 
Thank you for taking this on,
~Lacey Sanderson



Re: Chado Group Module

Stephen Ficklin-2
In reply to this post by Mara Kim-2
Hi Mara,

I believe the natural diversity tables have the same issue: you can associate stocks, genotypes, phenotypes, etc. with the same nd_experiment_id, but I believe the suggested use is to create a new nd_experiment_id for every association (perhaps someone can correct me on that if I'm wrong).  So, if we kept the grpmbr table, the problem of associating multiple data types with the same grpmbr_id wouldn't be unprecedented for Chado, but it does cause confusion about how to store things.  But perhaps there could be a legitimate case where someone would consider two or more records (disparate or not) as a single group member?  Or would that case be more appropriate for a grpmbr_grp table?
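
If the convention were "one grpmbr_id per association", it could at least be audited after the fact. The following is a rough sketch only, again using Python's sqlite3 in place of PostgreSQL; the linker table names grpmbr_stock and grpmbr_feature, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grpmbr (grpmbr_id INTEGER PRIMARY KEY, grp_id INTEGER NOT NULL);
-- Hypothetical linker tables; names and columns are assumptions.
CREATE TABLE grpmbr_stock   (stock_id   INTEGER NOT NULL, grpmbr_id INTEGER NOT NULL);
CREATE TABLE grpmbr_feature (feature_id INTEGER NOT NULL, grpmbr_id INTEGER NOT NULL);

INSERT INTO grpmbr VALUES (1, 10), (2, 10), (3, 10);
INSERT INTO grpmbr_stock   VALUES (100, 1), (101, 2);
-- grpmbr_id 2 is reused here, which breaks the one-id-per-association convention.
INSERT INTO grpmbr_feature VALUES (200, 2), (201, 3);
""")


def shared_grpmbr_ids(conn):
    """List grpmbr_ids that are referenced by more than one linker row."""
    rows = conn.execute("""
        SELECT grpmbr_id FROM (
            SELECT grpmbr_id FROM grpmbr_stock
            UNION ALL
            SELECT grpmbr_id FROM grpmbr_feature
        ) AS all_links
        GROUP BY grpmbr_id
        HAVING COUNT(*) > 1
        ORDER BY grpmbr_id
    """)
    return [r[0] for r in rows]


print(shared_grpmbr_ids(conn))  # [2]
```

A check like this only reports violations; whether a shared grpmbr_id should be treated as an error or as a legitimate multi-record member is exactly the open question above.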

I do have one suggestion.  If we do keep the grpmbr table can we change the name to 'grpmember' or just 'member' or something a little bit more readable?

Thanks!
Stephen


Re: Chado Group Module

Stephen Ficklin-2
In reply to this post by Andy Schroeder
Hi Andy,

I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there were no grpmbr table, then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, and grp_organismprop, and those tables would have to be created again for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects (nine data types), it would require 27 new tables in Chado; every additional groupable data type adds 3 more tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, and grpmbrprop, plus one linker table for each data type, so the example set above would require 12 new tables.
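
The table counts above follow from simple arithmetic, spelled out here as a small Python sketch. The function names are illustrative only; the numbers come straight from the paragraph above.

```python
# The nine groupable data types listed above.
DATA_TYPES = ["feature", "organism", "stock", "library", "analysis",
              "pub", "study", "assay", "project"]


def tables_without_grpmbr(n_types):
    # Each data type needs its own linker, cvterm, and prop table.
    return 3 * n_types


def tables_with_grpmbr(n_types):
    # grpmbr, grpmbr_cvterm, and grpmbrprop are shared;
    # each data type then adds a single linker table.
    return 3 + n_types


n = len(DATA_TYPES)
print(tables_without_grpmbr(n), tables_with_grpmbr(n))  # 27 12
```

With a shared grpmbr table, each further groupable data type costs one new table rather than three.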

The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group, such that it describes an analysis that was performed using the group members, then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses; the linker table would then be grp_analysis.  The two table names use the same words in reverse order, which might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  By having the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

Stephen



Re: Chado Group Module

Karl O. Pinc
In reply to this post by Stephen Ficklin-2
On 02/07/2014 08:36:33 AM, Stephen Ficklin wrote:


> I do have one suggestion.  If we do keep the grpmbr table can we change
> the name to 'grpmember' or just 'member' or something a little bit more
> readable?


FYI.  We have a GROUPS and a MEMBERS table in Babase, and it'd
be nice to avoid naming conflicts/schema qualifications
with Chado tables.  (We're putting Chado into its own
schema in our db.)

Obviously, you need to do what's
best for Chado but I thought I'd let you know.


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: Chado Group Module

Mara Kim-2
Hi Lacey,

Perhaps you are really asking for a grp_stock and grp_project, which would properly qualify the group itself?  Assuming we rename grpmbr to grpmember, grpmember_project would refer to projects that are members of a group, and grpmember_stock would be referring to stocks that are members of a group.
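
The distinction can be sketched as follows, using Python's sqlite3 as a stand-in for PostgreSQL. This assumes the grpmbr to grpmember rename; the column lists are simplified guesses rather than the wiki schema, and the project and stock names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grp       (grp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project   (project_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stock     (stock_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE grpmember (grpmember_id INTEGER PRIMARY KEY,
                        grp_id INTEGER NOT NULL REFERENCES grp (grp_id));
-- Qualifies the group itself: "this group resulted from this project".
CREATE TABLE grp_project (grp_id     INTEGER NOT NULL REFERENCES grp (grp_id),
                          project_id INTEGER NOT NULL REFERENCES project (project_id));
-- Membership: "this stock is a member of the group".
CREATE TABLE grpmember_stock (grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id),
                              stock_id     INTEGER NOT NULL REFERENCES stock (stock_id));

INSERT INTO grp     VALUES (1, 'French green lentils');
INSERT INTO project VALUES (1, 'example diversity project');
INSERT INTO stock   VALUES (1, 'example stock A'), (2, 'example stock B');
INSERT INTO grpmember VALUES (1, 1), (2, 1);
INSERT INTO grp_project VALUES (1, 1);
INSERT INTO grpmember_stock VALUES (1, 1), (2, 2);
""")

# Stocks that are members of group 1, reached through grpmember.
members = [r[0] for r in conn.execute(
    "SELECT s.name FROM grpmember_stock ms "
    "JOIN grpmember m ON m.grpmember_id = ms.grpmember_id "
    "JOIN stock s ON s.stock_id = ms.stock_id "
    "WHERE m.grp_id = 1 ORDER BY s.stock_id")]

# Projects that describe group 1 itself, with no grpmember row involved.
sources = [r[0] for r in conn.execute(
    "SELECT p.name FROM grp_project gp "
    "JOIN project p ON p.project_id = gp.project_id "
    "WHERE gp.grp_id = 1")]

print(members, sources)
```

The membership query has to pass through grpmember, while the annotation query joins grp_project directly to the group.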

Hi Karl,

What exactly would be the naming conflicts?  Could you explain some more?



Re: Chado Group Module

Andy Schroeder
In reply to this post by Stephen Ficklin-2

> I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there was no grpmbr table then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, grp_organismprop,  and those tables get repeatedly created for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects, it would require 27 new tables in Chado. For every data type that can be grouped we have to add an additional 3 tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, grpmbrprop, plus one linker table for each data type so for the example set above would require 12 new tables.

I can see the utility of this approach if I'm understanding correctly.

However, as currently specified, there is no unique key on the grpmbr table aside from the primary key.  That is not compatible with chado-xml, or at least with the tools we use with it, and it would prevent FlyBase, at least, from using the module, which would be a bummer.

> The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group such that it describes an analysis that was performed using the group members then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses, the linker tables would be grp_analysis.   So the table name of both puts the words in reverse order and it might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  By having the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

At least at FlyBase we tend to attribute most things by linking to a pub, so in order to attribute membership in a grp to a pub we would run into confusing table names with or without grpmbr.

On the other hand, because we do attribute most everything, using the grpmbr approach would mean we would only need to add a single grpmbrprop_pub table and the like.

And I agree with others that spelling out the names as much as possible is desirable.

cheers,
Andy




Re: Chado Group Module

Mara Kim-2
Hi Andy,

We can definitely add more tables to the module, especially for pub provenance.  Do you have a list of tables that you would need for your purposes?  Also, what changes would make this module compatible with chado-xml?

Hi Stephen,

I don't think your last email was sent to the listserv.



Re: [GMOD-devel] visualization tool for microarray data stored in chado

Scott Cain
In reply to this post by Stephen Ficklin-2
Hi Peili,

The only group I know of that used Chado for microarray data is Stan Nelson's group at UCLA.  Brian O'Connor and Allen Day are alumni of that group, so they might be able to help, but that was quite some time ago.  Also, I'm going to cc this to the schema mailing list, which is where Chado-related stuff gets discussed.

Scott



On Fri, Feb 7, 2014 at 7:36 AM, Peili Zhang <[hidden email]> wrote:
Hello!

Can anyone recommend software tools for the visualization and simple querying of microarray data stored in a Chado-like relational database? I only need very basic functionality, such as looking up a gene or a list of genes in certain samples and viewing histograms or line graphs of the expression levels across samples. I don't need statistical analysis capability embedded in the tool, but wouldn't mind having it available. The best scenario would be that there are plug-and-play libraries/modules for visualizing microarray data in Chado.

Thanks very much for any information you can share!

Best Regards,

Peili Zhang, Ph.D.
Senior Research Scientist
Computational Sciences
Vertex Pharmaceuticals, Inc.
50 Northern Avenue
Boston, MA 02210
Tel: 617-341-6593
www.vrtx.com

This email message and any attachments are confidential and intended for use by the addressee(s) only. If you are not the intended recipient, please notify me immediately by replying to this message, and destroy all copies of this message and any attachments. Thank you.

_______________________________________________
Gmod-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-devel




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: Chado Group Module

Andy Schroeder
In reply to this post by Mara Kim-2
Hi Mara,

The missing tables for attribution that we would normally be likely to add would be:

grp_relationship_pub
grpprop_pub
grpmbrprop_pub

and the potentially problematic (based on Stephen's previously made point):

grpmbr_pub 

(for attribution of why a thing was added to a group) - could reverse it to pub_grpmbr I suppose.

What is needed for chado-xml representation would be something like a name or uniquename column in grpmbr that had a unique constraint on it.
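To make the request concrete, here is a minimal sketch of why a loader benefits from a natural key. It assumes a hypothetical uniquename column on grpmbr and heavily simplified grp/grpmbr tables (the real proposed columns are on the wiki), and it uses an in-memory SQLite database rather than PostgreSQL. The helper name get_or_create_grpmbr is invented for illustration and is not part of any chado-xml tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grp (
    grp_id     INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE grpmbr (
    grpmbr_id  INTEGER PRIMARY KEY,
    grp_id     INTEGER NOT NULL REFERENCES grp (grp_id),
    uniquename TEXT NOT NULL,   -- hypothetical column, not in the proposal
    UNIQUE (uniquename)         -- natural key a loader can match on
);
""")


def get_or_create_grpmbr(conn, grp_id, uniquename):
    """Return the grpmbr_id for uniquename, inserting the row if absent."""
    row = conn.execute(
        "SELECT grpmbr_id FROM grpmbr WHERE uniquename = ?", (uniquename,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO grpmbr (grp_id, uniquename) VALUES (?, ?)",
        (grp_id, uniquename),
    )
    return cur.lastrowid


conn.execute("INSERT INTO grp (uniquename) VALUES ('orthologs_1')")
first = get_or_create_grpmbr(conn, 1, "orthologs_1:member_a")
second = get_or_create_grpmbr(conn, 1, "orthologs_1:member_a")
member_count = conn.execute("SELECT COUNT(*) FROM grpmbr").fetchone()[0]
```

Loading the same record twice resolves to the same row because the lookup goes through the unique key. With only a serial primary key there would be nothing stable to match on.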

cheers,
Andy


On Fri, Feb 7, 2014 at 2:10 PM, Mara Kim <[hidden email]> wrote:
Hi Andy,

We can definitely add more tables to the module, especially for pub provenance.  Do you have a list of tables that you would need for your purposes?  Also, what changes would make this module work with chado-xml?

Hi Stephen,

I don't think your last email was sent to the listserv.


On Fri, Feb 7, 2014 at 12:54 PM, Andy Schroeder <[hidden email]> wrote:

I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there was no grpmbr table then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, grp_organismprop,  and those tables get repeatedly created for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects, it would require 27 new tables in Chado. For every data type that can be grouped we have to add an additional 3 tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, grpmbrprop, plus one linker table for each data type so for the example set above would require 12 new tables. 
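The table counts in the paragraph above follow from simple arithmetic. A small sketch, with function names made up for illustration:

```python
# Data types named in the message above as candidates for grouping.
GROUPABLE_TYPES = [
    "feature", "organism", "stock", "library", "analysis",
    "pub", "study", "assay", "project",
]


def tables_without_grpmbr(n_types: int) -> int:
    # Every type needs its own linker, cvterm, and prop table.
    return 3 * n_types


def tables_with_grpmbr(n_types: int) -> int:
    # grpmbr, grpmbr_cvterm, and grpmbrprop are shared; one linker per type.
    return 3 + n_types


n = len(GROUPABLE_TYPES)
print(tables_without_grpmbr(n), tables_with_grpmbr(n))  # prints "27 12"
```

The gap widens with every additional groupable type: three new tables per type without grpmbr, one with it.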

I can see  the utility of this approach if I'm understanding correctly.

However, as currently specified there is no unique key on  this table aside from the primary key.  That is not compatible with chado-xml or at least the tools we use with it and would prevent FlyBase, at least,  from using the module, which would  be a bummer.

The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group such that it describes an analysis that was performed using the group members, then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses; the linker table would be grp_analysis.  The two table names put the same words in reverse order, which might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  With the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

At least at FlyBase we tend to attribute most things by linking to a pub so in order to attribute membership in a grp to a pub we would run into confusing table names with or without grpmbr.  

But on the other hand, because we do attribute almost everything, using the grpmbr approach would mean we would only need to add a single grpmbrprop_pub table and the like.

And I agree with others that spelling out the names as much as possible is desirable.   

cheers,
Andy
Stephen



On 2/6/2014 4:59 PM, Andy Schroeder wrote:
Hi Mara et al.,

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

The only reason that I can see having the grpmbr table at all is to do just that - i.e. group things of fundamentally different types.  I am not a fan of that idea but there may be use cases that require this?  Am I missing other reasons to have a grpmbr table?  And can someone provide some use cases that would require this?

cheers,
Andy

On Thu, Feb 6, 2014 at 4:16 PM, Mara Kim <[hidden email]> wrote:
Hello gmoders!

This is a continuation of the Chado Comparative Module discussion (http://generic-model-organism-system-database.450254.n5.nabble.com/Chado-Comparative-Module-tp5712078.html), now renamed the Group Module.

The group table has now been renamed to the grp table to avoid SQL keyword conflicts. The last discussion moved the design into a more generic model with an intermediate grpmbr table that linked the table specific linker tables to a grpmbrprop and analysisgrpmbr table.  Module specific linker tables will be supplied by submodules developed as a subset of the group module.

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

You can see the updated schematic (note that not all tables are pictured) and the putative SQL schema on the wiki (http://gmod.org/wiki/Chado_Group_Module)

--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN


Re: Chado Group Module

Karl O. Pinc
In reply to this post by Mara Kim-2
On 02/07/2014 12:21:07 PM, Mara Kim wrote:

> What exactly would be the naming conflicts?  Could you explain some
> more?

Right now there aren't any naming conflicts, but since
you're considering table name changes I thought I'd mention
possible conflicts.  (FWIW MEMBER and MEMBERS are mighty close.)


Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: Chado Group Module

Karl O. Pinc
In reply to this post by Mara Kim-2
On 02/07/2014 01:10:32 PM, Mara Kim wrote:

> I don't think your last email was sent to the listserv.

From my perspective it would be nice if all replies went
to the listserv, although I understand why some audiences
prefer otherwise.




Karl <[hidden email]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


Re: [GMOD-devel] visualization tool for microarray data stored in chado

Peili Zhang
In reply to this post by Scott Cain
Thanks, Scott! It's been a long time. Hope all is well.


Peili Zhang, Ph.D.

Senior Research Scientist

Computational Sciences

Vertex Pharmaceuticals, Inc.

50 Northern Avenue

Boston, MA 02210

Tel: 617-341-6593

www.vrtx.com

 


From: Scott Cain <[hidden email]>
Date: Friday, February 7, 2014 3:08 PM
To: Peili Zhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, GMOD Schema/Chado List <[hidden email]>, Brian O'Connor <[hidden email]>, Allen Day <[hidden email]>
Subject: Re: [GMOD-devel] visualization tool for microarray data stored in chado

Hi Peili,

The only group I know of that used Chado for microarray data is Stan Nelson's group at UCLA.  Brian O'Connor and Allen Day are alumni of that group, so they might be able to help, but that was quite some time ago.  Also, I'm going to cc this to the schema mailing list, which is where Chado-related stuff gets discussed.

Scott




Re: Chado Group Module

Andy Schroeder
In reply to this post by Andy Schroeder
Hi again Mara,

Thinking on the grpmbr table a bit more I think a better way to have a unique key on the table would be to add a rank column rather than name or uniquename as I initially suggested.  That way if one wanted you could use rank as a mechanism to order members of a group.  The unique key could then be group_id and rank or group_id, type_id, rank.  Which brings up another question.  What do you see as the purpose/meaning of type_id in the grpmbr table?  It could be used to enforce only identical types of things being grouped but it could also be potentially interpreted other ways so curious as to proposed usage.
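A rough sketch of the second variant of the key (group, type_id, rank), using an in-memory SQLite database. The column list is a simplification of the proposed grpmbr table, and the column is written grp_id here on the assumption that it follows the grp table's naming. The helper add_member is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grpmbr (
    grpmbr_id INTEGER PRIMARY KEY,
    grp_id    INTEGER NOT NULL,
    type_id   INTEGER NOT NULL,
    rank      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (grp_id, type_id, rank)  -- composite key suggested above
);
""")


def add_member(conn, grp_id, type_id, rank):
    """Insert a member row; return False if the unique key already exists."""
    try:
        conn.execute(
            "INSERT INTO grpmbr (grp_id, type_id, rank) VALUES (?, ?, ?)",
            (grp_id, type_id, rank),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_member(conn, 1, 10, 0)      # new (grp, type, rank) triple
duplicate = add_member(conn, 1, 10, 0)  # same triple, rejected by the key
next_rank = add_member(conn, 1, 10, 1)  # same group and type, next rank
```

Under this key, two members of the same group and type must differ in rank, so rank serves both as an ordering and as the thing that makes each row identifiable.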

And while mentioning rank, I am not sure that there needs to be a rank column in the grp table.  rank is usually used as part of a composite unique key (e.g. in all the prop tables) and can also be used to order things with respect to another relation (e.g. feature_relationship.rank).  I don't see that need in the grp table but I am likely missing something.

cheers,
Andy


On Fri, Feb 7, 2014 at 3:18 PM, Andy Schroeder <[hidden email]> wrote:
Hi Mara,

The missing tables for attribution that we would normally be likely to add would be:

grp_relationship_pub
grpprop_pub
grpmbrprop_pub

and the potentially problematic (based on Stephen's previously made point):

grpmbr_pub 

(for attribution of why a thing was added to a group) - could reverse it to pub_grpmbr I suppose.

What is needed for chado-xml representation would be something like a name or uniquename column in grpmbr that had a unique constraint on it.

cheers,
Andy



Re: Chado Group Module

Mara Kim-2
Hello all,

I have updated the schematic and putative SQL implementation on the wiki to use grpmember (as opposed to grpmbr) and added the proposed pub provenance tables suggested by Andy (not shown in schematic).  Also, the use of rank in grp was accidental, so I went ahead and removed it.  If anyone has a use case though, it could go back in.

I had to break from the standard pub table linker style by reversing the name for grpmember (i.e. pub_grpmember as opposed to grpmember_pub).  This *could* be fixed by changing the style of the grpmember linker tables to object-grpmember instead of the other way around.  Personally I prefer the object-grpmember style, but I realize others had reasons for the current style (grpmember-object).  Do others have a preference?

I haven't added the rank column to the grpmember table, yet.  I kind of like it, but I also have some misgivings.  It still doesn't solve the problem of an organism and feature linking to the same grpmember row.  Of course, no reason why things *can't* be ordered in a group.  Definitely a point for more discussion.
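To illustrate the open problem, here is a small sketch in an in-memory SQLite database. The linker table names grpmember_organism and grpmember_feature and their columns are assumptions that follow the grpmember-object style; the actual proposal is on the wiki. Nothing in plain foreign keys stops both linkers from pointing at the same grpmember row, with or without a rank column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grpmember (
    grpmember_id INTEGER PRIMARY KEY,
    grp_id       INTEGER NOT NULL
);
CREATE TABLE grpmember_organism (
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id),
    organism_id  INTEGER NOT NULL
);
CREATE TABLE grpmember_feature (
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id),
    feature_id   INTEGER NOT NULL
);
""")

# One member row, then an organism AND a feature attached to it.
conn.execute("INSERT INTO grpmember (grpmember_id, grp_id) VALUES (1, 1)")
conn.execute("INSERT INTO grpmember_organism VALUES (1, 7)")
conn.execute("INSERT INTO grpmember_feature VALUES (1, 42)")

# Audit query: member rows that are linked to more than one kind of object.
MIXED_MEMBERS = """
SELECT o.grpmember_id
FROM grpmember_organism AS o
JOIN grpmember_feature AS f ON f.grpmember_id = o.grpmember_id
"""
mixed_members = [row[0] for row in conn.execute(MIXED_MEMBERS)]
```

Both inserts succeed, so the schema alone does not express "one member, one object". An audit query like the one above can detect such rows after the fact; preventing them would need something extra, such as a trigger or a discriminator column, which is beyond this sketch.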

@Andy:  The type_id column of grpmember was to be used to indicate different kinds of members to a group.  For example, labeling certain proteins as being "well connected" in a metabolic pathway.  However, I do think there is a strong argument for *not* having a type_id column as this kind of breaks the whole idea of groups being aggregates of similar things.  Another point for discussion.


On Mon, Feb 10, 2014 at 8:46 AM, Andy Schroeder <[hidden email]> wrote:
Hi again Mara,

Thinking on the grpmbr table a bit more I think a better way to have a unique key on the table would be to add a rank column rather than name or uniquename as I initially suggested.  That way if one wanted you could use rank as a mechanism to order members of a group.  The unique key could then be group_id and rank or group_id, type_id, rank.  Which brings up another question.  What do you see as the purpose/meaning of type_id in the grpmbr table?  It could be used to enforce only identical types of things being grouped but it could also be potentially interpreted other ways so curious as to proposed usage.

And while mentioning rank, I am not sure that there needs to be a rank column in the grp table.  rank is usually used as part of a composite unique key (e.g. in all the prop tables) and can also be used to order things with respect to another relation (e.g. feature_relationship.rank).  I don't see that need in the grp table but I am likely missing something.

cheers,
Andy



--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN


Re: Chado Group Module

McGary, Kris L
Mara et al.,

I'm in favor of adding the rank column to the grpmember table because it would facilitate a number of non-parametric analyses.  However, I think the most obvious use case is picking the top X genes from a set.  For example, a user might only be interested in the top 30 genes most enriched in a set of up-regulated genes.
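As a sketch of the top-X use case, assuming a rank column on grpmember where lower values mean more enriched (the direction of the ordering is my assumption, as are the simplified columns and the helper name top_members), the query is a plain ORDER BY with a LIMIT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE grpmember (
    grpmember_id INTEGER PRIMARY KEY,
    grp_id       INTEGER NOT NULL,
    rank         INTEGER NOT NULL,
    UNIQUE (grp_id, rank)
)
""")

# Toy data: 100 ranked members of group 1 (rank 0 = most enriched).
conn.executemany(
    "INSERT INTO grpmember (grp_id, rank) VALUES (?, ?)",
    [(1, r) for r in range(100)],
)


def top_members(conn, grp_id, limit):
    """Return (grpmember_id, rank) for the best-ranked members of a group."""
    query = (
        "SELECT grpmember_id, rank FROM grpmember "
        "WHERE grp_id = ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(query, (grp_id, limit)).fetchall()


top30 = top_members(conn, 1, 30)
```

In a real Chado instance the result would be joined through the relevant linker table to reach the feature rows themselves.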

Kris
________________________________________
From: Mara Kim [[hidden email]]
Sent: Monday, February 10, 2014 2:14 PM
To: Andy Schroeder
Cc: [hidden email]
Subject: Re: [Gmod-schema] Chado Group Module

Hello all,

I have updated the schematic and putative SQL implementation on the wiki to use grpmember (as opposed to grpmbr) and added the proposed pub provenance tables suggested by Andy (not shown in schematic).  Also, the use of rank in grp was accidental, so I went ahead and removed it.  If anyone has a use case though, it could go back in.

I had to break from the standard pub table linker style by reversing the name for grpmember (i.e. pub_grpmember as opposed to grpmember_pub).  This *could* be fixed by changing the style of the grpmember linker tables to object-grpmember instead of the other way around.  Personally I prefer the object-grpmember style, but I realize others had reasons for the current style (grpmember-object).  Do others have a preference?

I haven't added the rank column to the grpmember table, yet.  I kind of like it, but I also have some misgivings.  It still doesn't solve the problem of an organism and feature linking to the same grpmember row.  Of course, no reason why things *can't* be ordered in a group.  Definitely a point for more discussion.

@Andy:  The type_id column of grpmember was to be used to indicate different kinds of members to a group.  For example, labeling certain proteins as being "well connected" in a metabolic pathway.  However, I do think there is a strong argument for *not* having a type_id column as this kind of breaks the whole idea of groups being aggregates of similar things.  Another point for discussion.


On Mon, Feb 10, 2014 at 8:46 AM, Andy Schroeder <[hidden email]> wrote:
Hi again Mara,

Thinking on the grpmbr table a bit more I think a better way to have a unique key on the table would be to add a rank column rather than name or uniquename as I initially suggested.  That way if one wanted you could use rank as a mechanism to order members of a group.  The unique key could then be group_id and rank or group_id, type_id, rank.  Which brings up another question.  What do you see as the purpose/meaning of type_id in the grpmbr table?  It could be used to enforce only identical types of things being grouped but it could also be potentially interpreted other ways so curious as to proposed usage.

And while mentioning rank, I am not sure that there needs to be a rank column in the grp table.  rank is usually used as part of a composite unique key (e.g. in all the prop tables) and can also be used to order things with respect to another relation (e.g. feature_relationship.rank).  I don't see that need in the grp table but I am likely missing something.

cheers,
Andy


On Fri, Feb 7, 2014 at 3:18 PM, Andy Schroeder <[hidden email]> wrote:
Hi Mara,

The missing tables for attribution that we would normally be likely to add would be:

grp_relationship_pub
grpprop_pub
grpmbrprop_pub

and the potentially problematic (based on Stephen's previously made point):

grpmbr_pub

(for attribution of why a thing was added to a group) - could reverse it to pub_grpmbr I suppose.

What is needed for chado-xml representation would be something like a name or uniquename column in grpmbr that had a unique constraint on it.

cheers,
Andy


On Fri, Feb 7, 2014 at 2:10 PM, Mara Kim <[hidden email]> wrote:
Hi Andy,

We can definitely add more tables to the module, especially for pub provenance.  Do you have a list of tables that you would need for your purposes?  Also, what changes would make this module compatible with chado-xml?

Hi Stephen,

I don't think your last email was sent to the listserv.


On Fri, Feb 7, 2014 at 12:54 PM, Andy Schroeder <[hidden email]> wrote:

I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there was no grpmbr table then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, grp_organismprop,  and those tables get repeatedly created for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects, it would require 27 new tables in Chado. For every data type that can be grouped we have to add an additional 3 tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, grpmbrprop, plus one linker table for each data type so for the example set above would require 12 new tables.
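
The table counts in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch of the counting argument; the singular data type names are shorthand chosen here, not proposed table names.

```python
# Data types listed in the message above.
data_types = ["feature", "organism", "stock", "library", "analysis",
              "pub", "study", "assay", "project"]

# Without grpmbr: a linker, a cvterm table and a prop table per data type.
tables_without_grpmbr = 3 * len(data_types)

# With grpmbr: three shared tables plus one linker table per data type.
shared_tables = ["grpmbr", "grpmbr_cvterm", "grpmbrprop"]
tables_with_grpmbr = len(shared_tables) + len(data_types)

print(tables_without_grpmbr, tables_with_grpmbr)  # 27 12
```

Each further groupable data type adds three tables in the first design and one in the second.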

I can see  the utility of this approach if I'm understanding correctly.

However, as currently specified there is no unique key on  this table aside from the primary key.  That is not compatible with chado-xml or at least the tools we use with it and would prevent FlyBase, at least,  from using the module, which would  be a bummer.

The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group such that it describes an analysis that was performed using the group members, then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses; the linker table would be grp_analysis.  The two table names contain the same words in reverse order, which might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  By having the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

At least at FlyBase we tend to attribute most things by linking to a pub so in order to attribute membership in a grp to a pub we would run into confusing table names with or without grpmbr.

But on the other hand, because we do attribute almost everything, using the grpmbr approach would mean we would only need to add a single grpmbrprop_pub table and the like.

And I agree with others that spelling out the names as much as possible is desirable.

cheers,
Andy
Stephen



On 2/6/2014 4:59 PM, Andy Schroeder wrote:
Hi Mara et al.,

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

The only reason that I can see having the grpmbr table at all is to do just that - i.e. group things of fundamentally different types.  I am not a fan of that idea but there may be use cases that require this?  Am I missing other reasons to have a grpmbr table?  And can someone provide some use cases that would require this?

cheers,
Andy

Re: Chado Group Module

Mara Kim-2
That seems reasonable.  I've updated the SQL implementation and schematic.  This leaves the following issues:

You could potentially link an organism and a feature (and anything else with a grpmember linker table) to the same grpmember_id.

Current schema is incompatible with chado-xml. (Is this still the case?)

Should grpmember have a type_id column?

grpmember-object linker table name style conflicts with current pub provenance table style. Perhaps move back to object-grpmember linker style?
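
To make the first issue concrete, here is a small sketch of how two linker tables can point at one grpmember row. It uses Python's sqlite3 module as a stand-in for PostgreSQL; the linker table names grpmember_organism and grpmember_feature follow the grpmember-object style and, like the reduced column lists, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism  (organism_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature   (feature_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE grpmember (grpmember_id INTEGER PRIMARY KEY, grp_id INTEGER NOT NULL);

-- Hypothetical linker tables, one per data type.
CREATE TABLE grpmember_organism (
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id),
    organism_id  INTEGER NOT NULL REFERENCES organism (organism_id)
);
CREATE TABLE grpmember_feature (
    grpmember_id INTEGER NOT NULL REFERENCES grpmember (grpmember_id),
    feature_id   INTEGER NOT NULL REFERENCES feature (feature_id)
);

INSERT INTO organism  (organism_id, name)    VALUES (1, 'example organism');
INSERT INTO feature   (feature_id, name)     VALUES (1, 'example feature');
INSERT INTO grpmember (grpmember_id, grp_id) VALUES (1, 1);

-- Nothing in the schema stops both linkers from using grpmember_id 1.
INSERT INTO grpmember_organism (grpmember_id, organism_id) VALUES (1, 1);
INSERT INTO grpmember_feature  (grpmember_id, feature_id)  VALUES (1, 1);
""")

links = conn.execute("""
    SELECT 'organism' FROM grpmember_organism WHERE grpmember_id = 1
    UNION ALL
    SELECT 'feature'  FROM grpmember_feature  WHERE grpmember_id = 1
""").fetchall()
kinds = sorted(row[0] for row in links)
```

Foreign keys alone accept both inserts, so a single member row ends up standing for an organism and a feature at the same time.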



On Mon, Feb 10, 2014 at 2:40 PM, McGary, Kris L <[hidden email]> wrote:
Mara et al.,

I'm in favor of adding the rank column to the grpmember table because it would facilitate a number of non-parametric analyses.  However, I think the most obvious use case is picking the top X genes from a set.  For example, a user might only be interested in the top 30 genes most enriched in a set of up-regulated genes.
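
The "top X genes" use case could be served by a query along these lines. The sketch uses Python's sqlite3 module as a stand-in for PostgreSQL and assumes a reduced grpmember table in which rank 0 marks the most enriched member; both the column list and that ranking convention are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, reduced grpmember table with a rank column.
CREATE TABLE grpmember (
    grpmember_id INTEGER PRIMARY KEY,
    grp_id       INTEGER NOT NULL,
    rank         INTEGER NOT NULL,
    UNIQUE (grp_id, rank)
);
""")
# One group of 100 members; rank 0 is assumed to be the most enriched.
conn.executemany(
    "INSERT INTO grpmember (grp_id, rank) VALUES (?, ?)",
    [(1, r) for r in range(100)],
)


def top_members(conn, grp_id, limit):
    """Return the grpmember_ids of the first `limit` members by rank."""
    rows = conn.execute(
        "SELECT grpmember_id FROM grpmember "
        "WHERE grp_id = ? ORDER BY rank LIMIT ?",
        (grp_id, limit),
    )
    return [row[0] for row in rows]


top30 = top_members(conn, 1, 30)
```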

Kris
________________________________________
From: Mara Kim [[hidden email]]
Sent: Monday, February 10, 2014 2:14 PM
To: Andy Schroeder
Cc: [hidden email]
Subject: Re: [Gmod-schema] Chado Group Module

Re: Chado Group Module

Stephen Ficklin-2
In reply to this post by Mara Kim-2
Hi Mara,

Thanks for your efforts with this :-)   I have a few short comments...

I too like the rank column in the grpmember table to help order the membership, although it may not make sense to rank members of some groups, so I'm not sure it should be part of a unique constraint.

I think it may be better to name our linker tables as object-grpmember because we have the same problem with the grpmember_cvterm table.  It appears at first glance to be used for grouping cvterms. 

I think it makes sense that a type_id exists for stocks and features because they always must have at least one type.  If we have a type_id in the grpmember table then we are implying that membership in a group inherently requires a membership type.  I do like that requirement.  But if the consensus is that we don't want to require a type for all group members then we could remove it from the grpmember table and just use the grpmember_cvterm table for specifying membership types.

Also, I think ChadoXML may have problems with the nd_experiment table as it also doesn't have a unique constraint.  That table is already set in the Chado v1.2 release.  Is it possible for ChadoXML to adapt to tables without unique constraints?

Thanks!
Stephen



Re: Chado Group Module

Andy Schroeder
Hi Stephen,

> I too like the rank column in the grpmember table to help order the membership, although it may not make sense to rank members of some groups, so I'm not sure it should be part of a unique constraint.

In order to have a unique constraint, rank needs to be part of it the way the table is currently set up.  The ordering just comes as a potential side benefit.  It would be up to the user to decide if it conveyed order information or not.

> I think it may be better to name our linker tables as object-grpmember because we have the same problem with the grpmember_cvterm table.  It appears at first glance to be used for grouping cvterms.
>
> I think it makes sense that a type_id exists for stocks and features because they always must have at least one type.  If we have a type_id in the grpmember table then we are implying that membership in a group inherently requires a membership type.  I do like that requirement.  But if the consensus is that we don't want to require a type for all group members then we could remove it from the grpmember table and just use the grpmember_cvterm table for specifying membership types.
>
> Also, I think ChadoXML may have problems with the nd_experiment table as it also doesn't have a unique constraint.  That table is already set in the Chado v1.2 release.  Is it possible for ChadoXML to adapt to tables without unique constraints?

We do not currently use the nd_ module, and because key tables in the module lack such constraints we are quite unlikely to do so.  Lack of unique constraints is problematic for valid chadoXML (at least the sort dumped and consumed by XORT).  There are in fact a few tables in the original schema that also do not have usable unique constraints (e.g. eimage) and we don't use them either.  If we were to do so we would modify them accordingly.  That said, groups have certainly developed mechanisms that do not rely on chadoXML and the unique constraints.

However,  we are definitely interested in using this grp module so I would hope that the unique constraint can be added.

The table naming and Mara's concern about linking multiple object types to a single grpmember_id deserve more discussion and thought.  One could potentially set up triggers to manage it, but I am not sure that is the best solution.
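
A trigger of the kind mentioned above might look like the following sketch. It is written against SQLite through Python's sqlite3 module only so that it is self-contained; in PostgreSQL the trigger would call a trigger function instead. The linker table names, the trigger name and the reduced columns are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grpmember (grpmember_id INTEGER PRIMARY KEY, grp_id INTEGER NOT NULL);
CREATE TABLE grpmember_organism (grpmember_id INTEGER NOT NULL, organism_id INTEGER NOT NULL);
CREATE TABLE grpmember_feature  (grpmember_id INTEGER NOT NULL, feature_id  INTEGER NOT NULL);

-- Hypothetical guard: refuse a feature link when the same member row
-- is already linked to an organism.
CREATE TRIGGER grpmember_feature_one_type
BEFORE INSERT ON grpmember_feature
WHEN EXISTS (SELECT 1 FROM grpmember_organism
             WHERE grpmember_id = NEW.grpmember_id)
BEGIN
    SELECT RAISE(ABORT, 'grpmember is already linked to an organism');
END;

INSERT INTO grpmember (grpmember_id, grp_id) VALUES (1, 1), (2, 1);
INSERT INTO grpmember_organism (grpmember_id, organism_id) VALUES (1, 100);
""")


def link_feature(grpmember_id, feature_id):
    """Try to link a feature to a member row; return False if the trigger refuses."""
    try:
        conn.execute(
            "INSERT INTO grpmember_feature (grpmember_id, feature_id) VALUES (?, ?)",
            (grpmember_id, feature_id),
        )
    except sqlite3.DatabaseError:
        return False
    return True


mixed_link_allowed = link_feature(1, 200)  # member 1 already holds an organism
plain_link_allowed = link_feature(2, 200)  # member 2 has no other link
```

A complete guard would need a mirror trigger on every other linker table, so the number of triggers grows with the number of groupable data types, which may be part of why this is not obviously the best solution.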

cheers,
Andy

 

Thanks!
Stephen



On 2/10/2014 3:14 PM, Mara Kim wrote:
Hello all,

I have updated the schematic and putative SQL implementation on the wiki to use grpmember (as opposed to grpmbr) and added the proposed pub provenance tables suggested by Andy (not shown in schematic).  Also, the use of rank in grp was accidental, so I went ahead and removed it.  If anyone has a use case though, it could go back in.

I had to break from the standard pub table linker style by reversing the name for grpmember (ie. pub_grpmember as opposed to grpmember_pub).  This *could* be fixed by changing the style of the grpmember linker tables to object-grpmember instead of the other way around.  Personally I prefer the object-groupmember style, but I realize others had reasons for the current style (grpmember-object).  Do others have a preference?

I haven't added the rank column to the grpmember table, yet.  I kind of like it, but I also have some misgivings.  It still doesn't solve the problem of an organism and feature linking to the same grpmember row.  Of course, no reason why things *can't* be ordered in a group.  Definitely a point for more discussion.

@Andy:  The type_id column of grpmember was to be used to indicate different kinds of members to a group.  For example, labeling certain proteins as being "well connected" in a metabolic pathway.  However, I do think there is a strong argument for *not* having a type_id column as this kind of breaks the whole idea of groups being aggregates of similar things.  Another point for discussion.


On Mon, Feb 10, 2014 at 8:46 AM, Andy Schroeder <[hidden email]> wrote:
Hi again Mara,

Thinking on the grpmbr table a bit more, I think a better way to have a unique key on the table would be to add a rank column rather than name or uniquename as I initially suggested.  That way, if you wanted, you could use rank as a mechanism to order members of a group.  The unique key could then be group_id and rank or group_id, type_id, rank.  Which brings up another question.  What do you see as the purpose/meaning of type_id in the grpmbr table?  It could be used to enforce only identical types of things being grouped, but it could also potentially be interpreted other ways, so I am curious as to the proposed usage.
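To make the rank proposal concrete, here is a small sketch with Python's built-in sqlite3 module.  The column names (grp_id rather than group_id, a plain integer type_id instead of a cvterm foreign key) are simplifying assumptions for the example and are not taken from the schema on the wiki.  It shows a composite unique key on (grp_id, type_id, rank) rejecting a duplicate row and rank being used to order the members.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of the grp and grpmbr tables.
CREATE TABLE grp (
    grp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE grpmbr (
    grpmbr_id INTEGER PRIMARY KEY,
    grp_id    INTEGER NOT NULL REFERENCES grp (grp_id),
    type_id   INTEGER NOT NULL,           -- would reference cvterm in Chado
    rank      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (grp_id, type_id, rank)        -- the proposed composite unique key
);
""")

conn.execute("INSERT INTO grp (grp_id, name) VALUES (1, 'example group')")
conn.execute("INSERT INTO grpmbr (grp_id, type_id, rank) VALUES (1, 10, 1)")
conn.execute("INSERT INTO grpmbr (grp_id, type_id, rank) VALUES (1, 10, 0)")

def add_member(grp_id, type_id, rank):
    """Insert a member row; return False if the unique key rejects it."""
    try:
        conn.execute(
            "INSERT INTO grpmbr (grp_id, type_id, rank) VALUES (?, ?, ?)",
            (grp_id, type_id, rank),
        )
    except sqlite3.IntegrityError:
        return False
    return True

duplicate_accepted = add_member(1, 10, 1)  # same (grp_id, type_id, rank) as above

# rank doubles as an ordering of the members within the group.
ordered_ranks = [
    row[0]
    for row in conn.execute("SELECT rank FROM grpmbr WHERE grp_id = 1 ORDER BY rank")
]
```

With such a key, a row can be identified by its (grp_id, type_id, rank) values instead of its primary key.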

And while mentioning rank, I am not sure that there needs to be a rank column in the grp table.  rank is usually used as part of a composite unique key (e.g. in all the prop tables) and can also be used to order things with respect to another relation (e.g. feature_relationship.rank).  I don't see that need in the grp table, but I am likely missing something.

cheers,
Andy


On Fri, Feb 7, 2014 at 3:18 PM, Andy Schroeder <[hidden email]> wrote:
Hi Mara,

The missing tables for attribution that we would normally be likely to add would be:

grp_relationship_pub
grpprop_pub
grpmbrprop_pub

and the potentially problematic (based on Stephen's previously made point):

grpmbr_pub 

(for attribution of why a thing was added to a group) - could reverse it to pub_grpmbr I suppose.

What is needed for chado-xml representation would be something like a name or uniquename column in grpmbr that had a unique constraint on it.

cheers,
Andy


On Fri, Feb 7, 2014 at 2:10 PM, Mara Kim <[hidden email]> wrote:
Hi Andy,

We can definitely add more tables to the module, especially for pub provenance.  Do you have a list of tables that you would need for your purposes?  Also, what changes would make this module compatible with chado-xml?

Hi Stephen,

I don't think your last email was sent to the listserv.


On Fri, Feb 7, 2014 at 12:54 PM, Andy Schroeder <[hidden email]> wrote:

I believe the grpmbr table is useful because it reduces the number of tables.  For example, if there was no grpmbr table then to link an organism to a group you would need three tables: grp_organism, grp_organism_cvterm, grp_organismprop,  and those tables get repeatedly created for every data type that can be grouped.  So, to support grouping of features, organisms, stocks, libraries, analyses, pubs, studies, assays, and projects, it would require 27 new tables in Chado. For every data type that can be grouped we have to add an additional 3 tables.  With the grpmbr table we need grpmbr, grpmbr_cvterm, grpmbrprop, plus one linker table for each data type so for the example set above would require 12 new tables. 
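The table counts in the paragraph above can be checked with a few lines of Python.  The function names are made up for this illustration; the list of data types and the three-tables-per-type rule come straight from the message.

```python
# Data types named in the message as candidates for grouping.
DATA_TYPES = [
    "feature", "organism", "stock", "library", "analysis",
    "pub", "study", "assay", "project",
]

def tables_without_grpmbr(data_types):
    # Each type needs its own linker, cvterm and prop table,
    # e.g. grp_organism, grp_organism_cvterm, grp_organismprop.
    return 3 * len(data_types)

def tables_with_grpmbr(data_types):
    # grpmbr, grpmbr_cvterm and grpmbrprop are shared;
    # each type then adds a single linker table.
    return 3 + len(data_types)

print(tables_without_grpmbr(DATA_TYPES))  # 27
print(tables_with_grpmbr(DATA_TYPES))     # 12
```

Each additional groupable data type costs three tables in the first design and one in the second.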

I can see  the utility of this approach if I'm understanding correctly.

However, as currently specified there is no unique key on  this table aside from the primary key.  That is not compatible with chado-xml or at least the tools we use with it and would prevent FlyBase, at least,  from using the module, which would  be a bummer.

The other benefit is that it helps clarify the meaning of table names. For example, if we want to associate an analysis with a group such that it describes an analysis that was performed using the group members, then that table (following the example of the analysisfeature table) would be analysisgrp.  But suppose we also wanted to group a set of analyses; the linker table would be grp_analysis.  The two table names use the same words in reverse order, which might be a point of confusion for some folks.  We have the same problem with the grp_pub and pub_grp tables (one for specifying a publication about a group and the other for grouping pubs).  By having the grpmbr table it's much easier to distinguish members of a group from annotations about the group.

At least at FlyBase we tend to attribute most things by linking to a pub so in order to attribute membership in a grp to a pub we would run into confusing table names with or without grpmbr.  

But on the other hand, because we do attribute most everything, using the grpmbr approach would mean we would only need to add a single grpmbrprop_pub table and the like.

And I agree with others that spelling out the names as much as possible is desirable.   

cheers,
Andy
Stephen



On 2/6/2014 4:59 PM, Andy Schroeder wrote:
Hi Mara et al.,

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

The only reason that I can see for having the grpmbr table at all is to do just that - i.e. group things of fundamentally different types.  I am not a fan of that idea, but there may be use cases that require this?  Am I missing other reasons to have a grpmbr table?  And can someone provide some use cases that would require this?

cheers,
Andy

On Thu, Feb 6, 2014 at 4:16 PM, Mara Kim <[hidden email]> wrote:
Hello gmoders!

This is a continuation of the Chado Comparative Module discussion (http://generic-model-organism-system-database.450254.n5.nabble.com/Chado-Comparative-Module-tp5712078.html), now renamed the Group Module.

The group table has now been renamed to the grp table to avoid SQL keyword conflicts. The last discussion moved the design into a more generic model with an intermediate grpmbr table that linked the table specific linker tables to a grpmbrprop and analysisgrpmbr table.  Module specific linker tables will be supplied by submodules developed as a subset of the group module.

One potential concern that I see that wasn't discussed during the conference call is that you could potentially link an organism and a feature (and anything else with a grpmbr linker table) to the same grpmbr_id.  Is this desirable behavior?

You can see the updated schematic (note that not all tables are pictured) and the putative SQL schema on the wiki (http://gmod.org/wiki/Chado_Group_Module)

--
Mara Kim

Ph.D. Candidate
Computational Biology
Vanderbilt University
Nashville, TN

















