Since storing a range, or multiple genetic or cytological positions per feature is so common,
we'd like to propose one of the following changes:

1. Change data type for fmin and fmax in featureloc to float.

   + table is already set up for min/max coordinates
   - table is not tied to a featuremap and therefore coordinate unit is unknown

2. Add a field to existing featurepos, type_id, to indicate what sort of
   position (e.g. start, end).

   + takes advantage of existing table, minimal change, adding a field
     shouldn't break existing code, views, triggers, et cetera.
   - ?

3. Create a new table, featureinterval with these fields:
     featureinterval_id
     featuremap_id (map set, to get coordinate units)
     feature_id (object feature being placed)
     srcfeature_id (target feature)
     startpos (double precision)
     endpos (double precision)

   + straight-forward to help newbies get started
   - duplicates some information already provided by featurepos table

Ethy Cannon
Naama Menda
Steven Cannon

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Hi Ethy, Naama and Steven,
How does the existing one-to-many relationship between feature and
featureloc not meet your needs for modeling ranges or multiple genetic or
cytological positions? In FlyBase, we have many features that have
multiple locations, so I'm not sure I understand what it is you are
trying to address. Perhaps you can give us a use case?

What is your reason for wanting to convert fmin/fmax in featureloc
from an integer to a float? Float types in PostgreSQL come with very
dire warnings about their use.

http://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-FLOAT

8.1.3. Floating-Point Types
The data types real and double precision are inexact,
variable-precision numeric types....
Inexact means that some values cannot be converted exactly to the
internal format and are stored as approximations, so that storing and
retrieving a value might show slight discrepancies.
...
*Comparing two floating-point values for equality might not always
work as expected.*

That last statement makes this a non-starter for me. The better type
to use in place of a float is a numeric, but that is not without
pitfalls of its own.

http://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL

8.1.2. Arbitrary Precision Numbers

..."However, arithmetic on numeric values is very slow compared to the
integer types, or to the floating-point types described in the next
section."

Doing anything that might slow down location queries is not ideal
unless the benefits outweigh the costs.

Cheers,
Josh
FlyBase

On Thu, May 2, 2013 at 1:52 PM, Cannon, Ethalinda K [GDCBA] <[hidden email]> wrote:
> Since storing a range, or multiple genetic or cytological positions per feature is so common,
> we'd like to propose one of the following changes:
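The equality caveat quoted from the PostgreSQL documentation can be illustrated outside the database. What follows is only a minimal Python sketch, not PostgreSQL itself: Python's float is a binary double on common platforms (comparable to double precision), and decimal.Decimal is used here, as an assumption, to stand in for the exact decimal behaviour of numeric.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly equal to the float closest to 0.3.
float_sum = 0.1 + 0.2
float_equal = (float_sum == 0.3)

# Decimal arithmetic keeps decimal fractions exact when the values are
# built from strings, so the same comparison succeeds.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_equal = (decimal_sum == Decimal("0.3"))

print(float_equal)    # False
print(decimal_equal)  # True
```

Range comparisons (less than, greater than) are far less sensitive to this than equality tests, which may matter when deciding how much weight to give the warning for map positions.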
Hi Josh,

Does FlyBase use the map module? I think the main thing they'd like to do
is represent coordinates using inexact values (like cM). I feel like this
ought to be done in the map module, but I don't know of a working example of
doing that. Additionally, the map module itself is very poorly documented
:-/

Scott

On Thu, May 2, 2013 at 3:10 PM, Josh Goodman <[hidden email]> wrote:
> Hi Ethy, Naama and Steven,

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
In reply to this post by Cannon, Ethalinda K [COM S]
Hi,

We (in GDR, CottonGEN, etc.) use the map module for genetic maps, and we do it another way that is similar to #2. We have a custom table, featureposprop, and store positions like start, stop, and QTL peak.
I think, in general, adding a prop table is better than changing the base table, since it has less chance of breaking the code of anyone else who already uses the table.

Sook

On Thu, May 2, 2013 at 1:52 PM, Cannon, Ethalinda K [GDCBA] <[hidden email]> wrote:
> Since storing a range, or multiple genetic or cytological positions per feature is so common,
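The prop-table approach described above might look roughly like the sketch below. This is only an illustration under stated assumptions: the column names of featureposprop (type_name, value) are hypothetical, since the thread does not give the actual GDR/CottonGEN definition; featurepos is cut down to three columns; and an in-memory SQLite database stands in for PostgreSQL. In a real Chado schema the type would more likely be a type_id pointing at cvterm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical table definitions (not the real Chado DDL).
conn.executescript("""
CREATE TABLE featurepos (
    featurepos_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL,
    mappos        REAL    NOT NULL
);
CREATE TABLE featureposprop (
    featureposprop_id INTEGER PRIMARY KEY,
    featurepos_id     INTEGER NOT NULL REFERENCES featurepos (featurepos_id),
    type_name         TEXT    NOT NULL,
    value             TEXT    NOT NULL
);
""")

# One QTL placed on a map; extra positions hang off it as properties,
# so the base featurepos table itself is left unchanged.
conn.execute("INSERT INTO featurepos VALUES (1, 101, 12.5)")
conn.executemany(
    "INSERT INTO featureposprop (featurepos_id, type_name, value) "
    "VALUES (?, ?, ?)",
    [(1, "start", "12.5"), (1, "stop", "18.0"), (1, "qtl_peak", "15.2")],
)

# Collect all typed positions recorded for featurepos row 1.
props = dict(
    conn.execute(
        "SELECT type_name, value FROM featureposprop WHERE featurepos_id = ?",
        (1,),
    ).fetchall()
)
print(props)
```

The design point is the one made in the post: existing code that reads featurepos keeps working, because nothing in that table changes.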
In reply to this post by Scott Cain
Hi all,
My main concern was with changing the featureloc table. That seems like the least favorable of the three options for a variety of reasons. I can't really speak to the merits of the other two, since we don't use the map module at FlyBase.

My only suggestion would be to not use the floating-point type (double precision) in option 3, since what you put in isn't necessarily what you get out. We already have a handful of these in Chado and I think we would be better off not perpetuating them if possible.

Cheers,
Josh

On Thu, May 2, 2013 at 6:55 PM, Scott Cain <[hidden email]> wrote:
> Hi Josh,
>
> Does FlyBase use the map module? I think the main thing they'd like to do
> is represent coordinates using inexact values (like cM).
In reply to this post by Josh Goodman
Sorry, I sent the response below directly to Josh by accident but would like to see if there are additional reactions, so reposting to the list.
After some more talk amongst ourselves, we're proposing taking this a step further and making two changes to featurepos:
- add a type_id field
- change the type of mappos to numeric

I think we could also accept the featureposprop solution that Sook uses, but would then like to see that table become part of the Chado schema rather than an add-on table that everyone who needs to store start and end genetic coordinates will have to create.

In either case, changing the type of featurepos.mappos may be advisable, as one of our objectives is to be able to retrieve features within a range of coordinates, which will mean numeric comparisons. It sounds like there's a trade-off regarding the data type, with numeric (arbitrary precision) fields being more accurate and floating-point fields being the faster of the two. I suggest that while speed is desirable, accuracy is more important for genetic markers and QTLs.

Obviously, we don't want to break anything. Is there any sense of how many databases are using the featurepos table, in addition to GDR and CottonGen? How have changes to existing tables been handled in the past?

Ethy

________________________________________
From: Cannon, Ethalinda K [GDCBA]
Sent: Thursday, May 02, 2013 3:11 PM
To: Josh Goodman
Subject: RE: [Gmod-schema] Proposed changes to map module

Thanks for your response, Josh,

I can see the reasons for your discomfort with non-integer fields. The reason we would like some sort of non-integer data type is that genetic positions are not integers. How do you store non-integer positions and map values? We could multiply them by 100 (or 1000, or 10,000) and store them as integers, but then we'd need to be clear that's what we did so they could be converted back to the proper values for display or calculations.

The numeric type does look better than double precision. Not being able to test for equality isn't an issue because genetic positions are approximations, and testing whether two are equivalent doesn't seem to make much sense.

Another reason not to use the featureloc table is the need to link the position back to a specific map set (featuremap) and its unit (usually cM in our case). We need one-to-two relationships if using the featurepos table (which records only one position) so that we can record beginning and end coordinates of QTLs and linkage group maps.

At this point, as we talk over the options in our earlier note, we are liking option 2 best. That way featureloc will be unchanged (and not slowed down by numeric fields) and featurepos already contains a float field (mappos).

Ethy
Naama
Steven

________________________________________
From: Josh Goodman [[hidden email]]
Sent: Thursday, May 02, 2013 2:10 PM
To: Cannon, Ethalinda K [GDCBA]
Cc: GMOD Schema/Chado List
Subject: Re: [Gmod-schema] Proposed changes to map module

Hi Ethy, Naama and Steven,
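The scaled-integer idea mentioned above (multiply by 100 and store as an integer), together with retrieval by coordinate range, could be sketched as follows. This is a hedged illustration only: SCALE, to_stored, to_display, and the marker names are hypothetical and appear nowhere in Chado, and the choice of two decimal places is an assumption that would have to be recorded alongside the data.

```python
from decimal import Decimal

SCALE = 100  # assumption: two decimal places of cM are sufficient


def to_stored(cm: str) -> int:
    """Convert a cM position given as text into a scaled integer."""
    scaled = Decimal(cm) * SCALE
    # Refuse values that would silently lose precision at this scale.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{cm} needs more precision than SCALE={SCALE} keeps")
    return int(scaled)


def to_display(stored: int) -> Decimal:
    """Convert a stored scaled integer back to cM for display."""
    return Decimal(stored) / SCALE


# Hypothetical markers with their map positions in cM.
markers = {
    "markerA": to_stored("12.50"),
    "markerB": to_stored("15.25"),
    "markerC": to_stored("30.10"),
}

# Range retrieval becomes plain integer comparison.
lo, hi = to_stored("10"), to_stored("20")
in_range = sorted(name for name, pos in markers.items() if lo <= pos <= hi)

print(in_range)                         # ['markerA', 'markerB']
print(to_display(markers["markerB"]))   # 15.25
```

The cost noted in the message is visible here: every reader and writer has to agree on SCALE, which is the bookkeeping a numeric column would avoid.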
In reply to this post by Scott Cain
Hi Scott,
This is somewhat related, and I have some general questions. What is the current practice for submitting changes to the Chado schema in general? What generally goes into the contrib section, add-ons, or plugins? Say the changes are in a core table: should they go into the contrib section? And of course, how does somebody submit changes/patches, and to which repository?

thanks,
-siddhartha

On Thu, 02 May 2013, Scott Cain wrote:
> Hi Josh,
>
> Does FlyBase use the map module? I think the main thing they'd like to do
> is represent coordinates using inexact values (like cM).