Natural diversity module and phenotype cvterm values


Natural diversity module and phenotype cvterm values

Naama Menda
hi,

while working with the natural diversity module, I started loading typical data from potato breeders.
The first hurdle is mapping phenotypes and traits to ontology terms.
Many breeders love to define a trait ('fruit color') and assign the values as categories (1=round, 2=elongated, etc.).
Some traits have a more logical scale (e.g. 'tuber appearance', 1=poor ... 9=excellent).

Since the values cannot always be mapped to PATO, and we'd like to have the data searchable by the exact breeder-defined value,
I'm adding the scales as cvtermprops (cvterm 'fruit color' has n properties of type 'scale', with the values being the breeder-defined shape values).

Now I'm running into traits with multiple scales. One group uses scale y for trait x, and another group uses a different scale z for the same trait x.
Would it make more sense to store one scale for each cvterm (adding a new cvterm sibling with each new scale), or to add multiple prop types pointing to the same cvterm (prop type 1 = scale y, prop type 2 = scale z)?

thanks!
-Naama



Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]

------------------------------------------------------------------------------


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Natural diversity module and phenotype cvterm values

Sook Jung
Hi Naama,

That was one of the questions that I wrote on the Use Case site for the natural diversity module.

I prefer the second option: multiple prop types for the same cvterm. When users browse the phenotype CV terms, it would be nice to list one cvterm (fruit color) instead of several (fruit color 1-3, fruit color 1-5, etc.) for the same phenotype.

But we will still be able to let users search specifically. For example, users could search for apple trees with fruit color over 3 when the fruit color is measured 1 to 3.

Thanks

Sook



Re: Natural diversity module and phenotype cvterm values

Pantelis Topalis
Hi Naama and Sook,

As an ontology developer for AnoBase, I have several concerns regarding the use of prop types for a cvterm. The main one is the mixing of ontologies, which are very flexible and dynamic, with the kind of classification scheme that breeders love to have:
"Many breeders love to define a trait ('fruit color' ) and assign the values as categories (1=round, 2 = elongated, etc.)."
It is sometimes difficult to persuade them all to use the very same scheme or to make changes, and someone needs to keep in mind the "code used". Otherwise a "parser" is needed to convert between the different schemes used by different groups.

On the other hand, PATO is a generic ontology of qualities, which does not always cover the data that need annotation. Ideally, a potato trait ontology is needed here, using both anatomical terms and PATO entities to bridge the gap. There is already a similar ontology for cereal plant traits (http://bioportal.bioontology.org/visualize/42824), and for our (VectorBase) purposes there will be another one.

Ontology development is a lengthy process, so in the meantime I would suggest what we are doing in VectorBase (following FlyBase's paradigm). We are creating and maintaining an unstructured controlled vocabulary with all the terms that the database needs but that are not presently covered by any existing ontology. Those terms will be replaced by their "proper ontology" counterparts when those become available.

Greetings,
Pantelis Topalis
Ontology Developer
VectorBase @ IMBB


Re: Natural diversity module and phenotype cvterm values

Naama Menda
hi Pantelis

Yes, we are developing our own ontology (Solanaceae Phenotype, http://solgenomics.net/chado/cvterm.pl?action=view&cvterm_id=23057), which is mapped to PO and PATO whenever applicable.

My question is how to handle 'value' terms, such as color names and shape names.
One way is to have all possible fruit shapes as children of the term 'fruit shape', and the other is to store them as properties of 'fruit shape'.

The other issue is the numeric categories. I have phenotyping files filled with numbers, which refer to a quality or a value. Breeders like to assign numbers instead of writing the actual value. It is easier to work this way in the field, and there are fewer errors, but we have to store these scales as properties of the relevant cvterm.

thanks!
-Naama



Re: Natural diversity module and phenotype cvterm values

Bob MacCallum
So let's see if I understand this:

Let's say fruit shape is described by one group of researchers as
1=round
2=square
3=triangular

you'd assign the phenotype as
observable_id = fruit_id
attr_id = fruit_shape_id
value = 2

cvtermprop:
cvterm_id = fruit_shape_id
type_id = enumeration_id ?
value = square
rank = 2

and the ranks would be used to translate from value (2) to display
text ("square").

That would seem to work.
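
The rank-based translation described above can be sketched roughly as follows. This is only an illustration, assuming a heavily simplified stand-in for the cvtermprop table: real Chado uses integer foreign keys for cvterm_id and type_id, whereas plain names are used here for readability, and the helper `display_text` is hypothetical.

```python
import sqlite3

# Simplified stand-in for Chado's cvtermprop table (assumption: names are
# stored directly instead of the integer cvterm_id / type_id foreign keys).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cvtermprop (cvterm TEXT, type TEXT, value TEXT, rank INTEGER)"
)
conn.executemany(
    "INSERT INTO cvtermprop VALUES (?, ?, ?, ?)",
    [
        ("fruit shape", "enumeration", "round", 1),
        ("fruit shape", "enumeration", "square", 2),
        ("fruit shape", "enumeration", "triangular", 3),
    ],
)


def display_text(conn, cvterm, code):
    """Translate a breeder's numeric code into display text via rank.

    Returns None when the code is not part of the enumeration.
    """
    row = conn.execute(
        "SELECT value FROM cvtermprop "
        "WHERE cvterm = ? AND type = 'enumeration' AND rank = ?",
        (cvterm, code),
    ).fetchone()
    return row[0] if row else None


print(display_text(conn, "fruit shape", 2))  # square
```

The phenotype row would keep the raw code (value = 2), and the lookup above would only be needed when displaying or searching by the text label.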


If there are two different numbering systems I guess you have to
create different cvterms for them

texas_fruit_shape
1=round
2=square...

london_fruit_shape
1=square
2=triangular...

could they both have the same cvtermsynonym:
value = "fruit shape"
type_id = "phenotype display name"


On the other hand, translating these shorthand codes (for that is what
it seems they are) at data-entry time into ontology terms, as I think
Pantelis suggested, might make for a more usable database.   It
doesn't sound like a great idea to have people searching public
databases with their own local shorthand codes (unless these were
synonyms to "proper" ontology terms).





Re: Natural diversity module and phenotype cvterm values

Sook Jung
Hi,
The phenotype table has an 'assay_id' column, which we don't really need now that we have the nd_protocol table. How about using assay_id for the various scales? Below is a modified version of Bob's scheme.
Thanks
Sook



Let's say fruit shape is described by one group of researchers as
1=round
2=square
3=triangular

you'd assign the phenotype as
observable_id = fruit_id
attr_id = fruit_shape_id
value = 2 
assay_id = texas_fruit_shape_id


so texas_fruit_shape_id is a cvterm for phenotype.assay_id,
and the individual scale is defined in the prop table:

cvtermprop:
cvterm_id = texas_fruit_shape_id
type_id = enumeration_id ?
value = square
rank = 2

and the ranks would be used to translate from value (2) to display
text ("square").



Re: Natural diversity module and phenotype cvterm values

Naama Menda
The assay_id in phenotype refers to a cvterm for the evidence code of the observation.

-Naama




Re: Natural diversity module and phenotype cvterm values

Naama Menda
In reply to this post by Bob MacCallum
answering inline below:


On Thu, May 20, 2010 at 10:48 AM, Bob MacCallum <[hidden email]> wrote:
> So let's see if I understand this:
>
> Let's say fruit shape is described by one group of researchers as
> 1=round
> 2=square
> 3=triangular
>
> you'd assign the phenotype as
> observable_id = fruit_id
> attr_id = fruit_shape_id
> value = 2
>
> cvtermprop:
> cvterm_id = fruit_shape_id
> type_id = enumeration_id ?
> value = square
> rank = 2
>
> and the ranks would be used to translate from value (2) to display
> text ("square").
>
> That would seem to work.

Yes, but I think the phenotype 'value' should be the text itself ('square', the cvtermprop with rank = 2), and the loading script would pick the term 'square' from cvtermprop by rank.

> If there are two different numbering systems I guess you have to
> create different cvterms for them

This I'm not so sure about.
Since all phenotypes refer to the same attribute (fruit shape), I think it's better to have multiple scales for the same cvterm. When loading new data, the script will have to know which scale is used for fruit shape, and look it up in cvtermprop (type_id = 'texas scale').

> texas_fruit_shape
> 1=round
> 2=square...
>
> london_fruit_shape
> 1=square
> 2=triangular...
>
> could they both have the same cvtermsynonym:
> value = "fruit shape"
> type_id = "phenotype display name"
>
> On the other hand, translating these shorthand codes (for that is what
> it seems they are) at data-entry time into ontology terms, as I think
> Pantelis suggested, might make for a more usable database.   It
> doesn't sound like a great idea to have people searching public
> databases with their own local shorthand codes (unless these were
> synonyms to "proper" ontology terms).

Yes, we are trying to map each breeder-specific term to an existing one, adding synonyms whenever possible.
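
As a rough sketch of the "multiple scales for the same cvterm" idea discussed in this message, a loading script could select the scale by prop type and then decode each numeric code by rank. The table below is a simplified assumption (names instead of Chado's integer cvterm_id and type_id keys), and the scale names 'texas scale' and 'london scale' and the helper `decode` are illustrative only.

```python
import sqlite3

# Simplified stand-in for cvtermprop: one cvterm ('fruit shape') carries
# several scales, distinguished by the prop type (assumption: names are
# stored directly rather than integer foreign keys).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cvtermprop (cvterm TEXT, type TEXT, value TEXT, rank INTEGER)"
)
conn.executemany(
    "INSERT INTO cvtermprop VALUES (?, ?, ?, ?)",
    [
        ("fruit shape", "texas scale", "round", 1),
        ("fruit shape", "texas scale", "square", 2),
        ("fruit shape", "london scale", "square", 1),
        ("fruit shape", "london scale", "triangular", 2),
    ],
)


def decode(conn, cvterm, scale, code):
    """Return the text value for a numeric code under the given scale."""
    row = conn.execute(
        "SELECT value FROM cvtermprop WHERE cvterm = ? AND type = ? AND rank = ?",
        (cvterm, scale, code),
    ).fetchone()
    if row is None:
        raise KeyError(f"code {code} is not defined in {scale!r} for {cvterm!r}")
    return row[0]


# The same code (1) means different things depending on the scale in use.
decoded = [
    decode(conn, "fruit shape", scale, 1)
    for scale in ("texas scale", "london scale")
]
print(decoded)  # ['round', 'square']
```

With this layout the browsable term list still shows a single 'fruit shape' cvterm, while the loader must be told which scale each input file uses.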


-Naama





Re: Natural diversity module and phenotype cvterm values

Sook Jung
Hi,

I think Naama's suggestion will work well but just wanted to bring this up.

The same phenotype could be measured using several different instruments and reagents, depending on the breeder. These would be stored in nd_protocol and linked to a specific phenotypic result recorded in the phenotype table through the nd_assay_phenotype table.

Should the scales be separated from the protocol and stored in cvtermprop, or could they be stored in nd_protocolprop? Which do you think would be better? Would storing them in cvtermprop make querying easier than nd_protocolprop?

If we store them in nd_protocolprop, it would look like this:

Let's say fruit shape is described by one group of researchers as
1=round
2=square
3=triangular

you'd assign the phenotype as
observable_id = fruit_id
attr_id = fruit_shape_id
value = square (like Naama suggested)


nd_protocol:
nd_protocol.name = texas_protocol (or fruit_shape_protocol_1)

nd_protocolprop:
cvterm_id = scale_id
value = square
rank = 2


Sook

------------------------------------------------------------------------------


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: Natural diversity module and phenotype cvterm values

Pantelis Topalis
In reply to this post by Naama Menda
Hi all,

The example with texas_fruit_shape and london_fruit_shape seems problematic in my opinion. Differently shaped fruits are entities that do exist, whereas "texas fruit shape" or "london fruit shape" are informational artifacts according to the Basic Formal Ontology (BFO), which is the standard accepted by the OBO Foundry as the top-level ontology.

What I am suggesting is to address the problem at the level of the ontology, having cvterms like the following as children of 'fruit shape':

fruit shape
	round fruit (shape)
	elongated fruit (shape)
	square fruit (shape)
	triangular fruit (shape)
	cubical fruit (shape)
	....


Then we will be able to populate the phenotype module as follows:
observable_id = fruit_id
attr_id = fruit_shape_id
cvalue_id = round_fruit_id
assay_id = the assay_id in the nd_assay/nd_assay_phenotype table, to link everything together.

And if it is important to maintain the information about geolocations, we can use the appropriate tables in the nd module.
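
The ontology-level approach above might be sketched like this. It is a minimal illustration under stated assumptions: the three tables are cut-down stand-ins for Chado's cvterm, cvterm_relationship and phenotype tables (the relationship type is stored as a plain string rather than a type_id), and the helpers `add_term`, `allowed_values` and `phenotype_values` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE cvterm_relationship (
        subject_id INTEGER, type TEXT, object_id INTEGER
    );
    CREATE TABLE phenotype (
        phenotype_id INTEGER PRIMARY KEY,
        observable_id INTEGER, attr_id INTEGER, cvalue_id INTEGER
    );
    """
)


def add_term(name, parent=None):
    """Insert a cvterm and, optionally, an is_a link to its parent term."""
    term_id = conn.execute(
        "INSERT INTO cvterm (name) VALUES (?)", (name,)
    ).lastrowid
    if parent is not None:
        conn.execute(
            "INSERT INTO cvterm_relationship VALUES (?, 'is_a', ?)",
            (term_id, parent),
        )
    return term_id


fruit = add_term("fruit")
fruit_shape = add_term("fruit shape")
round_fruit = add_term("round fruit (shape)", parent=fruit_shape)
square_fruit = add_term("square fruit (shape)", parent=fruit_shape)

# One observation: the fruit's shape is 'round fruit (shape)'.
conn.execute(
    "INSERT INTO phenotype (observable_id, attr_id, cvalue_id) VALUES (?, ?, ?)",
    (fruit, fruit_shape, round_fruit),
)


def allowed_values(attr_id):
    """List the value terms defined as children of an attribute term."""
    rows = conn.execute(
        "SELECT c.name FROM cvterm c "
        "JOIN cvterm_relationship r ON r.subject_id = c.cvterm_id "
        "WHERE r.object_id = ? ORDER BY c.name",
        (attr_id,),
    ).fetchall()
    return [name for (name,) in rows]


def phenotype_values(attr_id):
    """List the recorded value term names for a given attribute."""
    rows = conn.execute(
        "SELECT c.name FROM phenotype p "
        "JOIN cvterm c ON c.cvterm_id = p.cvalue_id "
        "WHERE p.attr_id = ?",
        (attr_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(allowed_values(fruit_shape))
print(phenotype_values(fruit_shape))
```

Here the breeder's numeric code never reaches the phenotype table; it would be translated to a cvalue_id at load time, so searches run against ontology terms rather than local shorthand.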
Cheers,
Pantelis


On 20/5/2010 6:43 μμ, Naama Menda wrote:
answering inline bellow:


On Thu, May 20, 2010 at 10:48 AM, Bob MacCallum <[hidden email]> wrote:
So let's see if I understand this:

Let's say fruit shape is described by one group of researchers as
1=round
2=square
3=triangular

you'd assign the phenotype as
observable_id = fruit_id
attr_id = fruit_shape_id
value = 2

cvtermprop:
cvterm_id = fruit_shape_id
type_id = enumeration_id ?
value = square
rank = 2

and the ranks would be used to translate from value (2) to display
text ("square").

That would seem to work.

yes, but I think the 'value' should be 'round' with rank= 2 , and the loading script would pick the term 'round' from cvtermprop by rank.


If there are two different numbering systems I guess you have to
create different cvterms for them

This I'm not so sure about .
Since all phenotypes refer to the same attribute (fruit shape) I think it's better to have multiple scales fro the same cvterm. When loading new data , the script will have to know which scale is used for fruit shape, and look it up in cvtermprop (type_id = 'texas scale' )
Β 
texas_fruit_shape
1=round
2=square...

london_fruit_shape
1=square
2=triangular...

could they both have the same cvtermsynonym:
value = "fruit shape"
type_id = "phenotype display name"


On the other hand, translating these shorthand codes (for that is what
it seems they are) at data-entry time into ontology terms, as I think
Pantelis suggested, might make for a more usable database. Β  It
doesn't sound like a great idea to have people searching public
databases with their own local shorthand codes (unless these were
synonyms to "proper" ontology terms).


Yes, we are trying to map each breeder-specificΒ  terms to existing ones, and adding synonyms whenever possible.


-Naama




On Thu, May 20, 2010 at 2:44 PM, Naama Menda <[hidden email]> wrote:
> hi Pantelis
>
> yes, we are developing our own ontology (Solanaceae Phenotype
> http://solgenomics.net/chado/cvterm.pl?action=view&cvterm_id=23057)
>
> which is mapped to PO and PATO whenever applicable.
>
> Β My question is how to handle 'value' terms, such as color names and shape
> names.
> One way is to have all possible fruit shapes as children of the term 'fruit
> shape' and the other is to store those as properties of 'fruit shape'.
>
> The other issue is the numeric categories. I have phenotyping files filled
> with numbers, which refer to a quality or a value. Breeders like to assign
> numbers instead of writing the actual value. It is easier to work this way
> in the field, and there are less errors, but we have to store these scales
> as properties of the relevant cvterm.
>
> thanks!
> -Naama
>
>
> 2010/5/20 Pantelis Topalis <[hidden email]>
>>
>> Β Β Β  Hi Naama and Sook,
>>
>> As an ontology developer for AnoBase I have several concerns relatively to
>> the use of prop types for a cvterm. The main one is the mixing
>> of ontologies which are very flexible and dynamic with a form of a
>> classification scheme which breeders love to have
>> "Many breeders love to define a trait ('fruit color' ) and assign the
>> values as categories (1=round, 2 = elongated, etc.)."
>> Sometimes it is difficult to persuade them all to use the very same scheme
>> or to make changes and someone needs to have in mind the "code used".
>> Otherwise a "parser" is needed to convert between the different schemes used
>> by different groups.
>>
>> On the other hand PATO is a generic ontology of qualities which does not
>> always cover the data that need annotation. Ideally, a
>> Potato trait ontology is needed here using both anatomical terms and PATO
>> entities to bridge the gap. There is already a similar ontology for cereal
>> plant traits (http://bioportal.bioontology.org/visualize/42824) and for our
>> (VectorBase) purposes there will be another one.
>>
>> Ontology development is a lengthy process, so in the meantime I would
>> suggest to you what we are doing in VectorBase (following FlyBase's
>> paradigm). We are creating and maintaining an unstructured controlled
>> vocabulary with all the terms that are needed for the database but are
>> not presently covered by any existing ontology. Those terms will be
>> replaced by their "proper ontology" counterparts when those become available.
>>
>> Greetings,
>> Pantelis Topalis
>> Ontology Developer
>> VectorBase @ IMBB
>>
>> On 19/5/2010 9:51 PM, Sook Jung wrote:
>>
>> Hi Naama,
>>
>> That was one of the questions that I wrote in the Use Case site for the
>> natural diversity module..
>>
>> I prefer the second option - multiple prop types for the same cvterm. When
>> users browse the phenotype CV terms, it would be nice to list one cvterm
>> (fruit color) instead of multiples (fruit color 1-3, fruit color 1-5, etc)
>> for the same phenotype.
>>
>> But we will still be able to let users search specifically. For
>> example, users could search for apple trees with fruit color over 3 when the
>> fruit color is measured 1 to 3.
>>
>> Thanks
>>
>> Sook
>>
>>>
>>> Now I'm running into traits with multiple scales. One group uses for
>>> trait x scale y, and the other uses for the same trait x a different scale
>>> z.
>>> Would it make more sense to store 1 scale for each cvterm (and with each
>>> new scale add a new cvterm sibling) or add multiple prop types pointing to
>>> the same cvterm? (prop type 1 = scale y, prop type 2 = scale z) ?
>>>
>>> thanks!
>>> -Naama
>>>
>>>
>>>
>>> Naama Menda
>>> Boyce Thompson Institute for Plant Research
>>> Tower Rd
>>> Ithaca NY 14853
>>> USA
>>>
>>> (607) 254 3569
>>> Sol Genomics Network
>>> http://solgenomics.net/
>>> [hidden email]
>>
>>
>>
>>
>
>




Re: Natural diversity module and phenotype cvterm values

Naama Menda
Right, this is how existing terms in our phenotype ontology look, and we'll continue with this scheme for the potato terms (tuber shape, with the values as child terms).

There is also mapping of those terms to PO and PATO, whenever possible.

Pankaj, can I send you a list of terms for the tuber node in PO? We need both anatomy and developmental stages terms.

thanks!
-Naama



2010/5/21 Pantelis Topalis <[hidden email]>
Hi all,

The example below with texas_fruit_shape and london_fruit_shape seems problematic in my opinion. Differently shaped fruits are entities that do exist, whereas "texas fruit shape" or "london fruit shape" are informational artifacts according to the Basic Formal Ontology (BFO), which is the standard accepted by the OBO Foundry as the top-level ontology.

What I am suggesting is to address the problem at the level of the ontology, with cvterms like the following, all as children of 'fruit shape':

fruit shape
	round fruit (shape)
	elongated fruit (shape)
	square fruit (shape)
	triangular fruit (shape)
	cubical fruit (shape)
	....


Therefore we will be able to populate the phenotype module as follows
observable_id = fruit_id
attr_id = fruit_shape_id
cvalue_id = round_fruit_id
assay_id = the assay_id in nd_assay/nd_assay_phenotype table to link all together.
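
A minimal sketch of the assignment above, for readers who want to try it out. It uses an in-memory SQLite database with simplified stand-ins for the cvterm and phenotype tables; the column subset, the helper names (term_id, add_phenotype) and the uniquename value are assumptions made for illustration, and the assay link is left out.

```python
import sqlite3

# Simplified stand-ins for the cvterm and phenotype tables (assumed subset
# of columns, not the full Chado schema; the assay link is omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE phenotype (
    phenotype_id  INTEGER PRIMARY KEY,
    uniquename    TEXT NOT NULL UNIQUE,
    observable_id INTEGER REFERENCES cvterm (cvterm_id),
    attr_id       INTEGER REFERENCES cvterm (cvterm_id),
    cvalue_id     INTEGER REFERENCES cvterm (cvterm_id)
);
""")


def term_id(name):
    """Return the cvterm_id for a term name, creating the term if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_phenotype(uniquename, entity, attribute, value):
    """Store one phenotype as entity / attribute / value-term references."""
    conn.execute(
        "INSERT INTO phenotype (uniquename, observable_id, attr_id, cvalue_id)"
        " VALUES (?, ?, ?, ?)",
        (uniquename, term_id(entity), term_id(attribute), term_id(value)),
    )


# The value is a child term of 'fruit shape', not a breeder code.
add_phenotype("sample-1 fruit shape", "fruit", "fruit shape", "round fruit")

rows = conn.execute("""
    SELECT o.name, a.name, v.name
    FROM phenotype p
    JOIN cvterm o ON o.cvterm_id = p.observable_id
    JOIN cvterm a ON a.cvterm_id = p.attr_id
    JOIN cvterm v ON v.cvterm_id = p.cvalue_id
""").fetchall()
print(rows)
```

Because the value is stored as a term reference, a search for round fruits does not need to know which numeric code any breeder used.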

And if it is important to maintain the information about geolocations we use the appropriate tables in nd module.
Cheers,
Pantelis



On 20/5/2010 6:43 PM, Naama Menda wrote:
answering inline below:


On Thu, May 20, 2010 at 10:48 AM, Bob MacCallum <[hidden email]> wrote:
So let's see if I understand this:

Let's say fruit shape is described by one group of researchers as
1=round
2=square
3=triangular

you'd assign the phenotype as
observable_id = fruit_id
attr_id = fruit_shape_id
value = 2

cvtermprop:
cvterm_id = fruit_shape_id
type_id = enumeration_id ?
value = square
rank = 2

and the ranks would be used to translate from value (2) to display
text ("square").

That would seem to work.

yes, but I think the 'value' should be 'round' with rank = 2, and the loading script would pick the term 'round' from cvtermprop by rank.


If there are two different numbering systems I guess you have to
create different cvterms for them

This I'm not so sure about.
Since all phenotypes refer to the same attribute (fruit shape), I think it's better to have multiple scales for the same cvterm. When loading new data, the script will have to know which scale is used for fruit shape, and look it up in cvtermprop (type_id = 'texas scale').

texas_fruit_shape
1=round
2=square...

london_fruit_shape
1=square
2=triangular...
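
The "multiple scales on one cvterm" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the rows mimic cvtermprop entries (cvterm, type, value, rank), the prop type name 'london scale' and the decode helper are made up for the example, and the scale contents are the two listed above.

```python
from collections import namedtuple

# One row per scale entry: the prop type names the scale, the rank is the
# breeder's numeric code, and the value is the text it stands for.
CvtermProp = namedtuple("CvtermProp", "cvterm type value rank")

CVTERMPROPS = [
    CvtermProp("fruit shape", "texas scale", "round", 1),
    CvtermProp("fruit shape", "texas scale", "square", 2),
    CvtermProp("fruit shape", "london scale", "square", 1),
    CvtermProp("fruit shape", "london scale", "triangular", 2),
]


def decode(cvterm, scale, code):
    """Translate a breeder's numeric code into its value for a given scale."""
    for prop in CVTERMPROPS:
        if (prop.cvterm, prop.type, prop.rank) == (cvterm, scale, code):
            return prop.value
    raise KeyError(f"no entry for code {code} on {scale!r} of {cvterm!r}")


# The same code means different things depending on the scale in use.
print(decode("fruit shape", "texas scale", 2))
print(decode("fruit shape", "london scale", 2))
```

The loading script has to be told which scale a data file uses; the trait itself stays a single cvterm.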

could they both have the same cvtermsynonym:
value = "fruit shape"
type_id = "phenotype display name"


On the other hand, translating these shorthand codes (for that is what
it seems they are) at data-entry time into ontology terms, as I think
Pantelis suggested, might make for a more usable database.  It
doesn't sound like a great idea to have people searching public
databases with their own local shorthand codes (unless these were
synonyms to "proper" ontology terms).


Yes, we are trying to map each breeder-specific term to an existing one, and adding synonyms whenever possible.


-Naama





Re: Natural diversity module and phenotype cvterm values

Sook Jung
In reply to this post by Pantelis Topalis
Hi Pantelis,
It works for the example below, but how about a case like this: one breeder uses a scale from 1 to 5 to record fruit size and another uses one from 1 to 7. Then we have to use their scales to store the phenotype data. So where does the scale definition go? I was wondering about cvtermprop or nd_assayprop.
Cheers
Sook

2010/5/21 Pantelis Topalis <[hidden email]>
Hi all,

The example below with the texas_fruit_shape and london_fruit_shape seems problematic in my opinion. Different shaped fruits are entities that do exist whereas "texas" or "london fruit shape" are informational artifacts according to Basic Formal Ontology (BFO) which is the standard accepted by Obofoundry as the top level ontology.

What I am suggesting is to face the problem at the level of the ontology having cvterms like

fruit shape
	round fruit (shape)
	elongated fruit (shape)
	square fruit (shape)
	triangular fruit (shape)
	cubical fruit (shape) as children of the fruit shape 
	....


Therefore we will be able to populate the phenotype module as follows
observable_id = fruit_id
attr_id = fruit_shape_id
cvalue_id = round_fruit_id
assay_id = the assay_id in nd_assay/nd_assay_phenotype table to link all together.

And if it is important to maintain the information about geolocations we use the appropriate tables in nd module.


Re: Natural diversity module and phenotype cvterm values

Naama Menda
hi Sook,

I think your loading script should handle the scale-to-cvterm mapping (you'll have to pre-load all the relevant cvterms for fruit shape).
Whether the scale should be stored in the database is another question.
If you choose to do so, I guess both cvtermprop and nd_assayprop are good.
Maybe others have comments on which table is better suited for storing cvterm scales.

-Naama




Re: Natural diversity module and phenotype cvterm values

Dave Clements
Hello all,

First, a basic point/question about PATO. PATO uses an entity-quality model. I think we've been confusing what a quality is. As I understand it, in a standard PATO implementation, our example would be

  Entity: fruit (from the potato anatomy ontology, or whatever we are using)
  Quality: square

That "square" is a "shape" is not explicitly stated. It's encoded in PATO because "shape" would be the parent of "square". We've been discussing terms like "fruit shape" as if they were atomic terms and, in PATO anyway, they are not.
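
To make the entity-quality point concrete, here is a small sketch. The IS_A table is a toy hierarchy invented for illustration, not the real PATO graph, and has_ancestor is a hypothetical helper; the annotation stores only the entity and the quality, and "shape" is recovered by walking up the parents.

```python
# Toy is_a hierarchy (child -> parent). This is NOT the actual PATO graph,
# just enough structure to show how "shape" is implied rather than stored.
IS_A = {
    "square": "shape",
    "round": "shape",
    "shape": "quality",
}


def has_ancestor(term, ancestor):
    """Return True if ancestor is reachable from term via is_a links."""
    parent = IS_A.get(term)
    while parent is not None:
        if parent == ancestor:
            return True
        parent = IS_A.get(parent)
    return False


# The annotation itself holds only the entity and the quality.
annotation = {"entity": "fruit", "quality": "square"}

# "Is this a shape phenotype?" is answered from the ontology, not the record.
print(has_ancestor(annotation["quality"], "shape"))
```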

Am I missing something?  Pantelis, am I merely reiterating a point you already made?

I'd also like to suggest that when a needed term is not in PATO, we work with the PATO people to get it added or to identify the existing term.


I now want to go back to Naama's two original questions

We want the best of both worlds here. We want to store phenotypes in a way that makes the data computationally queryable and integrable across projects (i.e., using standard ontologies), and we want to allow people who are used to breeder-specific codings to be able to see and query on them.

I agree that when the data is loaded we have to normalize it into common terms and scales. For shape, PATO will probably have most/all of the terms we need. For "appearance", which is a linear scale, we will have to normalize breeders' ratings to a single 0 to 1 (or 0-10 or 0-100 or ...) scale. We should still be able to use the PATO model to encode the fact that this number we are storing is about "tuber" and "appearance".

I'm still thinking about how to store the breeder's original encoding.  I do think it would be desirable to store the mapping somewhere in the database, as this would facilitate querying and help people understand how the normalized values were calculated.
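
As a rough sketch of what this normalization could look like, assuming the breeder scales are evenly spaced so that a simple linear rescaling is meaningful (an assumption that may not hold for every trait). The function names and the record layout are made up for illustration; the original score and scale are kept next to the normalized value so the mapping stays traceable.

```python
def normalize(score, low, high):
    """Linearly rescale a rating from [low, high] onto [0, 1]."""
    if high <= low:
        raise ValueError("scale must have high > low")
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the scale {low}-{high}")
    return (score - low) / (high - low)


def load_rating(score, scale_name, low, high):
    """Keep the breeder's original code alongside the normalized value."""
    return {
        "normalized": normalize(score, low, high),
        "original": score,
        "scale": scale_name,
    }


# A mid-scale rating lands on the same normalized value on either scale.
a = load_rating(3, "1-5 scale", 1, 5)
b = load_rating(4, "1-7 scale", 1, 7)
print(a["normalized"], b["normalized"])
```

Queries across projects would then use the normalized value, while breeder-facing views could show the original code and scale.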

Hope this helps.

Dave C.


--
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/Calendar
http://gmod.org/wiki/Help_Desk_Feedback


Re: Natural diversity module and phenotype cvterm values

Pantelis Topalis
Comments in-line:

On 21/5/2010 8:40 PM, Dave Clements wrote:
Hello all,

First a basic point/questions about PATO.  PATO uses an entity-quality model.  I think we've been confusing what a quality is.  As I understand it, in a standard PATO implementation, our example would be

  Entity: fruit (from the potato anatomy ontology, or whatever we are using)
  Quality: square

That "square" is a "shape" is not explicitly stated. It's encoded in PATO because "shape" would be the parent of "square". We've been discussing terms like "fruit shape" as if they were atomic terms and, in PATO anyway, they are not.

Am I missing something?  Pantelis, am I merely reiterating a point you already made?
I'd also like to suggest that when a needed term is not in PATO, we work with the PATO people to get it added or to identify the existing term.


This is my point too, and it is clear to me that a "trait" ontology, like the one Naama says is under construction, is needed.


I now want to go back to Naama's two original questions

We want the best of both worlds here.  We want to store phenotypes in ways that the data is computationally queryable and that can be integrated across projects (i.e., using standard ontologies); and we want to allow people who are used to breeder-specific codings to be able to see and query on them.

I agree that when the data is loaded we have to normalize it into common terms and scales.  For shape, PATO will probably have most/all terms we need.  For "appearance" which is a linear scale, we will have to normalize breeder's ratings to a single 0 to 1 (or 0-10 or 0-100 or ...) scale.  We should still be able to use the PATO model to encode the fact that this number we are storing is about "tuber" and "appearance".

I'm still thinking about how to store the breeder's original encoding.  I do think it would be desirable to store the mapping somewhere in the database, as this would facilitate querying and help people understand how the normalized values were calculated.

My thoughts exactly! I simply want to add that if normalization transforms a breeder's arbitrarily chosen scale into a physical (measurable) one (e.g. size '1' = 1-2 inches, size '2' = 2-3 inches, etc.), then those values can be stored as such in cvtermprop and have the same meaning for all.
I believe this is simpler and can lead to more efficient queries, but again I don't know the needs and the habits of the breeder community.
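
A small sketch of the physical-scale idea, under stated assumptions: "scale A" uses the two bins from the example above, while "scale B", the record layout and the helper names are hypothetical. Once each code is tied to a range in inches, records scored on different scales can be filtered with one query.

```python
from collections import namedtuple

# Each breeder code maps to a physical size range in inches.
# "scale A" follows the example in the text; "scale B" is hypothetical.
SIZE_BINS = {
    "scale A": {1: (1.0, 2.0), 2: (2.0, 3.0)},
    "scale B": {1: (1.0, 1.5), 2: (1.5, 2.0), 3: (2.0, 3.0)},
}

Record = namedtuple("Record", "sample scale code")


def to_inches(scale, code):
    """Return the (low, high) size range a code stands for on a scale."""
    return SIZE_BINS[scale][code]


def at_least(records, min_inches):
    """Select samples whose whole size range lies at or above min_inches."""
    return [
        r.sample
        for r in records
        if to_inches(r.scale, r.code)[0] >= min_inches
    ]


records = [
    Record("plant-1", "scale A", 1),
    Record("plant-2", "scale A", 2),
    Record("plant-3", "scale B", 3),
]
print(at_least(records, 2.0))
```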

Cheers,
Pantelis

Hope this helps.

Dave C.



Re: Natural diversity module and phenotype cvterm values

Suzanna Lewis-3
In reply to this post by Dave Clements

On May 21, 2010, at 10:40 AM, Dave Clements wrote:

> Hello all,
>
> First a basic point/questions about PATO.  PATO uses an entity-quality model.  I think we've been confusing what a quality is.  As I understand it, in a standard PATO implementation, our example would be
>
>   Entity: fruit (from the potato anatomy ontology, or whatever we are using)
>   Quality: square
>
> That "square" is a "shape" is not explicitly stated. It's encoded in PATO because "shape" would be the parent of "square". We've been discussing terms like "fruit shape" as if they were atomic terms and, in PATO anyway, they are not.

Correct, the entity observed is "fruit" and the "trait"/"quality" is shape.

>
> Am I missing something?  Pantelis, am I merely reiterating a point you already made?
>
> I'd also like to suggest that when a needed term is not in PATO, we work with the PATO people to get it added or to identify the existing term.

That would be wonderful and greatly appreciated.

>
>
> I now want to go back to Naama's two original questions
>
> We want the best of both worlds here.  We want to store phenotypes in ways that the data is computationally queryable and that can be integrated across projects (i.e., using standard ontologies); and we want to allow people who are used to breeder-specific codings to be able to see and query on them.
>
> I agree that when the data is loaded we have to normalize it into common terms and scales.  For shape, PATO will probably have most/all terms we need.  For "appearance" which is a linear scale, we will have to normalize breeder's ratings to a single 0 to 1 (or 0-10 or 0-100 or ...) scale.  We should still be able to use the PATO model to encode the fact that this number we are storing is about "tuber" and "appearance".
>
> I'm still thinking about how to store the breeder's original encoding.  I do think it would be desirable to store the mapping somewhere in the database, as this would facilitate querying and help people understand how the normalized values were calculated.
>
> Hope this helps.
>
> Dave C.