proposal for changes to the phenotype module

proposal for changes to the phenotype module

Naama Menda
We have had several discussions about modifying the phenotype module.
 
Our main goals are:
1. to break the link between phenotype value and trait(s)
2. to allow post-composing of terms using multiple ontologies, defining relationships between the cvterms that describe a phenotype.

Here is what Valentin, Mathieu, and I came up with today. We think this table would cover the use cases we've been looking at, including storing complex EQ statements. In that case the cvterm can be any term we use from PO, GO, the Relation Ontology, PATO, or CHEBI, and the type of each term and the relationship between terms are defined in the type_id column, e.g. 'primary entity 1', 'primary relationship', 'primary entity 2', 'quality', 'secondary entity 1', etc.


The current phenotype table remains unchanged for backward compatibility.


-----------------------------

COMMENT ON TABLE phenotype IS 'Columns
observable_id,
attr_id,
cvalue_id,
assay_id
are deprecated to break the connection between the phenotype value and the trait.
The phenotype table should be used to store only the phenotype value. Use tables phenotype_cvterm and/or phenotype_cvtermrelationship to store the trait(s) associated with the phenotype. This allows a more abstract way of post-composing cvterms and storing EQ statements.';


CREATE TABLE phenotype_cvtermrelationship (
       phenotype_cvtermrelationship_id serial PRIMARY KEY,
       phenotype_id integer NOT NULL REFERENCES phenotype(phenotype_id),
       cvterm_id integer NOT NULL REFERENCES cvterm(cvterm_id),
       type_id integer NOT NULL REFERENCES cvterm(cvterm_id),
       phenotypegroup integer DEFAULT NULL,
       rank integer DEFAULT 0 );

ALTER TABLE phenotype_cvtermrelationship ADD unique (phenotype_id, cvterm_id, type_id, phenotypegroup,rank);

-------------------------

The 'phenotypegroup' column indicates which rows together make up a single phenotype statement.
The rank column is for cases where you need to store the same cvterm_id + type_id combination more than once.
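
To make the role of phenotypegroup and rank concrete, here is a rough sketch (not part of the proposal itself). It assumes an in-memory SQLite database in place of PostgreSQL, and plain text columns named cvterm and type in place of the cvterm_id and type_id foreign keys, purely to keep the example self-contained.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL, and text columns
# replace the cvterm foreign keys so the sketch needs no other tables.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phenotype_cvtermrelationship (
        phenotype_cvtermrelationship_id INTEGER PRIMARY KEY,
        phenotype_id   INTEGER NOT NULL,
        cvterm         TEXT    NOT NULL,   -- stands in for cvterm_id
        type           TEXT    NOT NULL,   -- stands in for type_id
        phenotypegroup INTEGER DEFAULT NULL,
        rank           INTEGER DEFAULT 0,
        UNIQUE (phenotype_id, cvterm, type, phenotypegroup, rank)
    )
""")

insert = """INSERT INTO phenotype_cvtermrelationship
            (phenotype_id, cvterm, type, phenotypegroup, rank)
            VALUES (?, ?, ?, ?, ?)"""

# Same cvterm and type stored twice in one group: distinct ranks keep the rows unique.
conn.execute(insert, (1, "fruit", "primary entity 2", 1, 0))
conn.execute(insert, (1, "fruit", "primary entity 2", 1, 1))

# Repeating an existing combination, rank included, violates the unique constraint.
try:
    conn.execute(insert, (1, "fruit", "primary entity 2", 1, 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM phenotype_cvtermrelationship").fetchone()[0]
```

One thing worth keeping in mind: both SQLite and PostgreSQL treat NULLs as distinct in a unique constraint by default, so rows that leave phenotypegroup at its NULL default would not be caught as duplicates by this constraint.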


This design allows storing EQ statements that have multiple ontology terms of different types associated with each phenotype statement.
For example, a gene with the phenotype "chlorophyll degradation inhibited in the fruit" is annotated with:
"primary entity 1": the GO term "chlorophyll catabolic process"
relationship between entity 1 and entity 2: "occurs_in"
"primary entity 2": the PO term for "fruit"
quality: the PATO term "decreased rate"
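
The example can be written out as rows and read back as one statement. This is only a sketch under the same assumptions as before: SQLite instead of PostgreSQL, term names stored as text instead of cvterm ids, a made-up phenotype_id of 1, and a hypothetical helper eq_statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phenotype_cvtermrelationship (
        phenotype_id   INTEGER NOT NULL,
        cvterm         TEXT    NOT NULL,   -- stands in for cvterm_id
        type           TEXT    NOT NULL,   -- stands in for type_id
        phenotypegroup INTEGER,
        rank           INTEGER DEFAULT 0
    )
""")

# The four parts of "chlorophyll degradation inhibited in the fruit",
# attached to a hypothetical phenotype_id 1 and grouped as statement 1.
rows = [
    (1, "chlorophyll catabolic process", "primary entity 1", 1, 0),
    (1, "occurs_in", "primary relationship", 1, 0),
    (1, "fruit", "primary entity 2", 1, 0),
    (1, "decreased rate", "quality", 1, 0),
]
conn.executemany(
    "INSERT INTO phenotype_cvtermrelationship VALUES (?, ?, ?, ?, ?)", rows)


def eq_statement(conn, phenotype_id, group):
    """Return {type: cvterm} for one grouped phenotype statement."""
    cur = conn.execute(
        "SELECT type, cvterm FROM phenotype_cvtermrelationship "
        "WHERE phenotype_id = ? AND phenotypegroup = ?",
        (phenotype_id, group))
    return dict(cur.fetchall())


statement = eq_statement(conn, 1, 1)
```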

So basically we are post-composing a phenotype statement from PO, GO, and PATO terms, each one having a type.
Until now people have used phenotypeprop, but this is not a very elegant solution, and it makes it hard to group multiple cvterms that form a single statement or post-composed description.

We used the EQ statement case as the most complex example for using phenotype_cvtermrelationship, but this design works well for simple pre-composed terms as well.


-Naama



Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Re: proposal for changes to the phenotype module

Cannon, Ethalinda K [COM S]

Hello Naama,

 

I spent some time a few weeks back trying to map a complex EQ statement onto the proposed schema, then work duties prevented me from finishing the task. But I would like to respond to this proposal, so I am enclosing my incomplete efforts in hopes that there is some value in them.

 

I have been chiefly concerned with how to store a post-composed statement consisting of compound terms (e.g. primary entity1 - relationship - primary entity2). I think the phenotype_cvtermrelationship.phenotypegroup field can make this work.

 

At this point, I have no objection to the proposed schema. It would be a good exercise to load one of the EQ spreadsheets into Chado, then extract it into the same format. The aim would be to verify not only that the data can be stored, but also that it can be extracted in a format that Anika's phenotype similarity tool can use. If I can carve out time to do this, I'm happy to volunteer, but also happy to have someone else do this.
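
A round-trip check of this kind might look roughly like the sketch below. Everything in it is an assumption for illustration: the spreadsheet layout and column names are made up, SQLite stands in for Chado on PostgreSQL, and term names are stored as text rather than as cvterm ids.

```python
import csv
import io
import sqlite3

# A one-row stand-in for an EQ spreadsheet; the column names are hypothetical.
EQ_CSV = (
    "phenotype_id,primary entity 1,primary relationship,primary entity 2,quality\n"
    "1,chlorophyll catabolic process,occurs_in,fruit,decreased rate\n"
)
TYPES = ["primary entity 1", "primary relationship", "primary entity 2", "quality"]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE phenotype_cvtermrelationship (
    phenotype_id INTEGER NOT NULL, cvterm TEXT NOT NULL, type TEXT NOT NULL,
    phenotypegroup INTEGER, rank INTEGER DEFAULT 0)""")


def load(conn, text):
    """Split each spreadsheet row into one table row per EQ part."""
    for group, record in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for type_name in TYPES:
            conn.execute(
                "INSERT INTO phenotype_cvtermrelationship "
                "(phenotype_id, cvterm, type, phenotypegroup) VALUES (?, ?, ?, ?)",
                (int(record["phenotype_id"]), record[type_name], type_name, group))


def extract(conn):
    """Rebuild the spreadsheet text from the grouped rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["phenotype_id"] + TYPES)
    groups = conn.execute(
        "SELECT DISTINCT phenotype_id, phenotypegroup "
        "FROM phenotype_cvtermrelationship ORDER BY phenotypegroup").fetchall()
    for phenotype_id, group in groups:
        parts = dict(conn.execute(
            "SELECT type, cvterm FROM phenotype_cvtermrelationship "
            "WHERE phenotype_id = ? AND phenotypegroup = ?",
            (phenotype_id, group)).fetchall())
        writer.writerow([phenotype_id] + [parts[t] for t in TYPES])
    return out.getvalue()


load(conn, EQ_CSV)
round_trip_ok = extract(conn) == EQ_CSV
```

The comparison at the end is the point of the exercise: if the extracted text differs from the input, something was lost or reordered on the way through the table.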

 

Ethy

 


From: Naama Menda <[hidden email]>
Sent: Tuesday, July 08, 2014 10:41 AM
To: gmod schema
Cc: Rouard, Mathieu (Bioversity-France); Laurel Cooper; marie-angelique Laporte; Graham McLaren; Guignon, Valentin (Bioversity-France)
Subject: [Gmod-schema] proposal for changes to the phenotype module
 
proposal for changes to the phenotype module

Kucheran, Lacey Sanderson
In reply to this post by Naama Menda
Hi Naama,

I’m a bit confused as to how your example specifically fits into the tables described. Did you mean it to be:

"chlorophyll degradation inhibited in the fruit" is annotated with
GO term of type "primary entity 1":  "chlorophyll catabolic process",
relationship between entity 1 and entity 2:  "occurs_in"
"primary entity 2":  the PO term for "fruit"
Quality: the PATO term "decreased rate"

phenotype_cvtermrelationship: Record #1
    type_id => primary entity 1
    cvterm_id => chlorophyll catabolic process
phenotype_cvtermrelationship: Record #2
    type_id => relationship between entity 1 and entity 2
    cvterm_id => occurs_in
phenotype_cvtermrelationship: Record #3
    type_id => primary entity 2
    cvterm_id => fruit
phenotype
    cvalue => decreased rate

I think part of the confusion might be in the name. A relationship table in Chado should have a subject_id and an object_id to relate two objects. In this case you only have a cvterm_id, which means you're really just adding additional information. Might it be better for this table to follow the regular subject/type/object structure and use the phenotype_cvterm table with an added type_id field?

Using the same example:
phenotype_cvterm: Record #1
    type_id => primary entity 1
    cvterm_id => chlorophyll catabolic process
phenotype_cvterm: Record #2
    type_id => primary entity 2
    cvterm_id => fruit
phenotype_cvtermrelationship
    subject_id => record #1
    type_id => occurs_in
    object_id => record #2
phenotype
    cvalue => decreased rate
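
For comparison, the subject/type/object layout could be sketched as follows. The table definitions here are simplified guesses for illustration only (SQLite instead of PostgreSQL, text columns instead of cvterm ids, and no rank column), not the actual Chado definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins: text columns replace the cvterm foreign keys.
    CREATE TABLE phenotype_cvterm (
        phenotype_cvterm_id INTEGER PRIMARY KEY,
        phenotype_id INTEGER NOT NULL,
        cvterm TEXT NOT NULL,
        type   TEXT NOT NULL
    );
    CREATE TABLE phenotype_cvtermrelationship (
        subject_id INTEGER NOT NULL REFERENCES phenotype_cvterm(phenotype_cvterm_id),
        type       TEXT    NOT NULL,
        object_id  INTEGER NOT NULL REFERENCES phenotype_cvterm(phenotype_cvterm_id)
    );
    INSERT INTO phenotype_cvterm
        VALUES (1, 1, 'chlorophyll catabolic process', 'primary entity 1');
    INSERT INTO phenotype_cvterm
        VALUES (2, 1, 'fruit', 'primary entity 2');
    INSERT INTO phenotype_cvtermrelationship VALUES (1, 'occurs_in', 2);
""")

# Join the relationship row back to its subject and object terms.
triples = conn.execute("""
    SELECT s.cvterm, r.type, o.cvterm
    FROM phenotype_cvtermrelationship r
    JOIN phenotype_cvterm s ON s.phenotype_cvterm_id = r.subject_id
    JOIN phenotype_cvterm o ON o.phenotype_cvterm_id = r.object_id
""").fetchall()
```

In this layout the relationship row itself says which term is the subject and which is the object, so the triple can be read back without knowing the type names in advance.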

Anyway, if you could tell me exactly what goes into which tables/fields, I think that would help me immensely in understanding your proposal, so I can see whether it works for our data.
Thanks!
~Lacey

------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-4975
Room 3C06 Agriculture
Department of Plant Sciences
University of Saskatchewan

Re: proposal for changes to the phenotype module

Naama Menda
Hi Lacey,

The idea is to be able to build a phenotype statement from multiple cvterms. Sometimes these cvterms have a relationship between them, as with 'primary entity 1' and 'primary entity 2', and sometimes a part consists of just one cvterm (e.g. a quality from PATO, a developmental stage from PO, an environment from EO, etc.).

We need to be able to group these multiple cvterm annotations, since they all point to one phenotype ID and a phenotype ID could have more than one post-composed statement.
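
As a rough illustration of one phenotype ID carrying more than one statement, the sketch below splits the rows for a phenotype by phenotypegroup. It assumes SQLite in place of PostgreSQL and text columns in place of cvterm ids, and the second statement (group 2) is invented purely for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE phenotype_cvtermrelationship (
    phenotype_id INTEGER NOT NULL, cvterm TEXT NOT NULL, type TEXT NOT NULL,
    phenotypegroup INTEGER, rank INTEGER DEFAULT 0)""")
conn.executemany(
    "INSERT INTO phenotype_cvtermrelationship "
    "(phenotype_id, cvterm, type, phenotypegroup) VALUES (?, ?, ?, ?)",
    [
        (1, "chlorophyll catabolic process", "primary entity 1", 1),
        (1, "decreased rate", "quality", 1),
        (1, "fruit", "primary entity 1", 2),   # invented second statement
        (1, "green", "quality", 2),
    ])

# Collect the parts of each statement under its group number.
statements = defaultdict(dict)
for group, type_name, cvterm in conn.execute(
        "SELECT phenotypegroup, type, cvterm FROM phenotype_cvtermrelationship "
        "WHERE phenotype_id = ?", (1,)):
    statements[group][type_name] = cvterm
```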

The table phenotype_cvtermrelationship could be named differently, maybe phenotype_cvtermgroup or something else.
Anyway, it should not have a subject-object relationship like the other _relationship tables.

-Naama



On Fri, Jul 11, 2014 at 2:06 PM, Kucheran, Lacey Sanderson <[hidden email]> wrote:
Re: proposal for changes to the phenotype module

Cannon, Ethalinda K [COM S]

Hi Naama,


Lacey and I talked through various use cases at the ICLGG meeting last week and I see some merit to having something like phenotype_cvtermgroup (or even just adding a type_id field to the existing phenotype_cvterm table) and a phenotype_cvtermrelationship table with subject/type/object fields.


One reason is that for "compound" post-composed statements like the EQ statements, a three-part entity could be represented with:


   phenotype_cvterm - phenotype_cvtermrelationship - phenotype_cvterm


which would encode the syntax. If a three-part entity is represented with:


   phenotype_cvtermgroup - phenotype_cvtermgroup - phenotype_cvtermgroup


then I think you would have to know it's an entity in an EQ statement to decode it.


But in the first example, there's still the matter of having to indicate that all three are part of a group (i.e. a compound term).


Lacey and I also talked about the possibility of a post-composed statement having multiple phenotype ids (e.g., to hold values for a trait described by the post-composed statement) but hadn't considered a phenotype id having multiple post-composed statements ... unless you mean the various compound parts of an EQ statement?


It's difficult to discuss this in text! Did you look at my picture? Did it help? I could make more pictures.... ;-) We could also try scheduling a call.


Ethy



From: Naama Menda <[hidden email]>
Sent: Tuesday, July 15, 2014 8:43 AM
To: Kucheran, Lacey Sanderson
Cc: GMOD Schema
Subject: Re: [Gmod-schema] proposal for changes to the phenotype module
 
Re: proposal for changes to the phenotype module

Naama Menda
A type_id in phenotype_cvterm could work, but how would it be handled if you store two phenotype_cvterm rows of type 'quality'? I think we need a straightforward way of pulling together all the parts that build the phenotype statement.
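
The ambiguity can be seen in a small sketch. It assumes a simplified phenotype_cvterm table with only a type column added (SQLite, text columns instead of cvterm ids) and an invented second statement; terms_of_type is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE phenotype_cvterm (
    phenotype_id INTEGER NOT NULL, cvterm TEXT NOT NULL, type TEXT NOT NULL)""")
conn.executemany("INSERT INTO phenotype_cvterm VALUES (?, ?, ?)", [
    (1, "chlorophyll catabolic process", "primary entity 1"),
    (1, "decreased rate", "quality"),
    (1, "fruit", "primary entity 1"),   # invented second statement
    (1, "green", "quality"),
])


def terms_of_type(type_name):
    """Return all terms of one type stored for phenotype_id 1."""
    cur = conn.execute(
        "SELECT cvterm FROM phenotype_cvterm "
        "WHERE phenotype_id = 1 AND type = ? ORDER BY cvterm",
        (type_name,))
    return [row[0] for row in cur]


entities = terms_of_type("primary entity 1")
qualities = terms_of_type("quality")

# With no grouping column, every entity/quality pairing is equally
# consistent with the stored rows.
candidate_pairings = [(e, q) for e in entities for q in qualities]
```

Two entities and two qualities give four candidate pairings, and nothing in the table says which two were intended.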

If you use object-subject, then what is the object and what is the subject? Will entity 1 always be the object? 

The question is whether there would be a case of multiple statements linked to a single phenotype_id.
We also need to keep in mind that EQ statements are just one use case for the phenotype module.
It would be good to get feedback from other users of this module to see what their needs are and whether the new design fits them.

-Naama



On Tue, Jul 15, 2014 at 10:30 AM, Cannon, Ethalinda K [E CPE] <[hidden email]> wrote:

Hi Naama,


Lacey and I talked through various use cases at the ICLGG meeting last week and I see some merit to having something like phenotype_cvtermgroup​ (or even just adding a type_id field to the existing table, phenotype_cvterm) and a phenotype_cvtermrelationship table with subject/type/object fields. 


One reason is that for "compound" post-composed statements like the EQ statements, a three part entity could be represented with:


   phenotype_cvtermphenotype_cvtermrelationshipphenotype_cvterm 


which would encode the syntax. If a three part entity is represented with:


  phenotype_cvtermgroupphenotype_cvtermgroupphenotype_cvtermgroup


then I think you would have to know it's an entity in an EQ statement to decode it.


But in the first example, there's still the matter of having to indicate that all three are part of a group (i.e., a compound term).


Lacey and I also talked about the possibility of a post-composed statement having multiple phenotype ids (e.g., to hold values for a trait described by the post-composed statement) but hadn't considered a phenotype id having multiple post-composed statements ... unless you mean the various compound parts of an EQ statement?


It's difficult to discuss this in text! Did you look at my picture? Did it help? I could make more pictures.... ;-) We could also try scheduling a call.


Ethy



From: Naama Menda <[hidden email]>
Sent: Tuesday, July 15, 2014 8:43 AM
To: Kucheran, Lacey Sanderson
Cc: GMOD Schema
Subject: Re: [Gmod-schema] proposal for changes to the phenotype module
 
hi Lacey,

the idea is to have the ability to build a phenotype statement using multiple cvterms. Sometimes these cvterms may have a relationship between them, as with 'primary entity 1' and 'primary entity 2', and sometimes there is just one cvterm (e.g. a quality from PATO, a developmental stage from PO, an environment from EO, etc.).

We need to have the ability to group these multiple cvterm annotations since they all point to one phenotype ID, and a phenotype ID could have more than one post-composed statement.

The table phenotype_cvtermrelationship could be named differently, maybe phenotype_cvtermgroup or something else.
Anyway, it should not have a subject-object relationship as with the other _relationship tables.

-Naama



On Fri, Jul 11, 2014 at 2:06 PM, Kucheran, Lacey Sanderson <[hidden email]> wrote:
Hi Naama,

I’m a bit confused as to how your example specifically fits into the tables described. Did you mean it to be:

"chlorophyll degradation inhibited in the fruit" is annotated with
GO term of type "primary entity 1":  "chlorophyll catabolic process",
relationship between entity 1 and entity 2:  "occurs_in"
"primary entity 2":  the PO term for "fruit"
Quality: the PATO term "decreased rate"

phenotype_cvtermrelationship: Record #1
type_id => primary entity 1
cvterm_id => chlorophyll catabolic process
phenotype_cvtermrelationship: Record #2
type_id => relationship between entity 1 and entity 2
cvterm_id => occurs_in
phenotype_cvtermrelationship: Record #3
type_id => primary entity 2
cvterm_id => fruit
phenotype
cvalue => decreased rate

I think part of the confusion might be in the name. A relationship table in Chado should have a subject_id and an object_id to relate two objects. In this case you only have a cvterm_id, which means you're really just adding additional information. Might it be better for this table to follow the regular subject/type/object structure and use the phenotype_cvterm table with an added type_id field?

Using the same example:
phenotype_cvterm: Record #1
type_id => primary entity 1
cvterm_id => chlorophyll catabolic process
phenotype_cvterm: Record #2
type_id => primary entity 2
cvterm_id => fruit
phenotype_cvtermrelationship
subject_id => record #1
type_id => occurs_in
object_id => record #2
phenotype
cvalue => decreased rate
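
A rough sketch of this subject/type/object alternative, run through Python's sqlite3 so it is self-contained. The column names cvterm_name and type_name are text stand-ins for cvterm_id and type_id, and the exact column set is an assumption based on the records listed above, not an agreed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- phenotype_cvterm with an added type column (text stand-ins, assumption)
    CREATE TABLE phenotype_cvterm (
        phenotype_cvterm_id INTEGER PRIMARY KEY,
        phenotype_id INTEGER NOT NULL,
        cvterm_name TEXT NOT NULL,
        type_name TEXT NOT NULL
    );
    -- regular subject/type/object relationship between two phenotype_cvterm rows
    CREATE TABLE phenotype_cvtermrelationship (
        phenotype_cvtermrelationship_id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES phenotype_cvterm(phenotype_cvterm_id),
        type_name TEXT NOT NULL,
        object_id INTEGER NOT NULL REFERENCES phenotype_cvterm(phenotype_cvterm_id)
    );
    INSERT INTO phenotype_cvterm VALUES
        (1, 1, 'chlorophyll catabolic process', 'primary entity 1');
    INSERT INTO phenotype_cvterm VALUES
        (2, 1, 'fruit', 'primary entity 2');
    INSERT INTO phenotype_cvtermrelationship VALUES (1, 1, 'occurs_in', 2);
""")

# Join the relationship row back to its subject and object terms.
triple = conn.execute("""
    SELECT s.cvterm_name, r.type_name, o.cvterm_name
    FROM phenotype_cvtermrelationship r
    JOIN phenotype_cvterm s ON s.phenotype_cvterm_id = r.subject_id
    JOIN phenotype_cvterm o ON o.phenotype_cvterm_id = r.object_id
""").fetchone()
```

Here the relationship row itself carries the syntax (subject, type, object), so no separate group number is needed to tie these three parts together.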

Anyway, if you could tell me exactly what goes into which tables/fields, I think that would help me immensely in understanding your proposal so I can see if it works for our data.
Thanks!
~Lacey

------------------------------------------------------
Lacey-Anne Sanderson
Bioinformaticist
Pulse Crop Breeding and Genetics
Phone: (306) 966-4975
Room 3C06 Agriculture
Department of Plant Sciences
University of Saskatchewan

Date: Tue, 8 Jul 2014 11:41:49 -0400
From: Naama Menda <[hidden email]>
Subject: [Gmod-schema] proposal for changes to the phenotype module
To: gmod schema <[hidden email]>
Cc: "Rouard, Mathieu \(Bioversity-France\)" <[hidden email]>,
Laurel Cooper <[hidden email]>, marie-angelique
Laporte <[hidden email]>, Graham McLaren
<[hidden email]>, "Guignon, Valentin \(Bioversity-France\)"
<[hidden email]>
Message-ID:
<[hidden email]>
Content-Type: text/plain; charset="utf-8"

We have had several discussions about modifying the phenotype module.

our main goals are
1. to break the link between phenotype value and trait(s)
2. allow post-composing of terms using multiple ontologies, defining
relationships between cvterms that describe a phenotype.

Here is what Valentin, Mathieu, and I came up with today. We think this
table would cover the use cases we've been looking at, including storing
complex EQ statements (in this case the cvterm will be anything we use from
PO, GO, Rel. ontology , PATO, CHEBI. The type and relationship between the
terms will be defined in the type_id column, e.g. 'primary entity 1' ,
'primary relationship', 'primary entity 2', 'quality' , 'secondary entity
1' etc. ) .


The current phenotype table remains unchanged for backward compatibility.


-----------------------------

COMMENT ON TABLE phenotype IS 'Columns
observable_id,
attr_id,
cvalue_id,
assay_id
are deprecated to break the connection between the phenotype value and the
trait.
The phenotype table should be used to store only the phenotype value. Use
tables phenotype_cvterm and/or phenotype_cvtermrelationship to store the
trait(s) associated with the phenotype. This allows a more abstract  way of
post-composing cvterms and storing EQ statements.';


CREATE TABLE phenotype_cvtermrelationship (
      phenotype_cvtermrelationship_id serial PRIMARY KEY,
      phenotype_id integer NOT NULL REFERENCES phenotype(phenotype_id),
      cvterm_id integer NOT NULL REFERENCES cvterm(cvterm_id),
      type_id integer NOT NULL REFERENCES cvterm(cvterm_id),
      phenotypegroup integer DEFAULT NULL,
      rank integer DEFAULT 0 );

ALTER TABLE phenotype_cvtermrelationship ADD unique (phenotype_id,
cvterm_id, type_id, phenotypegroup,rank);

-------------------------

The 'phenotypegroup' column is there to indicate which rows together create
a single phenotype statement.
Rank column is for cases when you need to store the same cvterm_ids +
type_id more than once.
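
One thing that may be worth checking with this constraint, sketched below with Python's sqlite3 so it runs standalone: because phenotypegroup defaults to NULL, and NULLs count as distinct values in a UNIQUE constraint (in SQLite, and by default in PostgreSQL as well), two otherwise identical rows with a NULL group are both accepted. The ids used here are arbitrary placeholders and the foreign keys are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phenotype_cvtermrelationship (
        phenotype_cvtermrelationship_id INTEGER PRIMARY KEY,
        phenotype_id INTEGER NOT NULL,
        cvterm_id INTEGER NOT NULL,
        type_id INTEGER NOT NULL,
        phenotypegroup INTEGER DEFAULT NULL,
        rank INTEGER DEFAULT 0,
        UNIQUE (phenotype_id, cvterm_id, type_id, phenotypegroup, rank)
    )
""")
insert = (
    "INSERT INTO phenotype_cvtermrelationship "
    "(phenotype_id, cvterm_id, type_id, phenotypegroup) VALUES (?, ?, ?, ?)"
)

# Same row twice with phenotypegroup NULL: both inserts succeed.
conn.execute(insert, (1, 10, 20, None))
conn.execute(insert, (1, 10, 20, None))
null_rows = conn.execute(
    "SELECT COUNT(*) FROM phenotype_cvtermrelationship WHERE phenotypegroup IS NULL"
).fetchone()[0]

# Same row twice with phenotypegroup 1 and the default rank: the second is rejected.
conn.execute(insert, (1, 10, 20, 1))
try:
    conn.execute(insert, (1, 10, 20, 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

If that behaviour is not wanted, one option would be to always set a group number (e.g. a non-NULL default), and to bump rank when the same cvterm_id + type_id really does need to be stored twice.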


This design allows storing EQ statements that have multiple ontology terms
of different types associated with each phenotype statement.
e.g.
a gene with the phenotype "chlorophyll degradation inhibited in the
fruit" is annotated with
GO term of type "primary entity 1":  "chlorophyll catabolic process" ,
relationship between entity 1 and entity 2:  "occurs_in"
"primary entity 2":  the PO term for "fruit"
Quality: the PATO term "decreased rate"

So basically we are post-composing a phenotype statement using PO, GO, PATO
terms , with each one having a type.
Until now people have used phenotypeprop, but this is not a very elegant
solution, and it makes it hard to group multiple cvterms that make up a
single statement or post-composed description.

We used the EQ statement case as the most complex example for using
phenotype_cvtermrelationship, but this design works well for simple
pre-composed terms as well.


-Naama



Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]


