filling cvtermpath

Naama Menda
Is there anyone who's using the cvtermpath table?

We refactored the make_cvtermpath code a while ago, but I'm not sure it's populating cvtermpath the way it should.
Does anyone have examples of properly querying cvtermpath to find all recursive children and recursive parents?

thanks!
-Naama




Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Re: filling cvtermpath

Naama Menda
Thanks for sharing your code!
This is a bit different from how we load cvtermpath
( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log )

This loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
(which I think Allen Day wrote).

I'm wondering if there are other databases using cvtermpath (FlyBase?) who could comment more on this.

-Naama



On Tue, Nov 2, 2010 at 10:48 AM, Fan Kang <[hidden email]> wrote:
Hi Naama,

As John mentioned earlier, I wrote a plpgsql function for loading child/parent relationships into the cvtermpath table in our local database using FlyBase. See my code attached. Please note that my code works for populating GO terms only. I used the following GO documentation to build the inferred relationships, and populated positive pathdistances. My notes and comments on the logic are embedded in the code.

Like you, I couldn't figure out from the Chado documentation or code exactly how the path should be populated. Since we only have GO terms in the cvterm table, and the logic described in the GO documentation makes sense to me, I wrote the code for our FlyBase-like database. If you figure out a generic approach for populating inferred relationships/paths, could you share your knowledge/code with us?

FYI.
GO documentation:
http://www.geneontology.org/GO.ontology.relations.shtml

--loading GO term relationships into cvtermpath (after the function "fill_cvtermpath" is created):
select * from fill_cvtermpath ('cellular_component');
select * from fill_cvtermpath ('molecular_function');
select * from fill_cvtermpath ('biological_process');

Regards,
Fan
http://genomics.princeton.edu/~fkang/



On Nov 1, 2010, at 9:46 PM, Naama Menda wrote:

Hi John,

I managed to get recursive children and recursive parents, but it's difficult to test whether it works 100% properly.
It seems that with the positive pathdistance you can get all child nodes or parent nodes (depending on whether your term is the subject or the object, respectively), while the negative distances give you only the direct children (subjects) and direct parents (objects).

However, I could not figure out from the Chado page about cvterm transitive closure exactly how the path should be populated.
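
For readers following along, here is a minimal sketch of the kind of query being discussed. It is only an illustration under stated assumptions: it uses Python's sqlite3 with cut-down stand-ins for the cvterm and cvtermpath tables (the real Chado cvtermpath also carries type_id and cv_id), the toy terms are hypothetical, and it assumes the cvterm_relationship convention that the subject is the child and the object is the parent, with only positive pathdistance values populated.

```python
import sqlite3

# Simplified stand-ins for the Chado cvterm and cvtermpath tables
# (assumption: only the columns needed for this illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE cvtermpath (
        subject_id   INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
        object_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
        pathdistance INTEGER NOT NULL
    );
""")

# Hypothetical toy chain: mitochondrion is_a organelle is_a cell part.
conn.executemany(
    "INSERT INTO cvterm (cvterm_id, name) VALUES (?, ?)",
    [(1, "cell part"), (2, "organelle"), (3, "mitochondrion")],
)
# Pre-computed closure, positive distances only
# (assumption: subject = child, object = parent).
conn.executemany(
    "INSERT INTO cvtermpath (subject_id, object_id, pathdistance) VALUES (?, ?, ?)",
    [(2, 1, 1), (3, 2, 1), (3, 1, 2)],
)


def recursive_children(conn, term_name):
    """Return (name, pathdistance) for every term below term_name."""
    return conn.execute(
        """
        SELECT child.name, p.pathdistance
        FROM cvtermpath p
        JOIN cvterm parent ON parent.cvterm_id = p.object_id
        JOIN cvterm child  ON child.cvterm_id  = p.subject_id
        WHERE parent.name = ? AND p.pathdistance > 0
        ORDER BY p.pathdistance, child.name
        """,
        (term_name,),
    ).fetchall()


def recursive_parents(conn, term_name):
    """Return (name, pathdistance) for every term above term_name."""
    return conn.execute(
        """
        SELECT parent.name, p.pathdistance
        FROM cvtermpath p
        JOIN cvterm child  ON child.cvterm_id  = p.subject_id
        JOIN cvterm parent ON parent.cvterm_id = p.object_id
        WHERE child.name = ? AND p.pathdistance > 0
        ORDER BY p.pathdistance, parent.name
        """,
        (term_name,),
    ).fetchall()


children_of_cell_part = recursive_children(conn, "cell part")
parents_of_mitochondrion = recursive_parents(conn, "mitochondrion")
```

Restricting to pathdistance = 1 in either query would give only the direct children or direct parents, provided the closure was populated that way.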

thanks!
-Naama

 
On Mon, Nov 1, 2010 at 5:04 PM, John Matese <[hidden email]> wrote:
Dear Naama,

Just replying off-list as a preliminary message, as Fan Kang (Cc'd) might reply to the list proper (or perhaps you can redirect our responses there, once collated). As you well know, the cvtermpath code has always been problematic. I've looked at it several times myself (as the versions changed), and I was never convinced it functioned as it was supposed to (please don't take offense). No GMOD version ever properly took into account whether the relationships were truly reflexive, transitive, etc.

In a local project that uses a FlyBase mirror (so not GMOD/Chado, exactly), Fan invested a good amount of time writing a Postgres procedure that populates cvtermpath independently of the Perl script. We believe it functions as intended, populating the reflexive/transitive closure, although we may not have populated both positive and negative paths (subject<=>object). That is to say, the data make more sense to us than the output of the GMOD Perl logic. We can send you the procedural logic, if you wish.

I think the make_cvtermpath script may need to be rewritten from scratch instead of "refactored", as I think the original logic was broken, and I seem to remember that it contained code that seemed to be useless cruft anyway (data structures constructed and then never used, from the very earliest versions). I don't trust it, and you are justified in being skeptical about its functionality... ;)

Cheers,
John Matese



Re: filling cvtermpath

Andy Schroeder
FlyBase doesn't populate or use cvtermpath so we don't have anything to add.

cheers,
Andy


Re: filling cvtermpath

Chris Mungall
In reply to this post by Naama Menda

** Specification:

The GO documentation is fine so long as you are just loading GO, but
in fact there is no need to hardcode specific rules for individual
relations; everything required should be in the source OBO file.

For a correct specification of which relationships should be inferred,
you should look at the mapping of OBO to OWL, and the OWL formal
specification. But this may be too much for most people. Because most
ontologies people will be using correspond to a simple subset of OWL,
a simple rule approximation can be used. See

http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure

Facts:

        • is_transitive(is_a)
        • is_reflexive(is_a)
These facts are built in and cannot be modified.

Rules:

        • is_transitive(R), X R Y, Y R Z -> X R Z
        • is_reflexive(R) -> X R X
        • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
        • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
        • X R Y, Y S Z, transitive_over(R,S) -> X R Z
        • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
        • Y union_of (X, ...) -> X is_a Y

Many ontologies will only need the first 4 rules.
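
As an illustration only, the first four rules can be sketched as a naive fixed-point computation over a set of (subject, relation, object) links. This is not go-moose, OBOEdit or OWLGraphWrapper code; the function name, its parameters and the toy ontology are all made up for the example, and no attempt is made to compute pathdistance.

```python
def graph_closure(links, transitive=(), reflexive=(), class_level=()):
    """Naive closure of (subject, relation, object) links under rules 1-4.

    Hypothetical helper for illustration; quadratic per pass, so only
    suitable for small examples.
    """
    transitive = set(transitive) | {"is_a"}   # built-in fact
    reflexive = set(reflexive) | {"is_a"}     # built-in fact
    class_level = set(class_level)

    closure = set(links)
    nodes = {term for s, _, o in links for term in (s, o)}

    # Rule 2: is_reflexive(R) -> X R X
    for rel in reflexive:
        closure |= {(n, rel, n) for n in nodes}

    changed = True
    while changed:
        inferred = set()
        for x, r1, y in closure:
            for y2, r2, z in closure:
                if y != y2:
                    continue
                # Rule 1: is_transitive(R), X R Y, Y R Z -> X R Z
                if r1 == r2 and r1 in transitive:
                    inferred.add((x, r1, z))
                # Rule 3: X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
                if r1 == "is_a" and r2 not in class_level:
                    inferred.add((x, r2, z))
                # Rule 4: X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
                if r2 == "is_a" and r1 not in class_level:
                    inferred.add((x, r1, z))
        changed = not inferred <= closure
        closure |= inferred
    return closure


# Hypothetical toy ontology.
asserted = {
    ("mitochondrion", "is_a", "organelle"),
    ("organelle", "part_of", "cell"),
    ("cell", "is_a", "structure"),
}
closure = graph_closure(asserted, transitive={"part_of"})
```

Here the closure picks up mitochondrion part_of cell (rule 3) and organelle part_of structure (rule 4), but never cell part_of organelle or mitochondrion is_a cell.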

Note that this doesn't specify the semantics of pathdistance, which is
essentially the number of rule applications required.

** Implementations:

I wouldn't recommend re-implementing any of the above; there are a few
options available. In the long term, the correct implementation will
be to use an OWL reasoner to pre-compute all paths, and then load them
into the database. This is what GMOD should be converging upon.
However, at this time, not all OWL reasoners scale to all ontologies
of interest to GMOD, so the best thing to do would be to use a
graph-based approximation following the rules above. There are a few
choices here.

At one time I did implement a plpgsql version, but this had problems
with transactions. Ken's implementation may be more efficient, or
plpgsql may be faster now. I don't see Ken's code so I can't vouch for
its correctness. Additionally, this is only good for pg users.

go-moose / GOBO has an in-memory reasoner that can be used to dump the
pre-computed closure of all relations. It implements the rules above:

        http://wiki.geneontology.org/index.php/GO_Moose

Someone could write a GOBO/DBIC bridge, but it may be simpler to dump
the closure and have a separate script to load it. This makes it
easier to integrate other reasoners, as I can't guarantee go-moose
will be supported indefinitely (I don't use Perl for ontologies any
more).

If you install OBOEdit, it comes with a command-line executable called
obo2linkfile, which dumps the relational closure. This could be loaded
into cvtermpath easily.

This will be replaced by a Java package called OWLGraphWrapper. This
provides a fast implementation of the above rules, and will also allow
the use of an OWL reasoner when it's appropriate. This would make the
most sensible target for long-term integration. We're currently making
a bridge between this and the GO database Hibernate layer. Here's a
link to the source, but the package names may change:

        http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log

Apologies for the heterogeneity of implementations; the blame for this
lies with me.

I think the most pragmatic solution would be to write a cvtermpath
loader that takes a 3- or 4-column file (distance optional) and loads
the table. The file could be produced by one of the methods above, such
as obo2linkfile, and its source then switched to the new code when that
is more mature.
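
A rough sketch of what such a loader might look like follows. Everything about the file layout is an assumption made for the example (tab-separated, columns in the order subject, relation, object, optional distance); the actual obo2linkfile output may differ. The cvtermpath_staging table and the EX: identifiers are hypothetical, and a real loader would still need to resolve the identifiers to cvterm_id values before inserting into cvtermpath.

```python
import csv
import io
import sqlite3


def parse_link_rows(handle):
    """Yield (subject, relation, object, distance-or-None) per non-empty line.

    Assumed layout: tab-separated, 3 or 4 columns, distance last.
    """
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), 1):
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (3, 4):
            raise ValueError(f"line {line_no}: expected 3 or 4 columns, got {len(fields)}")
        subject, relation, obj = fields[:3]
        distance = int(fields[3]) if len(fields) == 4 else None
        yield subject, relation, obj, distance


def load_links(conn, handle):
    """Insert the parsed rows into the staging table; return the row count."""
    rows = list(parse_link_rows(handle))
    conn.executemany(
        "INSERT INTO cvtermpath_staging (subject, relation, object, pathdistance) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return len(rows)


# Hypothetical staging table keyed by term identifiers rather than cvterm_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cvtermpath_staging ("
    "subject TEXT NOT NULL, relation TEXT NOT NULL, "
    "object TEXT NOT NULL, pathdistance INTEGER)"
)

# Hypothetical input: one 4-column row and one 3-column row.
sample = io.StringIO("EX:3\tis_a\tEX:2\t1\nEX:3\tpart_of\tEX:1\n")
loaded = load_links(conn, sample)
staged = conn.execute(
    "SELECT subject, relation, object, pathdistance "
    "FROM cvtermpath_staging ORDER BY rowid"
).fetchall()
```

Keeping the parser separate from the insert step means the same loader could accept output from a different reasoner later, as long as it is written in the same column layout.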



Re: filling cvtermpath

Lukas A. Mueller
Hi Chris,

There is something very basic I don't understand about the current implementation of cvtermpath: it stores the relationship types. If there is a path with mixed relationship types, it will be listed twice, once for each relationship type. There may be theoretical reasons that require this, but in practice, getting two paths where there is just one is often wrong. The real problem seems to be that, for anything other than direct parent/child relationships, assigning a relationship type seems somewhat nonsensical...
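
To make the point concrete, here is a small sketch, assuming rows shaped as described above (the same subject/object pair stored once per relationship type). The table is a simplified stand-in with term names instead of ids, and the data are hypothetical. Grouping on subject and object while ignoring the type collapses the duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for cvtermpath (assumption: names instead of ids).
conn.execute(
    "CREATE TABLE cvtermpath ("
    "subject TEXT, type TEXT, object TEXT, pathdistance INTEGER)"
)
# Hypothetical mixed path: mitochondrion is_a organelle part_of cell,
# stored once per relationship type on the path.
conn.executemany(
    "INSERT INTO cvtermpath VALUES (?, ?, ?, ?)",
    [
        ("mitochondrion", "is_a", "cell", 2),
        ("mitochondrion", "part_of", "cell", 2),
        ("organelle", "part_of", "cell", 1),
    ],
)


def paths_ignoring_type(conn):
    """One row per subject/object pair: shortest distance and stored-row count."""
    return conn.execute(
        """
        SELECT subject, object, MIN(pathdistance), COUNT(*)
        FROM cvtermpath
        GROUP BY subject, object
        ORDER BY subject, object
        """
    ).fetchall()


collapsed = paths_ignoring_type(conn)
```

Whether dropping the type like this is acceptable depends on the query; it loses the distinction that the closure rules were meant to preserve.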

cheers
Lukas

On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:

>
> ** Specification:
>
> The GO documentation is fine so long as you are just loading GO, but
> in fact there is no need to hardcode specific rules for individual
> relations; everything required should be in the source obo file.
>
> For a correct specification of which relationships should be inferred,
> you should look at the mapping of obo to owl, and the owl formal
> specification. But this may be too much for most people. Because most
> ontologies people will be using correspond to a simple subset of OWL,
> a simple rule approximation can be used. See
>
> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
>
> Facts:
>
>        • is_transitive(is_a)
>        • is_reflexive(is_a)
> these facts are built-in and cannot be modified
>
> Rules:
>
>        • is_transitive(R), X R Y, Y R Z -> X R Z
>        • is_reflexive(R) -> X R X
>        • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
>        • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
>        • X R Y, Y S Z, transitive_over(R,S) -> X R Z
>        • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
>        • Y union_of (X, ...) -> X is_a Y
>
> Many ontologies will only need the first 4 rules.
>
> Note this doesn't specify the semantics of pathdistance - this is
> essentially the number of rule applications required.
>
> ** Implementations:
>
> I wouldn't recommend re-implementing any of the above, there's a few
> options available. In the long term, the correct implementation will
> be to use an OWL reasoner to pre-compute all paths, and then load this
> into the database. This is what GMOD should be converging upon.
> However, at this time, not all OWL reasoners are scalable over all
> ontologies of interest to GMOD, so the best thing to do would be to
> use a graph-based approximation following the rules above. There's a
> few choices here.
>
> At one time I did implement a plpgsql version, but this had problems
> with transactions. Ken's implementation may be more efficient, or
> plpgsql may be faster now. I don't see Ken's code so I can't vouch for
> its correctness. Additionally, this is only good for pg users.
>
> go-moose / GOBO has an in-memory reasoner that can be used to dump the
> pre-computed closure of all relations. It implements the rules above
>
>        http://wiki.geneontology.org/index.php/GO_Moose
>
> Someone could write a GOBO/DBIC bridge but it may be simpler to dump
> the closure and have a separate script to load it. This makes it
> easier to integrate other reasoners, as I can't guarantee go-moose
> will be supported indefinitely (I don't use perl for ontologies any
> more).
>
> If you install OBOEdit it comes with a command line executable called
> obo2linkfile, which dumps the relational closure. This could be loaded
> into cvtermpath easily.
>
> This will be replaced by a java package called OWLGraphWrapper. This
> provides a fast implementation of the above rules, and will also allow
> the use of an owl reasoner when it's appropriate. This would make the
> most sensible target for long term integration. We're currently making
> a bridge between this and the go database hibernate layer. Here's a
> link to the source, but the package names may change:
>
>        http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log
>
> Apologies for the heterogeneity of implementations, the blame for this
> lies with me.
>
> I think the most pragmatic solution would be to write a cvtermpath
> loader that takes a 3 or 4 column file (distance optional) and loads
> the table. The file could be populated by a method above such as
> obo2linkfile, and then switched to the new code when it's more mature.
>
>
> On Nov 2, 2010, at 8:16 AM, Naama Menda wrote:
>
>> thanks for sharing your code!
>> This is a bit different from how we load the cvtermpath
>> ( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log
>> )
>>
>> this loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
>> (I think Allen Day wrote)
>>
>> I'm wondering if there are other databases using cvtermpath
>> (Flybase? )  who may comment more about this.
>>
>> -Naama
>>
>>
>>
>> On Tue, Nov 2, 2010 at 10:48 AM, Fan Kang <[hidden email]
>>> wrote:
>> Hi Naama,
>>
>> As John mentioned earlier, I wrote a plpgsql function for loading
>> child/parent relationships into cvtermpath table in our local
>> database using Flybase. See my code attached.  Please note that my
>> code works for populating GO terms only.  I used the following GO
>> documentation to build the inferred relationships, and populated
>> positive pathdistances.  My note/comments for the logic are embedded
>> in the code.
>>
>> Like what you experienced, I couldn't figure out from the Chado
>> documentation/code how the path should be exactly populated, either.
>> Since we only have GO terms in cvterm table, and the logic described
>> in GO documentation makes sense to me,  I wrote the code for our
>> flybase-like database.  If you figure out a generic approach for
>> populating inferred relationships/path, could you share your
>> knowledge/code with us?
>>
>> FYI.
>> GO documentation:
>> http://www.geneontology.org/GO.ontology.relations.shtml
>>
>> --loading GO term relationships into cvtermpath (after the function
>> "fill_cvtermpath" is created):
>> select * from fill_cvtermpath ('cellular_component');
>> select * from fill_cvtermpath ('molecular_function');
>> select * from fill_cvtermpath ('biological_process');
>>
>> Regards,
>> Fan
>> http://genomics.princeton.edu/~fkang/
>>
>>
>>
>> On Nov 1, 2010, at 9:46 PM, Naama Menda wrote:
>>
>>> hi John,
>>>
>>> I managed to get recursive children and recursive parents , but
>>> it's difficult to test if it works properly 100%.
>>> It seems like using the positive pathdistance you can get all child
>>> nodes or parent nodes (depending if your term is the subject or the
>>> object respectively) , and the negative distances can give you only
>>> the direct children (subjects) and direct parents (objects)
>>>
>>> However, I could not figure out from the chado page about cvterm
>>> transitive closure how exactly the path should be populated.
>>>
>>> thanks!
>>> -Naama
>>>
>>>
>>> On Mon, Nov 1, 2010 at 5:04 PM, John Matese <[hidden email]
>>>> wrote:
>>> Dear Naama,
>>>
>>> Just replying off-list as a preliminary message, as Fan Kang (Cc'd)
>>> might reply to the list proper (or perhaps you can redirect our
>>> responses there, once collated).  As you well know, the cvtermpath
>>> code has always been problematic.  I've looked at it several times
>>> myself (as the versions changed), and I was never convinced it ever
>>> functioned as it was supposed to (please don't take offense).  No
>>> GMOD versions ever properly took into account whether the
>>> relationships were truly reflexive, transitive, etc.  In a local
>>> project that utilizes a Flybase mirror (so not GMOD/chado,
>>> exactly), Fan invested a good amount of time writing up a postgres
>>> procedure that populates cvtermpath, independent of the perl
>>> script.  We believe it functions as intended, populating the
>>> reflexive/transitive closure, although we may not have populated
>>> both positive and negative paths (subject<=>object).  That is to
>>> say, the data make sense to us, more so than the GMOD perl logic.
>>> We can send you the procedural logic, if you wish.  I think the
>>> make_cvtermpath script may need to be rewritten from scratch,
>>> instead of "re-factored", as I think the original logic was broken
>>> and I seem to remember that it contained code that seemed to be
>>> useless cruft anyway (data structures constructed and then not
>>> used, from the very earliest versions).  I don't trust it, and you
>>> are justified in being skeptical about its functionality...  ;)
>>>
>>> Cheers,
>>> John Matese
>>>
>>>
>>> On Nov 1, 2010, at 4:33 PM, Naama Menda wrote:
>>>
>>>> is there anyone who's using the cvtermpath table?
>>>>
>>>> We refactored the make_cvtermpath code a while ago, but I'm not
>>>> sure it's populating cvtermpath the way it should.
>>>> Does anyone have examples of properly querying cvtermpath to
>>>> find all recursive children and recursive parents?
>>>>
>>>> thanks!
>>>> -Naama

Re: filling cvtermpath

Chris Mungall

Hi Lukas

Can you give me an example of where you think it doesn't make sense?

It helps if you think of the edges as sentences. For example:

* every S-phase is a biological process
* every S-phase is part of a biological process

Here I've highlighted two paths from S-phase to the root node, and  
both are biologically valid.

On Nov 2, 2010, at 1:25 PM, Lukas Mueller wrote:

> Hi Chris,
>
> there is something very basic I don't understand about the current
> implementation of cvtermpath, and that is that it contains the
> relationship types. If there is a path with mixed relationship
> types, it will be listed twice, once for each relationship type.
> There may be theoretical aspects that require this, but in practice,
> getting two paths where there is just one is often wrong. The real
> problem seems to be that, between terms that are not direct
> parent/child, assigning a relationship type seems somewhat
> nonsensical...
>
> cheers
> Lukas
>
> On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:
>
>>
>> ** Specification:
>>
>> The GO documentation is fine so long as you are just loading GO, but
>> in fact there is no need to hardcode specific rules for individual
>> relations; everything required should be in the source obo file.
>>
>> For a correct specification of which relationships should be
>> inferred, you should look at the mapping of obo to owl, and the owl
>> formal specification. But this may be too much for most people.
>> Because most ontologies people will be using correspond to a simple
>> subset of OWL, a simple rule approximation can be used. See
>>
>> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
>>
>> Facts:
>>
>>       • is_transitive(is_a)
>>       • is_reflexive(is_a)
>> these facts are built-in and cannot be modified
>>
>> Rules:
>>
>>       • is_transitive(R), X R Y, Y R Z -> X R Z
>>       • is_reflexive(R) -> X R X
>>       • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
>>       • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
>>       • X R Y, Y S Z, transitive_over(R,S) -> X R Z
>>       • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
>>       • Y union_of (X, ...) -> X is_a Y
>>
>> Many ontologies will only need the first 4 rules.
>>
>> Note this doesn't specify the semantics of pathdistance - this is
>> essentially the number of rule applications required.
>>
>> ** Implementations:
>>
>> I wouldn't recommend re-implementing any of the above; there are a
>> few options available. In the long term, the correct implementation
>> will be to use an OWL reasoner to pre-compute all paths, and then
>> load this into the database. This is what GMOD should be converging
>> upon. However, at this time, not all OWL reasoners are scalable over
>> all ontologies of interest to GMOD, so the best thing to do would be
>> to use a graph-based approximation following the rules above. There
>> are a few choices here.
>>
>> At one time I did implement a plpgsql version, but this had problems
>> with transactions. Ken's implementation may be more efficient, or
>> plpgsql may be faster now. I don't see Ken's code so I can't vouch
>> for its correctness. Additionally, this is only good for pg users.
>>
>> go-moose / GOBO has an in-memory reasoner that can be used to dump
>> the pre-computed closure of all relations. It implements the rules
>> above:
>>
>>       http://wiki.geneontology.org/index.php/GO_Moose
>>
>> Someone could write a GOBO/DBIC bridge but it may be simpler to dump
>> the closure and have a separate script to load it. This makes it
>> easier to integrate other reasoners, as I can't guarantee go-moose
>> will be supported indefinitely (I don't use perl for ontologies any
>> more).
>>
>> If you install OBOEdit it comes with a command line executable called
>> obo2linkfile, which dumps the relational closure. This could be
>> loaded into cvtermpath easily.
>>
>> This will be replaced by a java package called OWLGraphWrapper. This
>> provides a fast implementation of the above rules, and will also
>> allow the use of an owl reasoner when it's appropriate. This would
>> make the most sensible target for long term integration. We're
>> currently making a bridge between this and the go database hibernate
>> layer. Here's a link to the source, but the package names may change:
>>
>>       http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log
>>
>> Apologies for the heterogeneity of implementations, the blame for
>> this lies with me.
>>
>> I think the most pragmatic solution would be to write a cvtermpath
>> loader that takes a 3 or 4 column file (distance optional) and loads
>> the table. The file could be populated by a method above such as
>> obo2linkfile, and then switched to the new code when it's more
>> mature.
>>
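For what it's worth, the loader idea above could look roughly like the following. This is only a sketch under loud assumptions: the column order (subject, relation, object, optional distance) is a guess and not the documented obo2linkfile output, the function name load_cvtermpath is hypothetical, and an in-memory SQLite table with text ids stands in for the real Chado cvtermpath (which uses integer cvterm ids and also has a cv_id column).

```python
import csv
import io
import sqlite3


def load_cvtermpath(conn, lines):
    """Load tab-separated rows: subject, relation, object[, distance].

    The column order is an assumption made for this sketch.
    Returns the number of rows inserted.
    """
    rows = []
    for rec in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not rec or rec[0].startswith("#"):
            continue
        if len(rec) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(rec)}: {rec!r}")
        subject, relation, obj = rec[:3]
        # The distance column is optional; store NULL when it is absent.
        distance = int(rec[3]) if len(rec) == 4 else None
        rows.append((subject, obj, relation, distance))
    conn.executemany(
        "INSERT INTO cvtermpath (subject_id, object_id, type_id, pathdistance)"
        " VALUES (?, ?, ?, ?)",
        rows,
    )
    return len(rows)


# Simplified stand-in table; real Chado uses integer foreign keys.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cvtermpath ("
    " subject_id TEXT, object_id TEXT, type_id TEXT, pathdistance INTEGER)"
)

# Made-up sample input: one 4-column row and one 3-column row.
sample = io.StringIO(
    "S-phase\tis_a\tcell cycle phase\t1\n"
    "S-phase\tpart_of\tcell cycle\n"
)
loaded = load_cvtermpath(conn, sample)
print(loaded, conn.execute("SELECT * FROM cvtermpath").fetchall())
```

Keeping the loader this dumb means the closure can come from any tool, as long as it is dumped into the agreed column layout first.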
>>
>> On Nov 2, 2010, at 8:16 AM, Naama Menda wrote:
>>
>>> thanks for sharing your code!
>>> This is a bit different from how we load the cvtermpath
>>> ( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log
>>> )
>>>
>>> this loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
>>> (I think Allen Day wrote)
>>>
>>> I'm wondering if there are other databases using cvtermpath
>>> (Flybase? )  who may comment more about this.
>>>
>>> -Naama

Re: filling cvtermpath

Lukas A. Mueller
Hi Chris,

one use case here is that we use cvtermpath to determine parents and children of a term, and the way it is implemented it returns duplicate paths, which is not correct (it is relatively easy to correct for that, of course). If you would like to know the relationship types, you can just trivially (I think) join to cvterm_relationship, and you will get all the paths with different type_id. So there is really no reason for them to be duplicated in cvtermpath.

cheers
Lukas


Re: filling cvtermpath

Chris Mungall

I'm afraid I'm not following. If you want the direct parents and  
children then you can use cvterm_relationship. I think you mean to say  
you use the cvtermpath table to determine ancestors and descendants,  
and don't care about the relationship type. In this case you could  
just select the distinct values for subject and object - I think this  
is what you mean by "easy to correct for". This doesn't seem like such  
a high penalty to pay for interoperability with other use cases for  
the cvtermpath table.
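
To make the "select the distinct values" suggestion concrete, here is a minimal sketch. It uses an in-memory SQLite table as a stand-in for cvtermpath; the column names follow Chado's cvtermpath (subject_id, object_id, type_id, pathdistance), but the text-valued ids, the sample rows and the helper name ancestors are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Chado's cvtermpath: the real table uses integer
# foreign keys into cvterm; plain text ids are used here for readability.
conn.execute(
    "CREATE TABLE cvtermpath ("
    " subject_id TEXT, object_id TEXT, type_id TEXT, pathdistance INTEGER)"
)
conn.executemany(
    "INSERT INTO cvtermpath VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows: the same pair reached via two relationship types.
        ("S-phase", "biological_process", "is_a", 3),
        ("S-phase", "biological_process", "part_of", 3),
        ("S-phase", "cell cycle phase", "is_a", 1),
    ],
)


def ancestors(conn, term):
    """Ancestors of `term`, ignoring relationship type."""
    rows = conn.execute(
        "SELECT DISTINCT object_id FROM cvtermpath"
        " WHERE subject_id = ? AND pathdistance > 0"
        " ORDER BY object_id",
        (term,),
    )
    return [r[0] for r in rows]


print(ancestors(conn, "S-phase"))
```

Swapping subject_id and object_id in the query would give descendants in the same way.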

You say: "In the case you would like to know relationship types, you  
can just trivially (I think) join to cvterm_relationship, and you will  
get all the paths with different type_id". I'm not sure I follow this,  
but I'm pretty sure it's incorrect. To get the correct results, it's  
necessary to populate cvtermpath with the relationship type. Anything  
else will give the wrong results.

The rules I provided below will provide the correct answers for all  
ontologies in obo format. You could have a simplified version in which  
you just take the transitive closure of the edges, ignoring the  
relationship type. This is in fact what many bioinformaticians are  
trained to do with ontologies. Unfortunately, for an increasing number  
of ontologies this strategy is too simplistic. For an anatomy  
ontology, it's necessary to distinguish the inferred parthood  
relationships from the inferred developmental relationships to give  
the correct answers to queries. For SO, it's necessary to distinguish  
the subtypes of RNA from the parts. A number of ontologies are moving  
beyond the usual isa/partof/developsfrom relationship types, and it  
doesn't make sense to lump these in blindly with the others when  
calculating the cvtermpath. The correct composition rules have to be  
followed.
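
As a rough illustration of why the relationship type matters, the first four rules (transitivity, reflexivity, and the two is_a propagation rules) can be sketched as a naive fixpoint over triples. This is an assumption-laden sketch, not any of the implementations mentioned in this thread: class-level relations, transitive_over, holds_over_chain, union_of and pathdistance are not modelled, the function name closure is hypothetical, and the example edges are invented.

```python
def closure(edges, transitive=(), reflexive=()):
    """Infer triples from (subject, relation, object) edges using rules 1-4.

    is_a is always treated as transitive and reflexive (built-in facts).
    """
    transitive = set(transitive) | {"is_a"}
    reflexive = set(reflexive) | {"is_a"}
    facts = set(edges)
    terms = {t for s, _, o in facts for t in (s, o)}
    # Rule 2: a reflexive relation holds between every term and itself.
    facts |= {(t, r, t) for t in terms for r in reflexive}
    while True:
        new = set()
        for x, r1, y in facts:
            for y2, r2, z in facts:
                if y != y2:
                    continue
                if r1 == r2 and r1 in transitive:
                    new.add((x, r1, z))  # rule 1: R transitive
                if r1 == "is_a":
                    new.add((x, r2, z))  # rule 3: X is_a Y, Y R Z
                if r2 == "is_a":
                    new.add((x, r1, z))  # rule 4: X R Y, Y is_a Z
        if new <= facts:
            return facts  # fixpoint reached, nothing new inferred
        facts |= new


# Invented example edges, loosely following the S-phase example.
edges = [
    ("S-phase", "is_a", "cell cycle phase"),
    ("cell cycle phase", "part_of", "cell cycle"),
    ("cell cycle", "is_a", "biological_process"),
]
inferred = closure(edges, transitive={"part_of"})
for triple in sorted(inferred):
    print(triple)
```

With these edges the sketch infers that S-phase is part_of biological_process but not that it is_a biological_process, which is exactly the distinction that is lost if the type is dropped.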

If you _do_ want the simple transitive closure ignoring relationship  
types then you can still store this in Chado; this will also have your  
desired property of having unique (subject,object), since you're just  
using one (artificial) relationship type which is the super-relation  
of all the others. But applications shouldn't make the assumption that  
this is the case, and it's reasonable for applications to make the  
assumption that relationship_type is filled in with the correct  
semantics.
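
For comparison, the untyped variant could be computed with a recursive query along these lines. Again this is a sketch in Python with SQLite rather than PostgreSQL and Chado, over a made-up cvterm_relationship table reduced to three columns. `UNION` (rather than `UNION ALL`) drops repeated pairs, so each (subject, object) appears once and the recursion stops even if the graph has a cycle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for cvterm_relationship; ids are invented.
CREATE TABLE cvterm_relationship (
    subject_id INTEGER NOT NULL,
    object_id  INTEGER NOT NULL,
    type_id    INTEGER NOT NULL
);
INSERT INTO cvterm_relationship VALUES
    (1, 2, 10), (2, 3, 11), (3, 4, 10), (1, 5, 10), (5, 4, 10);
""")

# Transitive closure over all edges, deliberately ignoring type_id.
UNTYPED_CLOSURE = """
WITH RECURSIVE closure(subject_id, object_id) AS (
    SELECT subject_id, object_id FROM cvterm_relationship
    UNION
    SELECT c.subject_id, r.object_id
    FROM closure AS c
    JOIN cvterm_relationship AS r ON r.subject_id = c.object_id
)
SELECT subject_id, object_id FROM closure
ORDER BY subject_id, object_id
"""

pairs = conn.execute(UNTYPED_CLOSURE).fetchall()
for subject_id, object_id in pairs:
    print(subject_id, object_id)
```

Term 1 reaches term 4 by two routes here but is listed once. As noted above, rows produced this way would need to be stored under a single artificial super-relation, not under is_a or part_of.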

Hope this helps,
Chris

On Nov 2, 2010, at 3:55 PM, Lukas Mueller wrote:

> Hi Chris,
>
> one use case here is that we use cvtermpath to determine parents and  
> children of a term, and the way it is implemented it returns  
> duplicate paths, which is not correct (it is relatively easy to
> correct for that of course). In the case you would like to know  
> relationship types, you can just trivially (I think) join to  
> cvterm_relationship, and you will get all the paths with different  
> type_id. So there is really no reason for them to be duplicated in  
> cvtermpath.
>
> cheers
> Lukas
>
> On Nov 2, 2010, at 4:51 PM, Chris Mungall wrote:
>
>>
>> Hi Lukas
>>
>> Can you give me an example of where you think it doesn't make sense?
>>
>> It helps if you think of the edges as sentences. For example:
>>
>> * every S-phase is a biological process
>> * every S-phase is part of a biological process
>>
>> Here I've highlighted two paths from S-phase to the root node, and
>> both are biologically valid.
>>
>> On Nov 2, 2010, at 1:25 PM, Lukas Mueller wrote:
>>
>>> Hi Chris,
>>>
>>> there is something very basic I don't understand with the current
>>> implementation of cvtermpath, and that is that it contains the
>>> relationship types. If there is a path with mixed relationship
>>> types, it will be listed twice, once for each relationship type.
>>> There may be theoretical aspects that require this, but in practice,
>>> getting two paths where there is just one is often wrong. The
>>> problem seems to be really that between non parent/children
>>> relationships, an assignment of the relationship type seems to be
>>> somewhat non-sensical...
>>>
>>> cheers
>>> Lukas
>>>
>>> On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:
>>>
>>>>
>>>> ** Specification:
>>>>
>>>> The GO documentation is fine so long as you are just loading GO, but
>>>> in fact there is no need to hardcode specific rules for individual
>>>> relations; everything required should be in the source obo file.
>>>>
>>>> For a correct specification of which relationships should be inferred,
>>>> you should look at the mapping of obo to owl, and the owl formal
>>>> specification. But this may be too much for most people. Because most
>>>> ontologies people will be using correspond to a simple subset of OWL,
>>>> a simple rule approximation can be used. See
>>>>
>>>> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
>>>>
>>>> Facts:
>>>>
>>>>     • is_transitive(is_a)
>>>>     • is_reflexive(is_a)
>>>> these facts are built-in and cannot be modified
>>>>
>>>> Rules:
>>>>
>>>>     • is_transitive(R), X R Y, Y R Z -> X R Z
>>>>     • is_reflexive(R) -> X R X
>>>>     • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
>>>>     • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
>>>>     • X R Y, Y S Z, transitive_over(R,S) -> X R Z
>>>>     • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
>>>>     • Y union_of (X, ...) -> X is_a Y
>>>>
>>>> Many ontologies will only need the first 4 rules.
>>>>
>>>> Note this doesn't specify the semantics of pathdistance - this is
>>>> essentially the number of rule applications required.
>>>>
>>>> ** Implementations:
>>>>
>>>> I wouldn't recommend re-implementing any of the above, there's a few
>>>> options available. In the long term, the correct implementation will
>>>> be to use an OWL reasoner to pre-compute all paths, and then load this
>>>> into the database. This is what GMOD should be converging upon.
>>>> However, at this time, not all OWL reasoners are scalable over all
>>>> ontologies of interest to GMOD, so the best thing to do would be to
>>>> use a graph-based approximation following the rules above. There's a
>>>> few choices here.
>>>>
>>>> At one time I did implement a plpgsql version, but this had problems
>>>> with transactions. Ken's implementation may be more efficient, or
>>>> plpgsql may be faster now. I don't see Ken's code so I can't vouch for
>>>> its correctness. Additionally, this is only good for pg users.
>>>>
>>>> go-moose / GOBO has an in-memory reasoner that can be used to dump the
>>>> pre-computed closure of all relations. It implements the rules above
>>>>
>>>>     http://wiki.geneontology.org/index.php/GO_Moose
>>>>
>>>> Someone could write a GOBO/DBIC bridge but it may be simpler to dump
>>>> the closure and have a separate script to load it. This makes it
>>>> easier to integrate other reasoners, as I can't guarantee go-moose
>>>> will be supported indefinitely (I don't use perl for ontologies any
>>>> more).
>>>>
>>>> If you install OBOEdit it comes with a command line executable called
>>>> obo2linkfile, which dumps the relational closure. This could be loaded
>>>> into cvtermpath easily.
>>>>
>>>> This will be replaced by a java package called OWLGraphWrapper. This
>>>> provides a fast implementation of the above rules, and will also allow
>>>> the use of an owl reasoner when it's appropriate. This would make the
>>>> most sensible target for long term integration. We're currently making
>>>> a bridge between this and the go database hibernate layer. Here's a
>>>> link to the source, but the package names may change:
>>>>
>>>>     http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log
>>>>
>>>> Apologies for the heterogeneity of implementations, the blame for this
>>>> lies with me.
>>>>
>>>> I think the most pragmatic solution would be to write a cvtermpath
>>>> loader that takes a 3 or 4 column file (distance optional) and loads
>>>> the table. The file could be populated by a method above such as
>>>> obo2linkfile, and then switched to the new code when it's more mature.
>>>>
>>>>
>>>> On Nov 2, 2010, at 8:16 AM, Naama Menda wrote:
>>>>
>>>>> thanks for sharing your code!
>>>>> This is a bit different from how we load the cvtermpath
>>>>> ( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log )
>>>>>
>>>>> this loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
>>>>> (I think Allen Day wrote)
>>>>>
>>>>> I'm wondering if there are other databases using cvtermpath
>>>>> (Flybase? )  who may comment more about this.
>>>>>
>>>>> -Naama
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 2, 2010 at 10:48 AM, Fan Kang <[hidden email]> wrote:
>>>>> Hi Naama,
>>>>>
>>>>> As John mentioned earlier, I wrote a plpgsql function for loading
>>>>> child/parent relationships into cvtermpath table in our local
>>>>> database using Flybase. See my code attached.  Please note that my
>>>>> code works for populating GO terms only.  I used the following GO
>>>>> documentation to build the inferred relationships, and populated
>>>>> positive pathdistances.  My note/comments for the logic are embedded
>>>>> in the code.
>>>>>
>>>>> Like what you experienced, I couldn't figure out from the Chado
>>>>> documentation/code how the path should be exactly populated, either.
>>>>> Since we only have GO terms in cvterm table, and the logic described
>>>>> in GO documentation makes sense to me, I wrote the code for our
>>>>> flybase-like database.  If you figure out a generic approach for
>>>>> populating inferred relationships/path, could you share your
>>>>> knowledge/code with us?
>>>>>
>>>>> FYI.
>>>>> GO documentation:
>>>>> http://www.geneontology.org/GO.ontology.relations.shtml
>>>>>
>>>>> --loading GO term relationships into cvtermpath (after the function
>>>>> "fill_cvtermpath" is created):
>>>>> select * from fill_cvtermpath ('cellular_component');
>>>>> select * from fill_cvtermpath ('molecular_function');
>>>>> select * from fill_cvtermpath ('biological_process');
>>>>>
>>>>> Regards,
>>>>> Fan
>>>>> http://genomics.princeton.edu/~fkang/
>>>>>
>>>>>
>>>>>
>>>>> On Nov 1, 2010, at 9:46 PM, Naama Menda wrote:
>>>>>
>>>>>> hi John,
>>>>>>
>>>>>> I managed to get recursive children and recursive parents, but
>>>>>> it's difficult to test if it works properly 100%.
>>>>>> It seems like using the positive pathdistance you can get all child
>>>>>> nodes or parent nodes (depending if your term is the subject or the
>>>>>> object respectively), and the negative distances can give you only
>>>>>> the direct children (subjects) and direct parents (objects)
>>>>>>
>>>>>> However, I could not figure out from the chado page about cvterm
>>>>>> transitive closure how exactly the path should be populated.
>>>>>>
>>>>>> thanks!
>>>>>> -Naama
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 1, 2010 at 5:04 PM, John Matese <[hidden email]> wrote:
>>>>>> Dear Naama,
>>>>>>
>>>>>> Just replying off-list as a preliminary message, as Fan Kang (Cc'd)
>>>>>> might reply to the list-proper (or perhaps you can re-direct our
>>>>>> responses there, once collated).  As you well know, the cvtermpath
>>>>>> code has always been problematic.  I've looked at it several times
>>>>>> myself (as the versions changed), and I was never convinced it ever
>>>>>> functioned as it was supposed to (please don't take offense).  No
>>>>>> GMOD versions ever properly took into account whether the
>>>>>> relationships were truly reflexive, transitive, etc.  In a local
>>>>>> project that utilizes a Flybase mirror (so not GMOD/chado,
>>>>>> exactly), Fan invested a good amount of time writing up a postgres
>>>>>> procedure that populates cvtermpath, independent from the perl
>>>>>> script.  We believe it functions as intended, populating the
>>>>>> reflexive/transitive closure, although we may have not populated
>>>>>> both positive and negative paths (subject<=>object).  That is to
>>>>>> say, the data make sense to us, moreso than the GMOD perl logic.
>>>>>> We can send you the procedural logic, if you wish.  I think that
>>>>>> make_cvtermpath script may need to be re-written from scratch,
>>>>>> instead of "re-factored", as I think the original logic was broken
>>>>>> and I seem to remember that it contained code that seemed to be
>>>>>> useless cruft anyway (data structures constructed and then not
>>>>>> used, from the very beginning versions).  I don't trust it, and you
>>>>>> are justified to be skeptical about its functionality...  ;)
>>>>>>
>>>>>> Cheers,
>>>>>> John Matese
>>>>>>
>>>>>>
>>>>>> On Nov 1, 2010, at 4:33 PM, Naama Menda wrote:
>>>>>>
>>>>>>> is there anyone who's using the cvtermpath table?
>>>>>>>
>>>>>>> We've refactored a while ago the make_cvtermpath code, but I'm not
>>>>>>> sure it's populating the cvtermpath the way it should.
>>>>>>> Does anyone have examples of properly querying the cvtermpath to
>>>>>>> find all recursive children and recursive parents?
>>>>>>>
>>>>>>> thanks!
>>>>>>> -Naama
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Naama Menda
>>>>>>> Boyce Thompson Institute for Plant Research
>>>>>>> Tower Rd
>>>>>>> Ithaca NY 14853
>>>>>>> USA
>>>>>>>
>>>>>>> (607) 254 3569
>>>>>>> Sol Genomics Network
>>>>>>> http://solgenomics.net/
>>>>>>> [hidden email]
>>>
>>
>



Re: filling cvtermpath

Robert Buels
Chris Mungall wrote:
> I'm afraid I'm not following. If you want the direct parents and  
> children then you can use cvterm_relationship. I think you mean to say  
> you use the cvtermpath table to determine ancestors and descendants,  
> and don't care about the relationship type. In this case you could  
> just select the distinct values for subject and object - I think this  

+1 to distinct on subject_id, object_id.  Ignoring the type of the
relationship in cvtermpath would severely limit its usefulness for other
applications.

Rob



Re: filling cvtermpath

John Matese
In reply to this post by Chris Mungall
Hi All,

Just wanted to chime in:

gmod_make_cvtermpath.pl logic may have been functional at one time,  
perhaps if only "is_a" relationship was considered.  I seem to  
remember the relationship term name actually being hard-coded, which  
eventually made the script nonfunctional.  The re-factored version may  
have fixed that particular issue, but the current behavior of  
selecting all types from the relationship ontology and handling them  
equivalently, without exploring/testing their associated properties  
within cvtermprop and their position within the relationship chain, is  
doomed to run afoul of the rules that Chris has laid out, below.   I  
am uncertain how easy it would be to code the "is_class_level,  
transitive_over, holds_over_chain, and union_of" functions that Chris  
has noted, but as he suggested, perhaps it shouldn't be attempted,  
PROVIDED a suitable alternative resource existed?

So, how about we lobby the OBO Foundry to create and host the  
transitive closure files from popular ontologies (or all of them),  
perhaps as a link from the ontology "detail" page?

Cheers,
John


On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:

>
> ** Specification:
>
> The GO documentation is fine so long as you are just loading GO, but  
> in fact there is no need to hardcode specific rules for individual  
> relations; everything required should be in the source obo file.
>
> For a correct specification of which relationships should be  
> inferred, you should look at the mapping of obo to owl, and the owl  
> formal specification. But this may be too much for most people.  
> Because most ontologies people will be using correspond to a simple  
> subset of OWL, a simple rule approximation can be used. See
>
> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
>
> Facts:
>
> • is_transitive(is_a)
> • is_reflexive(is_a)
> these facts are built-in and cannot be modified
>
> Rules:
>
> • is_transitive(R), X R Y, Y R Z -> X R Z
> • is_reflexive(R) -> X R X
> • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
> • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
> • X R Y, Y S Z, transitive_over(R,S) -> X R Z
> • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
> • Y union_of (X, ...) -> X is_a Y
>
> Many ontologies will only need the first 4 rules.
>
> Note this doesn't specify the semantics of pathdistance - this is  
> essentially the number of rule applications required.
>
> ** Implementations:
>
> I wouldn't recommend re-implementing any of the above, there's a few  
> options available. In the long term, the correct implementation will  
> be to use an OWL reasoner to pre-compute all paths, and then load  
> this into the database. This is what GMOD should be converging upon.  
> However, at this time, not all OWL reasoners are scalable over all  
> ontologies of interest to GMOD, so the best thing to do would be to  
> use a graph-based approximation following the rules above. There's a  
> few choices here.
>
> At one time I did implement a plpgsql version, but this had problems  
> with transactions. Ken's implementation may be more efficient, or  
> plpgsql may be faster now. I don't see Ken's code so I can't vouch  
> for its correctness. Additionally, this is only good for pg users.
>
> go-moose / GOBO has an in-memory reasoner that can be used to dump  
> the pre-computed closure of all relations. It implements the rules  
> above
>
> http://wiki.geneontology.org/index.php/GO_Moose
>
> Someone could write a GOBO/DBIC bridge but it may be simpler to dump  
> the closure and have a separate script to load it. This makes it  
> easier to integrate other reasoners, as I can't guarantee go-moose  
> will be supported indefinitely (I don't use perl for ontologies any  
> more).
>
> If you install OBOEdit it comes with a command line executable  
> called obo2linkfile, which dumps the relational closure. This could  
> be loaded into cvtermpath easily.
>
> This will be replaced by a java package called OWLGraphWrapper. This  
> provides a fast implementation of the above rules, and will also  
> allow the use of an owl reasoner when it's appropriate. This would  
> make the most sensible target for long term integration. We're  
> currently making a bridge between this and the go database hibernate  
> layer. Here's a link to the source, but the package names may change:
>
> http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log
>
> Apologies for the heterogeneity of implementations, the blame for  
> this lies with me.
>
> I think the most pragmatic solution would be to write a cvtermpath  
> loader that takes a 3 or 4 column file (distance optional) and loads  
> the table. The file could be populated by a method above such as  
> obo2linkfile, and then switched to the new code when it's more mature.
>



Re: filling cvtermpath

Naama Menda
gmod_make_cvtermpath.pl does load a different path for each type_id.
http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log

It finds the type_id from cvterm_relationship and stores it in cvtermpath.

This code is basically a refactoring of the 6-year-old code from here

http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log


I don't know if the original code still works, because I use the new one with Bio::Chado::Schema,
but it has the same logic for populating cvtermpath.


-Naama



On Wed, Nov 3, 2010 at 12:29 PM, John Matese <[hidden email]> wrote:
Hi All,

Just wanted to chime in:

gmod_make_cvtermpath.pl logic may have been functional at one time, perhaps if only "is_a" relationship was considered.  I seem to remember the relationship term name actually being hard-coded, which eventually made the script nonfunctional.  The re-factored version may have fixed that particular issue, but the current behavior of selecting all types from the relationship ontology and handling them equivalently, without exploring/testing their associated properties within cvtermprop and their position within the relationship chain, is doomed to run afoul of the rules that Chris has laid out, below.   I am uncertain how easy it would be to code the "is_class_level, transitive_over, holds_over_chain, and union_of" functions that Chris has noted, but as he suggested, perhaps it shouldn't be attempted, PROVIDED a suitable alternative resource existed?

So, how about we lobby the OBO Foundry to create and host the transitive closure files from popular ontologies (or all of them), perhaps as a link from the ontology "detail" page?

Cheers,
John



On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:


** Specification:

The GO documentation is fine so long as you are just loading GO, but in fact there is no need to hardcode specific rules for individual relations; everything required should be in the source obo file.

For a correct specification of which relationships should be inferred, you should look at the mapping of obo to owl, and the owl formal specification. But this may be too much for most people. Because most ontologies people will be using correspond to a simple subset of OWL, a simple rule approximation can be used. See

http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure

Facts:

       • is_transitive(is_a)
       • is_reflexive(is_a)
these facts are built-in and cannot be modified

Rules:

       • is_transitive(R), X R Y, Y R Z -> X R Z
       • is_reflexive(R) -> X R X
       • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
       • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
       • X R Y, Y S Z, transitive_over(R,S) -> X R Z
       • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
       • Y union_of (X, ...) -> X is_a Y

Many ontologies will only need the first 4 rules.

Note this doesn't specify the semantics of pathdistance - this is essentially the number of rule applications required.

** Implementations:

I wouldn't recommend re-implementing any of the above, there's a few options available. In the long term, the correct implementation will be to use an OWL reasoner to pre-compute all paths, and then load this into the database. This is what GMOD should be converging upon. However, at this time, not all OWL reasoners are scalable over all ontologies of interest to GMOD, so the best thing to do would be to use a graph-based approximation following the rules above. There's a few choices here.

At one time I did implement a plpgsql version, but this had problems with transactions. Ken's implementation may be more efficient, or plpgsql may be faster now. I don't see Ken's code so I can't vouch for its correctness. Additionally, this is only good for pg users.

go-moose / GOBO has an in-memory reasoner that can be used to dump the pre-computed closure of all relations. It implements the rules above

       http://wiki.geneontology.org/index.php/GO_Moose

Someone could write a GOBO/DBIC bridge but it may be simpler to dump the closure and have a separate script to load it. This makes it easier to integrate other reasoners, as I can't guarantee go-moose will be supported indefinitely (I don't use perl for ontologies any more).

If you install OBOEdit it comes with a command line executable called obo2linkfile, which dumps the relational closure. This could be loaded into cvtermpath easily.

This will be replaced by a java package called OWLGraphWrapper. This provides a fast implementation of the above rules, and will also allow the use of an owl reasoner when it's appropriate. This would make the most sensible target for long term integration. We're currently making a bridge between this and the go database hibernate layer. Here's a link to the source, but the package names may change:

       http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log

Apologies for the heterogeneity of implementations; the blame for this lies with me.

I think the most pragmatic solution would be to write a cvtermpath loader that takes a 3- or 4-column file (distance optional) and loads the table. The file could be populated by one of the methods above, such as obo2linkfile, and the loader could then be switched to the new code when it's more mature.
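
A loader along these lines could look roughly like the Python sketch below. Everything specific here is an assumption for illustration: the tab-separated column order (subject, relation, object, optional distance), the function names, the term_ids lookup from term names to cvterm_id values, and the in-memory SQLite table standing in for a real Chado cvtermpath. The actual output format of obo2linkfile is not reproduced.

```python
import sqlite3

def parse_closure_lines(lines):
    """Yield (subject, relation, object, distance); distance is None for 3-column rows."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue                              # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) not in (3, 4):
            raise ValueError(f"line {number}: expected 3 or 4 columns, got {len(fields)}")
        distance = int(fields[3]) if len(fields) == 4 else None
        yield fields[0], fields[1], fields[2], distance

def load_cvtermpath(conn, lines, term_ids, cv_id):
    """Insert one cvtermpath row per closure line and return the number of rows."""
    rows = [
        (term_ids[rel], term_ids[subj], term_ids[obj], cv_id, dist)
        for subj, rel, obj, dist in parse_closure_lines(lines)
    ]
    conn.executemany(
        "INSERT INTO cvtermpath (type_id, subject_id, object_id, cv_id, pathdistance)"
        " VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return len(rows)

# Toy stand-in for the real table, with hypothetical term names and ids.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cvtermpath (type_id INTEGER, subject_id INTEGER,"
    " object_id INTEGER, cv_id INTEGER, pathdistance INTEGER)"
)
term_ids = {"is_a": 1, "child": 10, "parent": 11}
sample = [
    "# subject\trelation\tobject\tdistance\n",
    "child\tis_a\tparent\t1\n",
    "child\tis_a\tchild\t0\n",
    "parent\tis_a\tparent\n",                     # 3-column row: no distance given
]
loaded = load_cvtermpath(conn, sample, term_ids, cv_id=1)
print(loaded, "rows loaded")
```

Keeping the parser separate from the insert step means the closure file can come from any reasoner, as long as it is written in the agreed column layout.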


On Nov 2, 2010, at 8:16 AM, Naama Menda wrote:

thanks for sharing your code!
This is a bit different from how we load the cvtermpath
( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log )

this loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
(I think Allen Day wrote)

I'm wondering if there are other databases using cvtermpath (Flybase? )  who may comment more about this.

-Naama



On Tue, Nov 2, 2010 at 10:48 AM, Fan Kang <[hidden email]> wrote:
Hi Naama,

As John mentioned earlier, I wrote a plpgsql function for loading child/parent relationships into cvtermpath table in our local database using Flybase. See my code attached.  Please note that my code works for populating GO terms only.  I used the following GO documentation to build the inferred relationships, and populated positive pathdistances.  My note/comments for the logic are embedded in the code.

Like what you experienced, I couldn't figure out from the Chado documentation/code how the path should be exactly populated, either. Since we only have GO terms in cvterm table, and the logic described in GO documentation makes sense to me,  I wrote the code for our flybase-like database.  If you figure out a generic approach for populating inferred relationships/path, could you share your knowledge/code with us?

FYI.
GO documentation:
http://www.geneontology.org/GO.ontology.relations.shtml

--loading GO term relationships into cvtermpath (after the function "fill_cvtermpath" is created):
select * from fill_cvtermpath ('cellular_component');
select * from fill_cvtermpath ('molecular_function');
select * from fill_cvtermpath ('biological_process');

Regards,
Fan
http://genomics.princeton.edu/~fkang/



On Nov 1, 2010, at 9:46 PM, Naama Menda wrote:

hi John,

I managed to get recursive children and recursive parents, but it's difficult to test whether it works properly 100%.
It seems that using the positive pathdistance you can get all child nodes or parent nodes (depending on whether your term is the subject or the object, respectively), and the negative distances can give you only the direct children (subjects) and direct parents (objects).
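
For illustration, the positive-pathdistance lookups could be sketched as below in Python, with an in-memory SQLite database standing in for Chado. The table layouts are cut down to the columns used, the term names and ids are hypothetical, and the sketch assumes the convention that the subject is the child and the object is the parent, with rows already loaded for every inferred path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvtermpath (type_id INTEGER, subject_id INTEGER,
                         object_id INTEGER, pathdistance INTEGER);
-- toy tree: leaf is_a mid is_a root
INSERT INTO cvterm VALUES (1, 'is_a'), (10, 'root'), (11, 'mid'), (12, 'leaf');
INSERT INTO cvtermpath VALUES (1, 11, 10, 1), (1, 12, 11, 1), (1, 12, 10, 2);
""")

def recursive_children(conn, name):
    """All terms below `name`: rows where the named term is the object."""
    sql = """
        SELECT child.name, p.pathdistance
        FROM cvtermpath p
        JOIN cvterm parent ON parent.cvterm_id = p.object_id
        JOIN cvterm child ON child.cvterm_id = p.subject_id
        WHERE parent.name = ? AND p.pathdistance > 0
        ORDER BY p.pathdistance, child.name
    """
    return conn.execute(sql, (name,)).fetchall()

def recursive_parents(conn, name):
    """All terms above `name`: rows where the named term is the subject."""
    sql = """
        SELECT parent.name, p.pathdistance
        FROM cvtermpath p
        JOIN cvterm child ON child.cvterm_id = p.subject_id
        JOIN cvterm parent ON parent.cvterm_id = p.object_id
        WHERE child.name = ? AND p.pathdistance > 0
        ORDER BY p.pathdistance, parent.name
    """
    return conn.execute(sql, (name,)).fetchall()

print(recursive_children(conn, "root"))
print(recursive_parents(conn, "leaf"))
```

In a real database the lookup would normally go through cvterm_id (or dbxref accession) rather than the term name, since names are not guaranteed to be unique across vocabularies.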

However, I could not figure out from the chado page about cvterm transitive closure how exactly the path should be populated.

thanks!
-Naama


On Mon, Nov 1, 2010 at 5:04 PM, John Matese <[hidden email]> wrote:
Dear Naama,

Just replying off-list as a preliminary message, as Fan Kang (Cc'd) might reply to the list proper (or perhaps you can redirect our responses there, once collated).  As you well know, the cvtermpath code has always been problematic.  I've looked at it several times myself (as the versions changed), and I was never convinced it ever functioned as it was supposed to (please don't take offense).  No GMOD versions ever properly took into account whether the relationships were truly reflexive, transitive, etc.  In a local project that utilizes a Flybase mirror (so not GMOD/chado, exactly), Fan invested a good amount of time writing up a postgres procedure that populates cvtermpath, independent from the perl script.  We believe it functions as intended, populating the reflexive/transitive closure, although we may not have populated both positive and negative paths (subject<=>object).  That is to say, the data make sense to us, more so than the GMOD perl logic.  We can send you the procedural logic, if you wish.  I think that the make_cvtermpath script may need to be re-written from scratch, instead of "re-factored", as I think the original logic was broken and I seem to remember that it contained code that seemed to be useless cruft anyway (data structures constructed and then not used, from the very beginning versions).  I don't trust it, and you are justified to be skeptical about its functionality...  ;)

Cheers,
John Matese


On Nov 1, 2010, at 4:33 PM, Naama Menda wrote:

is there anyone who's using the cvtermpath table?

We refactored the make_cvtermpath code a while ago, but I'm not sure it's populating the cvtermpath the way it should.
Does anyone have examples of properly querying the cvtermpath to find all recursive children and recursive parents?

thanks!
-Naama




Naama Menda
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca NY 14853
USA

(607) 254 3569
Sol Genomics Network
http://solgenomics.net/
[hidden email]

Re: filling cvtermpath

Lukas A. Mueller
In reply to this post by John Matese
Hi John,

I am under the impression that the loading script correctly loads the different paths, irrespective of what the relationship type is, but Naama has a deeper understanding of the script (possibly it only works for a subset of type_ids, but in practice those are all we care about).

There are different possible use cases for this table. Some people would like to use it for ontological reasoning, while others just want to expand terms to their subtrees for querying entities associated with the term and its entire subtree without recursing through the entire subtree. Since very few people seem to be even using the cvtermpath table, I am wondering, for this latter use case, what approach people have taken to write that query efficiently.
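
One possible shape for that second use case is sketched below in Python, again with an in-memory SQLite database as a stand-in. The tables are reduced to the columns needed, all ids are hypothetical, and the function name features_for_subtree is made up for this example. The query does not rely on reflexive (distance 0) rows being present, so the term itself is matched explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvtermpath (type_id INTEGER, subject_id INTEGER,
                         object_id INTEGER, pathdistance INTEGER);
CREATE TABLE feature_cvterm (feature_id INTEGER, cvterm_id INTEGER);
-- toy tree: 12 is_a 11 is_a 10; term 13 is unrelated; type 1 stands for is_a
INSERT INTO cvtermpath VALUES (1, 11, 10, 1), (1, 12, 11, 1), (1, 12, 10, 2);
INSERT INTO feature_cvterm VALUES (100, 12), (101, 11), (102, 10), (103, 13);
""")

def features_for_subtree(conn, term_id):
    """Features annotated to term_id or to any term below it in cvtermpath."""
    sql = """
        SELECT DISTINCT fc.feature_id
        FROM feature_cvterm fc
        WHERE fc.cvterm_id = :term
           OR fc.cvterm_id IN (SELECT p.subject_id
                               FROM cvtermpath p
                               WHERE p.object_id = :term
                                 AND p.pathdistance > 0)
        ORDER BY fc.feature_id
    """
    return [row[0] for row in conn.execute(sql, {"term": term_id})]

print(features_for_subtree(conn, 11))
print(features_for_subtree(conn, 10))
```

The point of the pre-computed table is that this stays a single non-recursive query; whether it is fast in practice would depend on indexes on object_id and subject_id, which this toy setup does not create.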

I also think that there should be a canonical way of populating the table. If there are a number of different ways, we'll create a lot of incompatibilities between the databases, which is somewhat counter to the idea of having a common database schema.

cheers
Lukas

On Nov 3, 2010, at 12:29 PM, John Matese wrote:

> Hi All,
>
> Just wanted to chime in:
>
> gmod_make_cvtermpath.pl logic may have been functional at one time,
> perhaps if only "is_a" relationship was considered.  I seem to
> remember the relationship term name actually being hard-coded, which
> eventually made the script nonfunctional.  The re-factored version may
> have fixed that particular issue, but the current behavior of
> selecting all types from the relationship ontology and handling them
> equivalently, without exploring/testing their associated properties
> within cvtermprop and their position within the relationship chain, is
> doomed to run afoul of the rules that Chris has laid out, below.   I
> am uncertain how easy it would be to code the "is_class_level,
> transitive_over, holds_over_chain, and union_of" functions that Chris
> has noted, but as he suggested, perhaps it shouldn't be attempted,
> PROVIDED a suitable alternative resource existed?
>
> So, how about we lobby the OBO Foundry to create and host the
> transitive closure files from popular ontologies (or all of them),
> perhaps as a link from the ontology "detail" page?
>
> Cheers,
> John
>
>
> On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:
>
>>
>> ** Specification:
>>
>> The GO documentation is fine so long as you are just loading GO, but
>> in fact there is no need to hardcode specific rules for individual
>> relations; everything required should be in the source obo file.
>>
>> For a correct specification of which relationships should be
>> inferred, you should look at the mapping of obo to owl, and the owl
>> formal specification. But this may be too much for most people.
>> Because most ontologies people will be using correspond to a simple
>> subset of OWL, a simple rule approximation can be used. See
>>
>> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
>>
>> Facts:
>>
>>      • is_transitive(is_a)
>>      • is_reflexive(is_a)
>> these facts are built-in and cannot be modified
>>
>> Rules:
>>
>>      • is_transitive(R), X R Y, Y R Z -> X R Z
>>      • is_reflexive(R) -> X R X
>>      • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
>>      • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
>>      • X R Y, Y S Z, transitive_over(R,S) -> X R Z
>>      • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
>>      • Y union_of (X, ...) -> X is_a Y
>>
>> Many ontologies will only need the first 4 rules.
>>



Re: filling cvtermpath

John Matese
Hi Lukas and Naama,

Yes, I think the gmod_make_cvtermpath.pl script does indeed load "the  
different paths, irrespective of what the relationship type is".  The  
question is, should it?  Check your cvtermpath table for relationships  
like negatively_regulates or positively_regulates and see how their  
paths are being represented.  gmod_make_cvtermpath.pl logic may  
account for rules 2-4 that Chris laid out, but almost certainly not  
the last 3 rules as it does no checks of a relationship's properties.  
As Chris said, it may work for some ontologies, but might not for  
others depending on the definition of the relationship term.  I am no  
expert on this matter, though.
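
A quick way to run that check might look like the sketch below, written in Python against an in-memory SQLite stand-in. The schema is trimmed to the columns used and the rows are invented toy data, so the numbers mean nothing beyond showing the shape of the query; summarize_paths is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvtermpath (type_id INTEGER, subject_id INTEGER,
                         object_id INTEGER, pathdistance INTEGER);
INSERT INTO cvterm VALUES (1, 'is_a'), (2, 'negatively_regulates'),
                          (10, 'root'), (11, 'mid'), (12, 'leaf');
INSERT INTO cvtermpath VALUES (1, 11, 10, 1), (1, 12, 11, 1), (1, 12, 10, 2),
                              (2, 12, 11, 1), (2, 12, 10, 2);
""")

def summarize_paths(conn):
    """Per relationship type: number of paths, smallest and largest pathdistance."""
    sql = """
        SELECT t.name, COUNT(*), MIN(p.pathdistance), MAX(p.pathdistance)
        FROM cvtermpath p
        JOIN cvterm t ON t.cvterm_id = p.type_id
        GROUP BY t.name
        ORDER BY t.name
    """
    return conn.execute(sql).fetchall()

for name, n_paths, min_dist, max_dist in summarize_paths(conn):
    print(name, n_paths, min_dist, max_dist)
```

Any relationship type that shows long paths here can then be inspected row by row to see whether the inferred links are ones the relation's properties actually justify.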

-John


On Nov 3, 2010, at 1:36 PM, Lukas Mueller wrote:

> Hi John,
>
> I am under the impression that the loading script correctly loads  
> the different paths, irrespective of what the relationship type is,  
> but Naama has a deeper understanding of the script (possibly it only  
> works for a subset of type_ids, but in practice all we care about.)
>
> There are different possible use cases for this table. Some people  
> would like to use it for ontological reasoning, while others just  
> want to expand terms to their subtrees for querying entities  
> associated with the term and its entire subtree without recursing  
> through the entire subtree. Since very few people seem to be even  
> using the cvtermpath table, I am wondering, for this latter use  
> case, what approach people have taken to write that query efficiently.
>
> I also think that there should be a canonical way of populating the  
> table. If there are a number of different ways, we'll create a lot  
> of incompatibilities between the databases, which is somewhat  
> counter to the idea of having a common database schema.
>
> cheers
> Lukas
>
> On Nov 3, 2010, at 12:29 PM, John Matese wrote:
>
>> Hi All,
>>
>> Just wanted to chime in:
>>
>> gmod_make_cvtermpath.pl logic may have been functional at one time,
>> perhaps if only "is_a" relationship was considered.  I seem to
>> remember the relationship term name actually being hard-coded, which
>> eventually made the script nonfunctional.  The re-factored version  
>> may
>> have fixed that particular issue, but the current behavior of
>> selecting all types from the relationship ontology and handling them
>> equivalently, without exploring/testing their associated properties
>> within cvtermprop and their position within the relationship chain,  
>> is
>> doomed to run afoul of the rules that Chris has laid out, below.   I
>> am uncertain how easy it would be to code the "is_class_level,
>> transitive_over, holds_over_chain, and union_of" functions that Chris
>> has noted, but as he suggested, perhaps it shouldn't be attempted,
>> PROVIDED a suitable alternative resource existed?
>>
>> So, how about we lobby the OBO Foundry to create and host the
>> transitive closure files from popular ontologies (or all of them),
>> perhaps as a link from the ontology "detail" page?
>>
>> Cheers,
>> John
>>
>>
>> On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:
>>
>>>
>>> ** Specification:
>>>
>>> The GO documentation is fine so long as you are just loading GO, but
>>> in fact there is no need to hardcode specific rules for individual
>>> relations; everything required should be in the source obo file.
>>>
>>> For a correct specification of which relationships should be
>>> inferred, you should look at the mapping of obo to owl, and the owl
>>> formal specification. But this may be too much for most people.
>>> Because most ontologies people will be using correspond to a simple
>>> subset of OWL, a simple rule approximation can be used. See
>>>
>>> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
>>>
>>> Facts:
>>>
>>>     • is_transitive(is_a)
>>>     • is_reflexive(is_a)
>>> these facts are built-in and cannot be modified
>>>
>>> Rules:
>>>
>>>     • is_transitive(R), X R Y, Y R Z -> X R Z
>>>     • is_reflexive(R) -> X R X
>>>     • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
>>>     • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
>>>     • X R Y, Y S Z, transitive_over(R,S) -> X R Z
>>>     • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
>>>     • Y union_of (X, ...) -> X is_a Y
>>>
>>> Many ontologies will only need the first 4 rules.
>>>
>>> Note this doesn't specify the semantics of pathdistance - this is
>>> essentially the number of rule applications required.
>>>
>>> ** Implementations:
>>>
>>> I wouldn't recommend re-implementing any of the above, there's a few
>>> options available. In the long term, the correct implementation will
>>> be to use an OWL reasoner to pre-compute all paths, and then load
>>> this into the database. This is what GMOD should be converging upon.
>>> However, at this time, not all OWL reasoners are scalable over all
>>> ontologies of interest to GMOD, so the best thing to do would be to
>>> use a graph-based approximation following the rules above. There's a
>>> few choices here.
>>>
>>> At one time I did implement a plpgsql version, but this had problems
>>> with transactions. Ken's implementation may be more efficient, or
>>> plpgsql may be faster now. I don't see Ken's code so I can't vouch
>>> for its correctness. Additionally, this is only good for pg users.
>>>
>>> go-moose / GOBO has an in-memory reasoner that can be used to dump
>>> the pre-computed closure of all relations. It implements the rules
>>> above
>>>
>>>     http://wiki.geneontology.org/index.php/GO_Moose
>>>
>>> Someone could write a GOBO/DBIC bridge but it may be simpler to dump
>>> the closure and have a separate script to load it. This makes it
>>> easier to integrate other reasoners, as I can't guarantee go-moose
>>> will be supported indefinitely (I don't use perl for ontologies any
>>> more).
>>>
>>> If you install OBOEdit it comes with a command line executable
>>> called obo2linkfile, which dumps the relational closure. This could
>>> be loaded into cvtermpath easily.
>>>
>>> This will be replaced by a java package called OWLGraphWrapper. This
>>> provides a fast implementation of the above rules, and will also
>>> allow the use of an owl reasoner when it's appropriate. This would
>>> make the most sensible target for long term integration. We're
>>> currently making a bridge between this and the go database hibernate
>>> layer. Here's a link to the source, but the package names may  
>>> change:
>>>
>>>     http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log
>>>
>>> Apologies for the heterogeneity of implementations, the blame for
>>> this lies with me.
>>>
>>> I think the most pragmatic solution would be to write a cvtermpath
>>> loader that takes a 3 or 4 column file (distance optional) and loads
>>> the table. The file could be populated by a method above such as
>>> obo2linkfile, and then switched to the new code when it's more  
>>> mature.
>>>
>>>
>>> On Nov 2, 2010, at 8:16 AM, Naama Menda wrote:
>>>
>>>> thanks for sharing your code!
>>>> This is a bit different from how we load the cvtermpath
>>>> ( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log
>>>> )
>>>>
>>>> this loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
>>>> (I think Allen Day wrote)
>>>>
>>>> I'm wondering if there are other databases using cvtermpath
>>>> (Flybase? )  who may comment more about this.
>>>>
>>>> -Naama
>>>>
>>>>
>>>>
>>>> On Tue, Nov 2, 2010 at 10:48 AM, Fan Kang <[hidden email]
>>>>> wrote:
>>>> Hi Naama,
>>>>
>>>> As John mentioned earlier, I wrote a plpgsql function for loading
>>>> child/parent relationships into cvtermpath table in our local
>>>> database using Flybase. See my code attached.  Please note that my
>>>> code works for populating GO terms only.  I used the following GO
>>>> documentation to build the inferred relationships, and populated
>>>> positive pathdistances.  My note/comments for the logic are
>>>> embedded in the code.
>>>>
>>>> Like what you experienced, I couldn't figure out from the Chado
>>>> documentation/code how the path should be exactly populated,
>>>> either. Since we only have GO terms in cvterm table, and the logic
>>>> described in GO documentation makes sense to me,  I wrote the code
>>>> for our flybase-like database.  If you figure out a generic
>>>> approach for populating inferred relationships/path, could you
>>>> share your knowledge/code with us?
>>>>
>>>> FYI.
>>>> GO documentation:
>>>> http://www.geneontology.org/GO.ontology.relations.shtml
>>>>
>>>> --loading GO term relationships into cvtermpath (after the function
>>>> "fill_cvtermpath" is created):
>>>> select * from fill_cvtermpath ('cellular_component');
>>>> select * from fill_cvtermpath ('molecular_function');
>>>> select * from fill_cvtermpath ('biological_process');
>>>>
>>>> Regards,
>>>> Fan
>>>> http://genomics.princeton.edu/~fkang/
>>>>
>>>>
>>>>
>>>> On Nov 1, 2010, at 9:46 PM, Naama Menda wrote:
>>>>
>>>>> hi John,
>>>>>
>>>>> I managed to get recursive children and recursive parents , but
>>>>> it's difficult to test if it works properly 100%.
>>>>> It seems like using the positive pathdistance you can get all
>>>>> child nodes or parent nodes (depending if your term is the subject
>>>>> or the object respectively) , and the negative distances can give
>>>>> you only the direct children (subjects) and direct parents  
>>>>> (objects)
>>>>>
>>>>> However, I could not figure out from the chado page about cvterm
>>>>> transitive closure how exactly the path should be populated.
>>>>>
>>>>> thanks!
>>>>> -Naama
>>>>>
>>>>>
>>>>> On Mon, Nov 1, 2010 at 5:04 PM, John Matese <[hidden email]
>>>>>> wrote:
>>>>> Dear Naama,
>>>>>
>>>>> Just replying off-list as a preliminary message, as Fan Kang
>>>>> (Cc'd) might reply to the list-proper (or perhaps you can re-
>>>>> direct our responses there, once collated.  As you well know, the
>>>>> cvtermpath code has always been problematic.  I've looked at it
>>>>> several times myself (as the versions changed), and I was never
>>>>> convinced it ever functioned as it was supposed to (please don't
>>>>> take offense).  No GMOD versions ever properly took into account
>>>>> whether the relationships were truly reflexive, transitive, etc.
>>>>> In a local project that utilizes a Flybase mirror (so not GMOD/
>>>>> chado, exactly), Fan invested a good amount of time writing up a
>>>>> postgres procedure that populates cvtermpath, independent from the
>>>>> perl script.  We believe it functions as intended, populating the
>>>>> reflexive/transitive closure, although we may have not populated
>>>>> both positive and negative paths (subject<=>object).  That is to
>>>>> say, the data make sense to us, moreso than the GMOD perl logic.
>>>>> We can send you the procedural logic, if you wish.  I think that
>>>>> make_cvtermpath script may need to be re-written from scratch,
>>>>> instead of "re-factored", as I think the original logic was broken
>>>>> and I seem to remember that it contained code that seemed to be
>>>>> useless cruft anyway (data structures constructed and then not
>>>>> used, from the very beginning versions).  I don't trust it, and
>>>>> you are justified to be skeptical about its functionality...  ;)
>>>>>
>>>>> Cheers,
>>>>> John Matese
>>>>>
>>>>>
>>>>> On Nov 1, 2010, at 4:33 PM, Naama Menda wrote:
>>>>>
>>>>>> is there anyone who's using the cvtermpath table?
>>>>>>
>>>>>> We've refactored a while ago the make_cvtermpath code, but I'm
>>>>>> not sure it's populating the cvtermpath the way it should .
>>>>>> Does anyone have examples or properly querying the cvtermpath to
>>>>>> find all recursive children and recursive parents?
>>>>>>
>>>>>> thanks!
>>>>>> -Naama

Re: filling cvtermpath

Siddhartha Basu
Hi,
I think loading from a transitive closure dump, and particularly John's
idea of making the dumps available from the OBO Foundry, seems to be an
excellent idea.

-siddhartha


On Wed, 03 Nov 2010, John Matese wrote:

> Hi Lukas and Naama,
>
> Yes, I think the gmod_make_cvtermpath.pl script does indeed load "the  
> different paths, irrespective of what the relationship type is".  The  
> question is, should it?  Check your cvtermpath table for relationships  
> like negatively_regulates or positively_regulates and see how their  
> paths are being represented.  gmod_make_cvtermpath.pl logic may  
> account for rules 2-4 that Chris laid out, but almost certainly not  
> the last 3 rules as it does no checks of a relationship's properties.  
> As Chris said, it may work for some ontologies, but might not for  
> others depending on the definition of the relationship term.  I am no  
> expert on this matter, though.
>
> -John
>
>
> On Nov 3, 2010, at 1:36 PM, Lukas Mueller wrote:
>
> > Hi John,
> >
> > > I am under the impression that the loading script correctly loads
> > > the different paths, irrespective of what the relationship type is,
> > > but Naama has a deeper understanding of the script (possibly it only
> > > works for a subset of type_ids, but in practice those are the only
> > > ones we care about).
> >
> > There are different possible use cases for this table. Some people  
> > would like to use it for ontological reasoning, while others just  
> > want to expand terms to their subtrees for querying entities  
> > associated with the term and its entire subtree without recursing  
> > through the entire subtree. Since very few people seem to be even  
> > using the cvtermpath table, I am wondering, for this latter use  
> > case, what approach people have taken to write that query efficiently.
> >
> > I also think that there should be a canonical way of populating the  
> > table. If there are a number of different ways, we'll create a lot  
> > of incompatibilities between the databases, which is somewhat  
> > counter to the idea of having a common database schema.
> >
> > cheers
> > Lukas
> >
> > On Nov 3, 2010, at 12:29 PM, John Matese wrote:
> >
> >> Hi All,
> >>
> >> Just wanted to chime in:
> >>
> > >> gmod_make_cvtermpath.pl logic may have been functional at one time,
> > >> perhaps if only "is_a" relationship was considered.  I seem to
> > >> remember the relationship term name actually being hard-coded, which
> > >> eventually made the script nonfunctional.  The re-factored version
> > >> may have fixed that particular issue, but the current behavior of
> > >> selecting all types from the relationship ontology and handling them
> > >> equivalently, without exploring/testing their associated properties
> > >> within cvtermprop and their position within the relationship chain,
> > >> is doomed to run afoul of the rules that Chris has laid out, below.  I
> > >> am uncertain how easy it would be to code the "is_class_level,
> > >> transitive_over, holds_over_chain, and union_of" functions that Chris
> > >> has noted, but as he suggested, perhaps it shouldn't be attempted,
> > >> PROVIDED a suitable alternative resource existed?
> >>
> >> So, how about we lobby the OBO Foundry to create and host the
> >> transitive closure files from popular ontologies (or all of them),
> >> perhaps as a link from the ontology "detail" page?
> >>
> >> Cheers,
> >> John
> >>
> >>
> >> On Nov 2, 2010, at 4:00 PM, Chris Mungall wrote:
> >>
> >>>
> >>> ** Specification:
> >>>
> >>> The GO documentation is fine so long as you are just loading GO, but
> >>> in fact there is no need to hardcode specific rules for individual
> >>> relations; everything required should be in the source obo file.
> >>>
> >>> For a correct specification of which relationships should be
> >>> inferred, you should look at the mapping of obo to owl, and the owl
> >>> formal specification. But this may be too much for most people.
> >>> Because most ontologies people will be using correspond to a simple
> >>> subset of OWL, a simple rule approximation can be used. See
> >>>
> >>> http://wiki.geneontology.org/index.php/Reasoning_with_OBO_Format#Graph_Closure
> >>>
> >>> Facts:
> >>>
> >>>     • is_transitive(is_a)
> >>>     • is_reflexive(is_a)
> >>> these facts are built-in and cannot be modified
> >>>
> >>> Rules:
> >>>
> >>>     • is_transitive(R), X R Y, Y R Z -> X R Z
> >>>     • is_reflexive(R) -> X R X
> >>>     • X is_a Y, Y R Z, not(is_class_level(R)) -> X R Z
> >>>     • X R Y, Y is_a Z, not(is_class_level(R)) -> X R Z
> >>>     • X R Y, Y S Z, transitive_over(R,S) -> X R Z
> >>>     • X R1 Y, Y R2 Z, holds_over_chain(R,R1,R2) -> X R Z
> >>>     • Y union_of (X, ...) -> X is_a Y
> >>>
> >>> Many ontologies will only need the first 4 rules.
> >>>
> >>> Note this doesn't specify the semantics of pathdistance - this is
> >>> essentially the number of rule applications required.
> >>>
> >>> ** Implementations:
> >>>
> > >>> I wouldn't recommend re-implementing any of the above; there are a
> > >>> few options available. In the long term, the correct implementation will
> > >>> be to use an OWL reasoner to pre-compute all paths, and then load
> > >>> this into the database. This is what GMOD should be converging upon.
> > >>> However, at this time, not all OWL reasoners are scalable over all
> > >>> ontologies of interest to GMOD, so the best thing to do would be to
> > >>> use a graph-based approximation following the rules above. There are a
> > >>> few choices here.
> >>>
> >>> At one time I did implement a plpgsql version, but this had problems
> >>> with transactions. Ken's implementation may be more efficient, or
> >>> plpgsql may be faster now. I don't see Ken's code so I can't vouch
> >>> for its correctness. Additionally, this is only good for pg users.
> >>>
> > >>> go-moose / GOBO has an in-memory reasoner that can be used to dump
> > >>> the pre-computed closure of all relations. It implements the rules
> > >>> above:
> >>>
> >>>     http://wiki.geneontology.org/index.php/GO_Moose
> >>>
> >>> Someone could write a GOBO/DBIC bridge but it may be simpler to dump
> >>> the closure and have a separate script to load it. This makes it
> >>> easier to integrate other reasoners, as I can't guarantee go-moose
> >>> will be supported indefinitely (I don't use perl for ontologies any
> >>> more).
> >>>
> >>> If you install OBOEdit it comes with a command line executable
> >>> called obo2linkfile, which dumps the relational closure. This could
> >>> be loaded into cvtermpath easily.
> >>>
> > >>> This will be replaced by a Java package called OWLGraphWrapper. This
> > >>> provides a fast implementation of the above rules, and will also
> > >>> allow the use of an OWL reasoner when it's appropriate. This would
> > >>> make the most sensible target for long-term integration. We're
> > >>> currently making a bridge between this and the GO database Hibernate
> > >>> layer. Here's a link to the source, but the package names may
> > >>> change:
> >>>
> >>>     http://geneontology.svn.sourceforge.net/viewvc/geneontology/OWLTools/src/owltools/graph/OWLGraphWrapper.java?view=log
> >>>
> >>> Apologies for the heterogeneity of implementations, the blame for
> >>> this lies with me.
> >>>
> > >>> I think the most pragmatic solution would be to write a cvtermpath
> > >>> loader that takes a 3- or 4-column file (distance optional) and loads
> > >>> the table. The file could be populated by one of the methods above,
> > >>> such as obo2linkfile, and then switched to the new code when it's
> > >>> more mature.
> >>>
> >>>
> >>> On Nov 2, 2010, at 8:16 AM, Naama Menda wrote:
> >>>
> >>>> thanks for sharing your code!
> > >>>> This is a bit different from how we load the cvtermpath
> > >>>> ( http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/cxgn/gmod_make_cvtermpath.pl?view=log )
> > >>>>
> > >>>> This loader is based on the old script http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/make_cvtermpath.pl?view=log
> > >>>> (which I think Allen Day wrote).
> >>>>
> >>>> I'm wondering if there are other databases using cvtermpath
> >>>> (Flybase? )  who may comment more about this.
> >>>>
> >>>> -Naama
> >>>>
> >>>>
> >>>>
> > >>>> On Tue, Nov 2, 2010 at 10:48 AM, Fan Kang <[hidden email]> wrote:
> >>>> Hi Naama,
> >>>>
> > >>>> As John mentioned earlier, I wrote a plpgsql function for loading
> > >>>> child/parent relationships into the cvtermpath table in our local
> > >>>> database using Flybase. See my code attached.  Please note that my
> > >>>> code works for populating GO terms only.  I used the following GO
> > >>>> documentation to build the inferred relationships, and populated
> > >>>> positive pathdistances.  My notes/comments on the logic are
> > >>>> embedded in the code.
> > >>>>
> > >>>> Like you, I couldn't figure out from the Chado
> > >>>> documentation/code exactly how the path should be populated.
> > >>>> Since we only have GO terms in the cvterm table, and the logic
> > >>>> described in the GO documentation makes sense to me, I wrote the code
> > >>>> for our flybase-like database.  If you figure out a generic
> > >>>> approach for populating inferred relationships/paths, could you
> > >>>> share your knowledge/code with us?
> >>>>
> >>>> FYI.
> >>>> GO documentation:
> >>>> http://www.geneontology.org/GO.ontology.relations.shtml
> >>>>
> >>>> --loading GO term relationships into cvtermpath (after the function
> >>>> "fill_cvtermpath" is created):
> >>>> select * from fill_cvtermpath ('cellular_component');
> >>>> select * from fill_cvtermpath ('molecular_function');
> >>>> select * from fill_cvtermpath ('biological_process');
> >>>>
> >>>> Regards,
> >>>> Fan
> >>>> http://genomics.princeton.edu/~fkang/
> >>>>
> >>>>
> >>>>
> >>>> On Nov 1, 2010, at 9:46 PM, Naama Menda wrote:
> >>>>
> >>>>> hi John,
> >>>>>
> > >>>>> I managed to get recursive children and recursive parents, but
> > >>>>> it's difficult to test whether it works 100% properly.
> > >>>>> It seems that using the positive pathdistance you can get all
> > >>>>> child nodes or parent nodes (depending on whether your term is the
> > >>>>> subject or the object, respectively), and the negative distances can
> > >>>>> give you only the direct children (subjects) and direct parents
> > >>>>> (objects).
> >>>>>
> >>>>> However, I could not figure out from the chado page about cvterm
> >>>>> transitive closure how exactly the path should be populated.
> >>>>>
> >>>>> thanks!
> >>>>> -Naama
> >>>>>
> >>>>>
> > >>>>> On Mon, Nov 1, 2010 at 5:04 PM, John Matese <[hidden email]> wrote:
> >>>>> Dear Naama,
> >>>>>
> > >>>>> Just replying off-list as a preliminary message, as Fan Kang
> > >>>>> (Cc'd) might reply to the list proper (or perhaps you can
> > >>>>> redirect our responses there, once collated).  As you well know, the
> >>>>> cvtermpath code has always been problematic.  I've looked at it
> >>>>> several times myself (as the versions changed), and I was never
> >>>>> convinced it ever functioned as it was supposed to (please don't
> >>>>> take offense).  No GMOD versions ever properly took into account
> >>>>> whether the relationships were truly reflexive, transitive, etc.
> >>>>> In a local project that utilizes a Flybase mirror (so not GMOD/
> >>>>> chado, exactly), Fan invested a good amount of time writing up a
> >>>>> postgres procedure that populates cvtermpath, independent from the
> >>>>> perl script.  We believe it functions as intended, populating the
> >>>>> reflexive/transitive closure, although we may have not populated
> >>>>> both positive and negative paths (subject<=>object).  That is to
> >>>>> say, the data make sense to us, moreso than the GMOD perl logic.
> >>>>> We can send you the procedural logic, if you wish.  I think that
> >>>>> make_cvtermpath script may need to be re-written from scratch,
> >>>>> instead of "re-factored", as I think the original logic was broken
> >>>>> and I seem to remember that it contained code that seemed to be
> >>>>> useless cruft anyway (data structures constructed and then not
> >>>>> used, from the very beginning versions).  I don't trust it, and
> >>>>> you are justified to be skeptical about its functionality...  ;)
> >>>>>
> >>>>> Cheers,
> >>>>> John Matese
> >>>>>
> >>>>>
> >>>>> On Nov 1, 2010, at 4:33 PM, Naama Menda wrote:
> >>>>>
> >>>>>> is there anyone who's using the cvtermpath table?
> >>>>>>
> > >>>>>> We refactored the make_cvtermpath code a while ago, but I'm
> > >>>>>> not sure it's populating cvtermpath the way it should.
> > >>>>>> Does anyone have examples of properly querying cvtermpath to
> > >>>>>> find all recursive children and recursive parents?
> >>>>>>
> >>>>>> thanks!
> >>>>>> -Naama
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Naama Menda
> >>>>>> Boyce Thompson Institute for Plant Research
> >>>>>> Tower Rd
> >>>>>> Ithaca NY 14853
> >>>>>> USA
> >>>>>>
> >>>>>> (607) 254 3569
> >>>>>> Sol Genomics Network
> >>>>>> http://solgenomics.net/
> >>>>>> [hidden email]
> >>>>>> ------------------------------------------------------------------------------
> >>>>>> Nokia and AT&T present the 2010 Calling All Innovators-North
> >>>>>> America contest
> >>>>>> Create new apps & games for the Nokia N8 for consumers in  U.S.
> >>>>>> and Canada
> >>>>>> $10 million total in prizes - $4M cash, 500 devices, nearly $6M
> >>>>>> in marketing
> >>>>>> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to
> >>>>>> Ovi Store
> >>>>>> http://p.sf.net/sfu/nokia-dev2dev_______________________________________________
> >>>>>> Gmod-schema mailing list
> >>>>>> [hidden email]
> >>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> ------------------------------------------------------------------------------
> >>>> Nokia and AT&T present the 2010 Calling All Innovators-North
> >>>> America contest
> >>>> Create new apps & games for the Nokia N8 for consumers in  U.S. and
> >>>> Canada
> >>>> $10 million total in prizes - $4M cash, 500 devices, nearly $6M in
> >>>> marketing
> >>>> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi
> >>>> Store
> >>>> http://p.sf.net/sfu/nokia-dev2dev_______________________________________________
> >>>> Gmod-schema mailing list
> >>>> [hidden email]
> >>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >>>
> >>>
> >>
> >>
> >> ------------------------------------------------------------------------------
> >> Achieve Improved Network Security with IP and DNS Reputation.
> >> Defend against bad network traffic, including botnets, malware,
> >> phishing sites, and compromised hosts - saving your company time,
> >> money, and embarrassment.   Learn More!
> >> http://p.sf.net/sfu/hpdev2dev-nov
> >> _______________________________________________
> >> Gmod-schema mailing list
> >> [hidden email]
> >> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >
> >
> > ------------------------------------------------------------------------------
> > Achieve Improved Network Security with IP and DNS Reputation.
> > Defend against bad network traffic, including botnets, malware,
> > phishing sites, and compromised hosts - saving your company time,
> > money, and embarrassment.   Learn More!
> > http://p.sf.net/sfu/hpdev2dev-nov
> > _______________________________________________
> > Gmod-schema mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >
>
>
> ------------------------------------------------------------------------------
> Achieve Improved Network Security with IP and DNS Reputation.
> Defend against bad network traffic, including botnets, malware,
> phishing sites, and compromised hosts - saving your company time,
> money, and embarrassment.   Learn More!
> http://p.sf.net/sfu/hpdev2dev-nov
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
Achieve Improved Network Security with IP and DNS Reputation.
Defend against bad network traffic, including botnets, malware,
phishing sites, and compromised hosts - saving your company time,
money, and embarrassment.   Learn More!
http://p.sf.net/sfu/hpdev2dev-nov
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema