cv_root view very slow for chado


Daniel E. Cook
Hello,

I'm having a bit of trouble with the view function 'cv_root' when I load in the gene ontology. I have loaded the human disease ontology, human phenotype ontology, and a few other ontologies.

Everything is fine until I load the gene ontology (which, incidentally, takes a *very* long time. Does anyone know how to speed this process up?). Once the gene ontology is loaded, a few views become extremely slow. I know the gene ontology dataset is large, but a count of the cv_root view can take a couple of minutes. The result I get, by the way, is 118.

I would like the cv_root view to run faster because I am using it to build a simple ontology browser. Does anyone have any ideas? I have tried vacuuming. Not exactly sure what else to try.

Thanks so much.
-DEC

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: cv_root view very slow for chado

Chris Mungall
Hi Daniel

Is this still the source for the cv_root view (not function)?
CREATE OR REPLACE VIEW cv_root AS
 SELECT
  cv_id,
  cvterm_id AS root_cvterm_id
 FROM cvterm
 WHERE
  cvterm_id NOT IN ( SELECT subject_id FROM cvterm_relationship) AND
  is_obsolete=0;
This should be fast even for large rowcounts unless there is no indexing. Have you tried EXPLAIN?
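
Alongside EXPLAIN, one thing that may be worth trying is rewriting the NOT IN subselect as NOT EXISTS, which PostgreSQL can usually plan as an anti-join. The following is only a minimal sketch, not something from the Chado distribution: it uses Python's sqlite3 module as a stand-in for PostgreSQL, with cut-down, hypothetical versions of cvterm and cvterm_relationship and invented rows, and it only shows that the two formulations return the same roots when subject_id contains no NULLs. It says nothing about timing on a real database.

```python
import sqlite3

# Hypothetical, cut-down stand-ins for the Chado tables (only the columns the
# view touches). The rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id   INTEGER PRIMARY KEY,
    cv_id       INTEGER NOT NULL,
    is_obsolete INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE cvterm_relationship (
    subject_id INTEGER NOT NULL,
    object_id  INTEGER NOT NULL
);
INSERT INTO cvterm VALUES (1, 10, 0), (2, 10, 0), (3, 10, 0), (4, 20, 0), (5, 20, 1);
-- terms 2 and 3 are subjects (children); 1 and 4 have no parent; 5 is obsolete
INSERT INTO cvterm_relationship VALUES (2, 1), (3, 2);
""")

# The formulation used by the cv_root view.
NOT_IN = """
SELECT cv_id, cvterm_id AS root_cvterm_id
FROM cvterm
WHERE cvterm_id NOT IN (SELECT subject_id FROM cvterm_relationship)
  AND is_obsolete = 0
ORDER BY cvterm_id
"""

# Candidate rewrite: same result as long as subject_id is never NULL.
NOT_EXISTS = """
SELECT t.cv_id, t.cvterm_id AS root_cvterm_id
FROM cvterm AS t
WHERE NOT EXISTS (
        SELECT 1 FROM cvterm_relationship AS r
        WHERE r.subject_id = t.cvterm_id
      )
  AND t.is_obsolete = 0
ORDER BY t.cvterm_id
"""

roots_not_in = conn.execute(NOT_IN).fetchall()
roots_not_exists = conn.execute(NOT_EXISTS).fetchall()
print(roots_not_in)
print(roots_not_exists)
```

If the rewrite helps on a real instance, the view definition itself would have to be changed, so it is worth comparing EXPLAIN output for both forms before touching the schema.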

118 seems high but I imagine that these are all helper terms.

The load process relies on an XSLT step which doesn't scale well with larger ontologies. Someone could try looking into a faster processor than xsltproc, or rewriting the XSLT, but it would be better to do a ground-up rewrite in either

 1. perl - Bio::Chado::Schema
 2. java - OWLTools plus the GBOL layer

Populating graph_path should definitely be done via owltools and not home-grown code. See:
  http://code.google.com/p/owltools/wiki/CommandLineExamples

But backing up, I don't know whether I would write an ontology browser directly on top of the cv module, or whether I'd even write a new ontology browser at all.

The cv module only supports a subset of the obo format spec, never mind OWL. This is fine for the kind of operations you would want to do within chado (e.g. join feature to cvterm to get the SO label; do queries involving transitive closure). But you're likely to start hitting limitations once you start building a browser. Even with a basic browser you might hit some simple limitations, e.g. deciding what constitutes a root in a multi-ontology instance. The set of ontologies you say you have loaded suggests that you may want more than a basic browser.

Why not use AmiGO 2?

Examples:

http://amigo2.geneontology.org/

Source:

https://github.com/kltm/amigo

It's driven by a SOLR index rather than a relational backend. You could write the browser on top of this API or just use or skin the browser that's there. E.g.

http://amigo2.geneontology.org/cgi-bin/amigo2/amigo/term/GO:0022008

It's fairly generic; in fact we were going to set up a disease/phenotype instance at some point, and could help you through this process.

Cheers
Chris


Re: cv_root view very slow for chado

Daniel E. Cook
Chris,

Thanks for the quick reply. I should admit that I am a bit new to chado. Broadly, my intention is to use the chado schema to pull data and draw network graphs linking genes, phenotypes, and diseases. However, I wanted to use django as my web framework and use a js visualization library on top of this, with chado as a backend.

To answer your question - I am using the latest version of chado (1.23) - and that is the SQL of the cv_root view. Before loading the gene ontology, that view *was* very fast. It's only once that is loaded that things slow down tremendously. It takes ~ 5 minutes to do a row count.

The apparently high number of root terms may be due to the fact that I am loading a human phenotype ontology, gene ontology, and a disease ontology.

I actually already did write an ontology browser, if you can call it that. I did it primarily as an exercise to familiarize myself with chado and chado+django together. Here is a screenshot to give you an idea of what I did: http://cl.ly/image/3P2C432D1h3o

I think - for now - the AmiGO 2 browser may be more complex than what I need (starting out), although it certainly does look useful.

Any advice/tips/pointers are greatly appreciated!

Thank you!
Dan 

-DEC



Re: cv_root view very slow for chado

Scott Cain
Hi Dan,

I'm also curious what the EXPLAIN output is.  I suspect the problem involves the subselect, since subselects can be slow.  If worst comes to worst, you could materialize the view--presumably your cv and cvterm tables don't change too often once everything is set up.
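
Materializing the view could look roughly like the sketch below. It is an illustration under stated assumptions rather than a tested Chado recipe: it uses Python's sqlite3 module as a stand-in for PostgreSQL, the tables are cut-down hypothetical versions with invented rows, and the names cv_root_snapshot and refresh_cv_root_snapshot are made up for this example. The idea is simply to copy the slow view into an ordinary table and rebuild that table after each ontology load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, cut-down stand-ins for the Chado tables and the cv_root view.
CREATE TABLE cvterm (
    cvterm_id   INTEGER PRIMARY KEY,
    cv_id       INTEGER NOT NULL,
    is_obsolete INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE cvterm_relationship (
    subject_id INTEGER NOT NULL,
    object_id  INTEGER NOT NULL
);
CREATE VIEW cv_root AS
  SELECT cv_id, cvterm_id AS root_cvterm_id
  FROM cvterm
  WHERE cvterm_id NOT IN (SELECT subject_id FROM cvterm_relationship)
    AND is_obsolete = 0;
INSERT INTO cvterm VALUES (1, 10, 0), (2, 10, 0), (3, 10, 0), (4, 20, 0);
INSERT INTO cvterm_relationship VALUES (2, 1), (3, 2);
""")


def refresh_cv_root_snapshot(db):
    """Rebuild the snapshot table from the cv_root view (hypothetical helper)."""
    db.execute("DROP TABLE IF EXISTS cv_root_snapshot")
    db.execute(
        "CREATE TABLE cv_root_snapshot AS "
        "SELECT cv_id, root_cvterm_id FROM cv_root"
    )
    db.execute("CREATE INDEX cv_root_snapshot_cv_idx ON cv_root_snapshot (cv_id)")
    db.commit()


def snapshot_roots(db):
    """Read root term ids from the snapshot instead of the slow view."""
    rows = db.execute(
        "SELECT root_cvterm_id FROM cv_root_snapshot ORDER BY root_cvterm_id"
    )
    return [row[0] for row in rows]


refresh_cv_root_snapshot(conn)
before = snapshot_roots(conn)

# Loading more terms does not change the snapshot until it is refreshed.
conn.execute("INSERT INTO cvterm VALUES (6, 30, 0)")
stale = snapshot_roots(conn)

refresh_cv_root_snapshot(conn)
after = snapshot_roots(conn)
print(before, stale, after)
```

The trade-off is staleness: reads are cheap, but the snapshot only reflects the ontologies as of the last refresh. PostgreSQL 9.3 and later also offer CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW, which cover the same pattern natively.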

Scott



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: cv_root view very slow for chado

Daniel E. Cook
Scott - I was definitely thinking of materializing the view if needed. I will follow up tomorrow morning with the output of EXPLAIN, once I have played around with things a bit more.

Thank you!
Dan

-DEC



Re: cv_root view very slow for chado

Daniel E. Cook
I have run the following command:

EXPLAIN ANALYZE SELECT
 cv_id,
  cvterm_id AS root_cvterm_id
 FROM cvterm
 WHERE
  cvterm_id NOT IN ( SELECT subject_id FROM cvterm_relationship)    AND
  is_obsolete=0;

And the following output was generated:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cvterm_c1 on cvterm  (cost=0.00..52270187.79 rows=23081 width=8) (actual time=37588.650..295304.530 rows=118 loops=1)
   Index Cond: (is_obsolete = 0)
   Filter: (NOT (SubPlan 1))
   Rows Removed by Filter: 46115
   SubPlan 1
     ->  Materialize  (cost=0.00..2053.23 rows=84415 width=4) (actual time=0.003..3.264 rows=33269 loops=46233)
           ->  Seq Scan on cvterm_relationship  (cost=0.00..1301.15 rows=84415 width=4) (actual time=0.014..15.809 rows=84415 loops=1)
 Total runtime: 295304.780 ms
(8 rows)


It actually returns 118 rows. Any idea what is going on?
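
One way to read this plan: the Materialize node under SubPlan 1 reports loops=46233, meaning the materialized list of subject_id values was rescanned once for every cvterm row that passed the index condition, visiting about 33,269 rows per rescan on average. The back-of-the-envelope arithmetic below uses only the figures printed in the plan above; it is a rough estimate in plain Python, not a profile of the query.

```python
# Figures copied from the EXPLAIN ANALYZE output above.
rows_returned = 118           # rows= on the outer index scan
rows_removed = 46115          # "Rows Removed by Filter"
subplan_loops = 46233         # loops= on the Materialize node
avg_rows_per_loop = 33269     # rows= on the Materialize node (per-loop average)
avg_ms_per_loop = 3.264       # upper "actual time" on the Materialize node
total_runtime_ms = 295304.780

# Every outer row that passed the index condition triggered one subplan scan.
outer_rows = rows_returned + rows_removed

# Approximate number of subject_id values visited across all rescans.
rows_scanned = subplan_loops * avg_rows_per_loop

# Approximate time spent just reading the materialized list.
materialize_ms = subplan_loops * avg_ms_per_loop

print(f"outer rows checked:         {outer_rows}")
print(f"subject_id values visited:  {rows_scanned:,}")
print(f"time in Materialize (est.): {materialize_ms / 1000:.0f} s")
print(f"share of total runtime:     {materialize_ms / total_runtime_ms:.0%}")
```

So the filter is effectively a nested loop: tens of thousands of candidate terms, each compared against a list of tens of thousands of subject_id values, for well over a billion row visits in total.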

Thank you.
-DEC



Re: cv_root view very slow for chado

Scott Cain
Hi Daniel,

My suggestion is to raise the work_mem parameter in the postgresql.conf file.  The default is 1MB, and that appears to be too small for this query. Be careful moving it up: that is the amount of memory each connection may use per in-memory sort or hash operation, so if the database is heavily used, total memory use can grow pretty quickly.  I did some experimenting on my small server (with 41,000 rows in cvterm).

  explain analyze select * from cv_root;

didn't return after 15 minutes with the default work_mem, so I killed it. When I bumped work_mem up to 10MB, I got this:

explain analyze select * from cv_root;
                                                            QUERY PLAN                                                           
----------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on cvterm  (cost=1375.86..6142.11 rows=19767 width=8) (actual time=355.471..447.961 rows=157 loops=1)
   Filter: ((NOT (hashed SubPlan 1)) AND (is_obsolete = 0))
   SubPlan 1
     ->  Seq Scan on cvterm_relationship  (cost=0.00..1183.89 rows=76789 width=4) (actual time=0.012..170.194 rows=76789 loops=1)
 Total runtime: 450.580 ms
(5 rows)

My guess is that the planner wants to hold the subquery results in memory (the "hashed SubPlan" in the plan above) but can't when work_mem is too small, so it ends up rescanning the subquery results for each cvterm row.
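
A minimal sketch of trying this without editing postgresql.conf, assuming you are in a psql session on the chado database: SET changes work_mem for the current session only, and RESET restores the server default. The 10MB value is just the figure from the experiment above, not a tuned recommendation.

```sql
-- Raise work_mem for the current session only; postgresql.conf is untouched.
SET work_mem = '10MB';

-- Re-check the plan; look for "hashed SubPlan" in the Filter line.
EXPLAIN ANALYZE SELECT * FROM cv_root;

-- Return to the server default afterwards.
RESET work_mem;
```

If the plan still shows a plain SubPlan with a Materialize node underneath, the value is probably still too small for the size of cvterm_relationship.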

I also tried a more selective query with work_mem at the default to see if I could get it to complete:

explain analyze select * from cv_root where cv_id=14;
                                                               QUERY PLAN                                                              
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cvterm_c1 on cvterm  (cost=0.00..2178220.98 rows=1052 width=8) (actual time=743.435..26941.099 rows=46 loops=1)
   Index Cond: ((cv_id = 14) AND (is_obsolete = 0))
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..1867.83 rows=76789 width=4) (actual time=0.029..6.481 rows=3107 loops=2002)
           ->  Seq Scan on cvterm_relationship  (cost=0.00..1183.89 rows=76789 width=4) (actual time=0.026..171.562 rows=76789 loops=1)
 Total runtime: 26942.079 ms
(7 rows)

It is still slow (about 27 seconds), but not as bad, since it has the initial filtering condition on cv_id that it can use up front.  That was for the Sequence Ontology; when I tried molecular function from GO, I killed it after 20 minutes.
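
If raising work_mem is not an option, another thing that may be worth testing is rewriting the NOT IN as NOT EXISTS, which PostgreSQL can plan as an anti-join rather than a per-row subplan. This is an untested sketch, not the view shipped with chado. It assumes cvterm_relationship.subject_id never contains NULL, because NOT IN and NOT EXISTS give different results when NULLs are present.

```sql
-- Same rows as cv_root, assuming cvterm_relationship.subject_id is never NULL.
SELECT c.cv_id,
       c.cvterm_id AS root_cvterm_id
FROM cvterm c
WHERE c.is_obsolete = 0
  AND NOT EXISTS (
        SELECT 1
        FROM cvterm_relationship r
        WHERE r.subject_id = c.cvterm_id
      );
```

Running it under EXPLAIN ANALYZE with the default work_mem would show whether the planner actually picks an anti-join on a given installation.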

Scott



On Wed, Jul 17, 2013 at 7:38 AM, Daniel E. Cook <[hidden email]> wrote:
I have run the following command:

EXPLAIN ANALYZE SELECT
 cv_id,
  cvterm_id AS root_cvterm_id
 FROM cvterm
 WHERE
  cvterm_id NOT IN ( SELECT subject_id FROM cvterm_relationship)    AND
  is_obsolete=0;

And the following output was generated:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using cvterm_c1 on cvterm  (cost=0.00..52270187.79 rows=23081 width=8) (actual time=37588.650..295304.530 rows=118 loops=1)
   Index Cond: (is_obsolete = 0)
   Filter: (NOT (SubPlan 1))
   Rows Removed by Filter: 46115
   SubPlan 1
     ->  Materialize  (cost=0.00..2053.23 rows=84415 width=4) (actual time=0.003..3.264 rows=33269 loops=46233)
           ->  Seq Scan on cvterm_relationship  (cost=0.00..1301.15 rows=84415 width=4) (actual time=0.014..15.809 rows=84415 loops=1)
 Total runtime: 295304.780 ms
(8 rows)


It actually returns 118 rows. Any idea what is going on?

Thank you.
-DEC





--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: cv_root view very slow for chado

Daniel E. Cook
Wow!

I bumped it up to 20MB just to start (I will drop it back down later).
That fixed everything!

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on cvterm  (cost=1512.19..3818.84 rows=23183 width=8) (actual time=35.106..55.098 rows=118 loops=1)
   Filter: ((NOT (hashed SubPlan 1)) AND (is_obsolete = 0))
   Rows Removed by Filter: 50467
   SubPlan 1
     ->  Seq Scan on cvterm_relationship  (cost=0.00..1301.15 rows=84415 width=4) (actual time=0.018..13.051 rows=84415 loops=1)
 Total runtime: 55.200 ms
(6 rows)
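
On dropping it back down later: one option, sketched here under assumptions, is to leave the server-wide default alone and raise work_mem only for the database role the browser connects as. The role name chado_browser is hypothetical, and the setting only takes effect for new sessions of that role.

```sql
-- chado_browser is a hypothetical role name; substitute the role your app uses.
ALTER ROLE chado_browser SET work_mem = '20MB';

-- To undo the per-role override later:
ALTER ROLE chado_browser RESET work_mem;
```

This keeps the larger allowance away from other clients of the same server, which matters because work_mem applies per sort or hash operation rather than per server.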

Very helpful Scott. Thank you so much!

Dan

-DEC



Re: cv_root view very slow for chado

Chris Mungall
In reply to this post by Daniel E. Cook
My recommendation would be to keep things loosely coupled. Write your app such that you can swap in and out ontology providers, gene annotation providers, interaction providers, etc.

We're working on something similar and should have some stable services you can use for this. Ping me again in a couple of weeks.

Thanks for sorting out the query issue, Scott.

Cheers



On Tue, Jul 16, 2013 at 4:13 PM, Daniel E. Cook <[hidden email]> wrote:
Chris,

Thanks for the quick reply.  I should admit that I am a bit new to chado. Broadly, my intention is to use the chado schema to pull data and draw network graphs linking genes, phenotypes, and diseases. However, I wanted to use django as my web framework and use a js visualization library on top of this, with chado as a backend.

To answer your question - I am using the latest version of chado (1.23) - and that is the SQL of the cv_root view. Before loading the gene ontology, that view *was* very fast. It's only once that is loaded that things slow down tremendously. It takes ~ 5 minutes to do a row count.

The apparently high number of root terms may be due to the fact that I am loading a human phenotype ontology, gene ontology, and a disease ontology.

I actually already did write an ontology browser, if you can call it that. I did it primarily as an exercise to familiarize myself with chado and chado+django together. Here is a screenshot to give you an idea of what I did: http://cl.ly/image/3P2C432D1h3o

I think - for now - the AmiGO 2 browser may be more complex than what I need (starting out), although it certainly does look useful.

Any advice/tips/pointers are greatly appreciated!

Thank you!
Dan 

-DEC



Re: cv_root view very slow for chado

Daniel E. Cook
Thanks again. One other question - I am interested in loading a list
of all human genes into chado (preferably with GO annotations). I
obtained a GFF3 file from Ensembl, but from the looks of it the
file does not have GO annotations. Is there somewhere I can obtain a
GFF3 file with GO annotations? What would be the best way to load a
set of human genome annotations?

Thanks,
Dan
-DEC


On Wed, Jul 17, 2013 at 9:42 PM, Chris Mungall <[hidden email]> wrote:

> My recommendation would be to keep things loosely coupled. Write your app
> such that you can swap in and out ontology providers, gene annotation
> providers, interaction providers, etc.
>
> We're working on something similar and should have some stable services you
> can use for this - ping me again in a  couple of weeks
>
> Thanks for sorting out the query issue Scott.
>
> Cheers
>
>
>
> On Tue, Jul 16, 2013 at 4:13 PM, Daniel E. Cook
> <[hidden email]> wrote:
>>
>> Chris,
>>
>> Thanks for the quick reply.  I should admit that I am a bit new to
>> chado. Broadly, my intention is to use the chado schema to pull data and
>> draw network graphs linking genes, phenotypes, and diseases. However, I
>> wanted to use django as my web framework and use a js visualization library
>> on top of this, with chado as a backend.
>>
>> To answer your question - I am using the latest version of chado (1.23) -
>> and that is the SQL of the cv_root view. Before loading the gene ontology,
>> that view *was* very fast. It's only once that is loaded that things
>> slow down tremendously. It takes ~ 5 minutes to do a row count.
>>
>> The apparently high number of root terms may be due to the fact that I am
>> loading a human phenotype ontology, gene ontology, and a disease ontology.
>>
>> I actually already did write an ontology browser, if you can call it that.
>> I did it primarily as an exercise to familiarize myself with chado and
>> chado+django together. Here is a screenshot to give you an idea of what I
>> did: http://cl.ly/image/3P2C432D1h3o
>>
>> I think - for now - the AmiGO 2 browser may be more complex than what I need
>> (starting out), although it certainly does look useful.
>>
>> Any advice/tips/pointers are greatly appreciated!
>>
>> Thank you!
>> Dan
>>
>> -DEC
>>
>>
>> On Tue, Jul 16, 2013 at 5:25 PM, Chris Mungall <[hidden email]> wrote:
>>>
>>> Hi Daniel
>>>
>>> Is this still the source for the cv_root view (not function)?
>>>
>>> +CREATE OR REPLACE VIEW cv_root AS
>>> + SELECT
>>> +  cv_id,
>>> +  cvterm_id AS root_cvterm_id
>>> + FROM cvterm
>>> + WHERE
>>> +  cvterm_id NOT IN ( SELECT subject_id FROM cvterm_relationship)    AND
>>> +  is_obsolete=0;
>>>
>>> This should be fast even for large rowcounts unless there is no indexing.
>>> Have you tried EXPLAIN?
>>>
>>> 118 seems high but I imagine that these are all helper terms.
>>>
>>> The load process relies on an XSLT step which doesn't scale well with
>>> larger ontologies. Someone could try looking into a faster processor than
>>> xsltproc, or rewriting the xslt, but it would be better to do a ground-up
>>> rewrite in either
>>>
>>>  1. perl - Bio::Chado::Schema
>>>  2. java - OWLTools plus the GBOL layer
>>>
>>> Populating graph_path should definitely be done via owltools and not
>>> home-grown code. See:
>>>   http://code.google.com/p/owltools/wiki/CommandLineExamples
>>>
>>> But backing up I don't know if I would write an ontology browser directly
>>> on top of the cv module or if I'd even write a new ontology browser.
>>>
>>> The cv module only supports a subset of the obo format spec, never mind
>>> OWL. This is fine for the kind of operations you would want to do within
>>> chado (e.g. join feature to cvterm to get the SO label; do queries involving
>>> transitive closure). But you're likely to start hitting limitations once you
>>> start building a browser. Even if it's a basic browser you might hit some
>>> naive limitations e.g. what constitutes a root in a multi-ontology instance.
>>> The fact you have the ontologies you say you have loaded means that maybe
>>> you want more than a basic browser.
>>>
>>> Why not use AmiGO 2?
>>>
>>> Examples:
>>>
>>> http://amigo2.geneontology.org/
>>>
>>> Source:
>>>
>>> https://github.com/kltm/amigo
>>>
>>> It's driven by a SOLR index rather than a relational backend. You could
>>> write the browser on top of this API or just use or skin the browser that's
>>> there. E.g.
>>>
>>> http://amigo2.geneontology.org/cgi-bin/amigo2/amigo/term/GO:0022008
>>>
>>> It's fairly generic, in fact we were going to set up a disease/phenotype
>>> instance at some point, and could help you through this process.
>>>
>>> Cheers
>>> Chris

Re: cv_root view very slow for chado

Siddhartha Basu
Hi Daniel,
GO annotations are generally available in GAF format:
http://www.geneontology.org/GO.format.gaf-2_0.shtml
You could download human GO annotations from here,
http://www.geneontology.org/GO.downloads.annotations.shtml
or here:
http://www.ebi.ac.uk/GOA/human_release
If you need to map identifiers, you might need the corresponding gp2protein file:
http://www.geneontology.org/gp2protein/

Briefly, you load your features from GFF3, then load your GO annotations by linking
them to those features through feature_cvterm.
http://gmod.org/wiki/Chado_Tables#Table:_feature_cvterm
This assumes you have already loaded GO into the chado database.
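
As a starting point for the GAF side of that, here is a minimal sketch that reads GAF 2.0 lines and pulls out the fields you would need to build feature_cvterm rows: the object ID (column 2), the GO ID (column 5), the evidence code (column 7), and whether the qualifier (column 4) contains NOT. The two sample records are made up, and the step that looks up feature_id and cvterm_id in your own database is deliberately left out because it depends on how your features and dbxrefs were loaded.

```python
import csv
import io

# Made-up sample in GAF 2.0 layout: "!" comment lines, then 17 tab-separated columns.
SAMPLE_GAF = (
    "!gaf-version: 2.0\n"
    "UniProtKB\tP00001\tGENE1\t\tGO:0022008\tPMID:0000000\tIEA\t\tP\t"
    "example protein 1\t\tprotein\ttaxon:9606\t20130718\tUniProt\t\t\n"
    "UniProtKB\tP00002\tGENE2\tNOT\tGO:0022008\tPMID:0000000\tIDA\t\tP\t"
    "example protein 2\t\tprotein\ttaxon:9606\t20130718\tUniProt\t\t\n"
)


def read_gaf(handle):
    """Yield (db, object_id, go_id, evidence, is_negated) for each annotation line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and "!" header/comment lines.
        if not row or row[0].startswith("!"):
            continue
        # Columns 16 and 17 are optional, so accept anything with at least 15.
        if len(row) < 15:
            continue
        qualifiers = row[3].split("|") if row[3] else []
        yield row[0], row[1], row[4], row[6], "NOT" in qualifiers


annotations = list(read_gaf(io.StringIO(SAMPLE_GAF)))
for annotation in annotations:
    print(annotation)
```

The is_negated flag is kept because feature_cvterm has an is_not column for exactly this case; dropping NOT annotations silently would invert their meaning.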

This might also address your cross-post here:
http://www.biostars.org/p/76959/

Hope this helps,
-siddhartha

On Thu, 18 Jul 2013, Daniel E. Cook wrote:

> Thanks again. One other question - I am interested in loading a list
> of all human genes into chado (preferably with GO annotations). I
> obtained a GFF3 file from Ensembl, but from the looks of it the
> file does not have GO annotations. Is there somewhere I can obtain a
> GFF3 file with GO annotations? What would be the best way to load a
> set of human genome annotations?
>
> Thanks,
> Dan
> -DEC

Re: cv_root view very slow for chado

Daniel E. Cook
Thank you - it looks like I am getting close. The problem is that
these files don't appear to have the chromosome information, so I am
getting this error:

gmod_bulk_load_gff3.pl --gfffile 'feature/ref_GRCh37.p10_top_level.gff3.sorted'
Preparing data for inserting into the chado database
(This may take a while ...)
Unable to find srcfeature NC_000001.10 in the database.

Where might I get a gff file that contains the chromosome information?

Thank you!

-Dan


On Thu, Jul 18, 2013 at 8:21 PM, Siddhartha Basu <[hidden email]> wrote:

> Hi Daniel,
> GO annotations are generally available in GAF format:
> http://www.geneontology.org/GO.format.gaf-2_0.shtml
> You could download human GO annotations from here,
> http://www.geneontology.org/GO.downloads.annotations.shtml
> or here:
> http://www.ebi.ac.uk/GOA/human_release
> If you need to map identifiers, you might need the corresponding gp2protein file:
> http://www.geneontology.org/gp2protein/
>
> Briefly, you load your features from GFF3, then load your GO annotations by linking
> them to those features through feature_cvterm.
> http://gmod.org/wiki/Chado_Tables#Table:_feature_cvterm
> This assumes you have already loaded GO into the chado database.
>
> This might also address your cross-post here:
> http://www.biostars.org/p/76959/
>
> Hope this helps,
> -siddhartha

Re: cv_root view very slow for chado

Siddhartha Basu
On Fri, 19 Jul 2013, Daniel E. Cook wrote:

> Thank you - It looks like I am getting close. The problem is - It
> doesn't look like these files have the chromosome information so I am
> getting this error:
>
> gmod_bulk_load_gff3.pl --gfffile 'feature/ref_GRCh37.p10_top_level.gff3.sorted'
> Preparing data for inserting into the chado database
> (This may take a while ...)
> Unable to find srcfeature NC_000001.10 in the database.
>
> Where might I get a gff file that contains the chromosome information?
Try to load this file
ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/ref_GRCh37.p10_scaffolds.gff3.gz
first and see if it works.
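
If you want to check a file before loading it, here is a minimal sketch that lists every seqid used in column 1 of a GFF3 file that is never defined by a feature line whose ID attribute equals that seqid. This rests on my reading of the error above (the loader wants the srcfeature to exist as a feature), which I have not checked against the loader source, and it only looks at the file, not at what is already in your database. The sample lines are made up.

```python
import io

# Made-up GFF3 fragment: features sit on NC_000001.10, but no line defines
# NC_000001.10 itself, which matches the situation in the error message.
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "NC_000001.10\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene0;Name=EXAMPLE1\n"
    "NC_000001.10\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna0;Parent=gene0\n"
)


def undefined_seqids(handle):
    """Return seqids from column 1 that no feature line declares via ID=."""
    seqids_used = []     # column-1 values, in order of first appearance
    ids_defined = set()  # every ID= value seen in column 9
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. FASTA section)
        if cols[0] not in seqids_used:
            seqids_used.append(cols[0])
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "ID":
                ids_defined.add(value)
    return [seqid for seqid in seqids_used if seqid not in ids_defined]


missing = undefined_seqids(io.StringIO(SAMPLE_GFF3))
print(missing)
```

Any seqid this reports would need to come from another file (such as the scaffolds file above) or already be present in the database before the features that sit on it are loaded.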

thanks,
-siddhartha


>
> Thank you!
>
> -Dan
>
> >> >>> On Tue, Jul 16, 2013 at 2:11 PM, Daniel E. Cook
> >> >>> <[hidden email]> wrote:
> >> >>>>
> >> >>>> Hello,
> >> >>>>
> >> >>>> I'm having a bit of trouble with the view function 'cv_root' when I load
> >> >>>> in the gene ontology. I have loaded the human disease ontology, human
> >> >>>> phenotype ontolgoy, and a few other ontologies.
> >> >>>>
> >> >>>> Everything is fine until I load the gene ontology (which, incidentally,
> >> >>>> takes a *very* long time. Does anyone know how to speed this process up?).
> >> >>>> Once the gene ontology is loaded, a few views go extremely slow. I know the
> >> >>>> gene ontology dataset is large, but a count of the cv_root view can take a
> >> >>>> couple of minutes. The result I get, by the way, is 118.
> >> >>>>
> >> >>>> I would like the cv_root view to work quicker because I am using it to
> >> >>>> build a simple ontology browser. Does anyone have any ideas? I have tried
> >> >>>> vacuuming. Not exactly sure what else to try.
> >> >>>>
> >> >>>> Thanks so much.
> >> >>>> -DEC
> >> >>>>
> >> >>>>
> >> >>>> ------------------------------------------------------------------------------
> >> >>>> See everything from the browser to the database with AppDynamics
> >> >>>> Get end-to-end visibility with application monitoring from AppDynamics
> >> >>>> Isolate bottlenecks and diagnose root cause in seconds.
> >> >>>> Start your free trial of AppDynamics Pro today!
> >> >>>>
> >> >>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> >> >>>> _______________________________________________
> >> >>>> Gmod-schema mailing list
> >> >>>> [hidden email]
> >> >>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >> ------------------------------------------------------------------------------
> >> See everything from the browser to the database with AppDynamics
> >> Get end-to-end visibility with application monitoring from AppDynamics
> >> Isolate bottlenecks and diagnose root cause in seconds.
> >> Start your free trial of AppDynamics Pro today!
> >> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> >> _______________________________________________
> >> Gmod-schema mailing list
> >> [hidden email]
> >> https://lists.sourceforge.net/lists/listinfo/gmod-schema
> >
> > ------------------------------------------------------------------------------
> > See everything from the browser to the database with AppDynamics
> > Get end-to-end visibility with application monitoring from AppDynamics
> > Isolate bottlenecks and diagnose root cause in seconds.
> > Start your free trial of AppDynamics Pro today!
> > http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Gmod-schema mailing list
> > [hidden email]
> > https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema

------------------------------------------------------------------------------
See everything from the browser to the database with AppDynamics
Get end-to-end visibility with application monitoring from AppDynamics
Isolate bottlenecks and diagnose root cause in seconds.
Start your free trial of AppDynamics Pro today!
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Reply | Threaded
Open this post in threaded view
|

Re: cv_root view very slow for chado

Daniel E. Cook
I tried both of them. For that one I get this error:

Unable to find srcfeature NT_077402.2 in the database.

This is the very first feature in that file...
-DEC
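
One way to narrow down an error like this is to check which sequence IDs in column 1 of the GFF3 file are never defined by a row in that same file. The Python sketch below is only a diagnostic. The function name `undefined_seqids` is made up for illustration, and it assumes the loader wants every column-1 seqid to exist as a feature whose ID equals that seqid, either already in the database or defined in the file. That is an assumption about the loader's behaviour, not something taken from its documentation.

```python
from typing import Iterable, List


def undefined_seqids(lines: Iterable[str]) -> List[str]:
    """List column-1 seqids that no row in the file defines via an ID attribute.

    Hypothetical helper: assumes a reference sequence counts as "defined"
    when some feature row carries ID=<seqid> in column 9.
    """
    seqids: List[str] = []   # seqids in order of first appearance
    seen = set()
    defined = set()          # every value found in an ID= attribute
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue         # skip directives, comments, blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue         # not a feature row (e.g. FASTA section)
        if cols[0] not in seen:
            seen.add(cols[0])
            seqids.append(cols[0])
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "ID":
                defined.add(value)
    return [s for s in seqids if s not in defined]


# Usage (the path is a placeholder):
# with open("my_annotations.gff3") as fh:
#     print(undefined_seqids(fh))
```

Any seqid this reports would either have to be loaded into the database beforehand or be given its own defining row in the file.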


On Fri, Jul 19, 2013 at 1:31 PM, Siddhartha Basu <[hidden email]> wrote:

> On Fri, 19 Jul 2013, Daniel E. Cook wrote:
>
>> Thank you - It looks like I am getting close. The problem is - It
>> doesn't look like these files have the chromosome information so I am
>> getting this error:
>>
>> gmod_bulk_load_gff3.pl --gfffile 'feature/ref_GRCh37.p10_top_level.gff3.sorted'
>> Preparing data for inserting into the chado database
>> (This may take a while ...)
>> Unable to find srcfeature NC_000001.10 in the database.
>>
>> Where might I get a gff file that contains the chromosome information?
> Try to load this file
> ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/GFF/ref_GRCh37.p10_scaffolds.gff3.gz
> first and see if it works.
>
> thanks,
> -siddhartha
>
>
>>
>> Thank you!
>>
>> -Dan

Re: cv_root view very slow for chado

Scott Cain
Hi Daniel,

I bet I know what the problem is (I don't have my computer handy so I can't look at the file): for reference sequences, the gff lines have to have both Name and ID tags that are the same (there's some ugly, distant history there). I'm guessing in the file Siddhartha linked to, they only have Name tags. Sorry about that.

Scott
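
If the reference-sequence rows really do carry only a Name tag, a small preprocessing pass could add the matching ID before loading. The Python sketch below is a guess at what that might look like, not a tested fix for the NCBI file. The function name `ensure_matching_id` is hypothetical, and it assumes a row describes a reference sequence when its Name attribute equals the seqid in column 1. Rows that already have an ID are left alone, since rewriting an existing ID could break Parent references elsewhere in the file.

```python
def ensure_matching_id(line: str) -> str:
    """Add ID=<seqid> to a GFF3 row whose Name equals its own seqid.

    Hypothetical helper: treats "Name == column 1" as the marker of a
    reference-sequence row, which is an assumption about the input file.
    """
    if line.startswith("#") or not line.strip():
        return line                      # directive, comment, or blank line
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line                      # not a feature row
    attrs = [a for a in cols[8].split(";") if a]
    pairs = dict(a.split("=", 1) for a in attrs if "=" in a)
    if pairs.get("Name") != cols[0]:
        return line                      # not a reference-sequence row
    if "ID" in pairs:
        return line                      # never overwrite an existing ID
    cols[8] = ";".join([f"ID={cols[0]}"] + attrs)
    return "\t".join(cols) + newline


# Usage (file names are placeholders):
# with open("in.gff3") as src, open("out.gff3", "w") as dst:
#     for row in src:
#         dst.write(ensure_matching_id(row))
```

It would be worth looking at the first few rows of the real file before relying on this, since the guess about Name-only rows was made without the file at hand.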

On Jul 19, 2013, at 3:01 PM, "Daniel E. Cook" <[hidden email]> wrote:

> I tried both of them. For that one I get this error:
>
> Unable to find srcfeature NT_077402.2 in the database.
>
> This is the very first feature in that file...
> -DEC

Re: cv_root view very slow for chado

Daniel E. Cook
I'll give it a try and get back to you Monday. I tried real quick with
the first line and it seemed to work. I have had a tough time finding
resources online regarding how to load human data into chado - so I'll
post my scripts once I am successful (-:

Thanks,
Dan


On Fri, Jul 19, 2013 at 3:06 PM, Scott Cain <[hidden email]> wrote:

> Hi Daniel,
>
> I bet I know what the problem is (I don't have my computer handy so I can't look at the file): for reference sequences, the gff lines have to have both Name and ID tags that are the same (there's some ugly, distant history there). I'm guessing in the file Siddhartha linked to, they only have Name tags. Sorry about that.
>
> Scott
>>>>>> On Wed, Jul 17, 2013 at 9:42 PM, Chris Mungall <[hidden email]> wrote:
>>>>>>> My recommendation would be to keep things loosely coupled. Write your app
>>>>>>> such that you can swap in and out ontology providers, gene annotation
>>>>>>> providers, interaction providers, etc.
>>>>>>>
>>>>>>> We're working on something similar and should have some stable services you
>>>>>>> can use for this - ping me again in a  couple of weeks
>>>>>>>
>>>>>>> Thanks for sorting out the query issue Scott.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 16, 2013 at 4:13 PM, Daniel E. Cook
>>>>>>> <[hidden email]> wrote:
>>>>>>>>
>>>>>>>> Chris,
>>>>>>>>
>>>>>>>> Thanks for the quick reply.  I should admit that I am a bit new at to
>>>>>>>> chado. Broadly, my intention is to use the chado schema to pull data and
>>>>>>>> draw network graphs linking genes, phenotypes, and diseases. However, I
>>>>>>>> wanted to use django as my web framework and use a js visualization library
>>>>>>>> on top of this, with chado as a backend.
>>>>>>>>
>>>>>>>> To answer you question - I am using the latest version of chado (1.23) -
>>>>>>>> and that is the SQL of the cv_root view. Before loading the gene ontology,
>>>>>>>> that view *was* very fast. It's only once that is loaded that that things
>>>>>>>> slow down tremendously. It takes ~ 5 minutes to do a row count.
>>>>>>>>
>>>>>>>> The apparently high number of root terms may be due to the fact that I am
>>>>>>>> loading a human phenotype ontology, gene ontology, and a disease ontology.
>>>>>>>>
>>>>>>>> I actually already did write an ontology browser, if you can call it that.
>>>>>>>> I did it primarily as an exercise to familiarize myself with chado and
>>>>>>>> chado+django together. Here is a screenshot to give you an idea of what I
>>>>>>>> did: http://cl.ly/image/3P2C432D1h3o
>>>>>>>>
>>>>>>>> I think - for now - the AmiGO 2 browser may be more complex than what need
>>>>>>>> (starting out), although it certainly does look useful.
>>>>>>>>
>>>>>>>> Any advice/tips/pointers are greatly appreciated!
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>> Dan
>>>>>>>>
>>>>>>>> -DEC
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 16, 2013 at 5:25 PM, Chris Mungall <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Daniel
>>>>>>>>>
>>>>>>>>> Is this still the source for the cv_root view (not function)?
>>>>>>>>>
>>>>>>>>> +CREATE OR REPLACE VIEW cv_root AS
>>>>>>>>> + SELECT
>>>>>>>>> +  cv_id,
>>>>>>>>> +  cvterm_id AS root_cvterm_id
>>>>>>>>> + FROM cvterm
>>>>>>>>> + WHERE
>>>>>>>>> +  cvterm_id NOT IN ( SELECT subject_id FROM cvterm_relationship)    AND
>>>>>>>>> +  is_obsolete=0;
>>>>>>>>>
>>>>>>>>> This should be fast even for large rowcounts unless there is no indexing.
>>>>>>>>> Have you tried EXPLAIN?
>>>>>>>>>
>>>>>>>>> 118 seems high but I imagine that these are all helper terms.
>>>>>>>>>
>>>>>>>>> The load process relies on an XSLT step which doesn't scale well with
>>>>>>>>> larger ontologies. Someone could try looking into a faster processor than
>>>>>>>>> xsltproc, or rewriting the xslt, but it would be better to do a ground up
>>>>>>>>> rewrite either
>>>>>>>>>
>>>>>>>>> 1. perl - Bio::Chado::Schema
>>>>>>>>> 2. java - OWLTools plus the GBOL layer
>>>>>>>>>
>>>>>>>>> Populating graph_path should definitely be done via owltools and not
>>>>>>>>> home-grown code. See:
>>>>>>>>>  http://code.google.com/p/owltools/wiki/CommandLineExamples
>>>>>>>>>
>>>>>>>>> But backing up I don't know if I would write an ontology browser directly
>>>>>>>>> on top of the cv module or if I'd even write a new ontology browser.
>>>>>>>>>
>>>>>>>>> The cv module only supports a subset of the obo format spec, never mind
>>>>>>>>> OWL. This is fine for the kind of operations you would want to do within
>>>>>>>>> chado (e.g. join feature to cvterm to get the SO label; do queries involving
>>>>>>>>> transitive closure). But you're likely to start hitting limitations once you
>>>>>>>>> start building a browser. Even if it's a basic browser you might hit some
>>>>>>>>> naive limitations e.g. what constitutes a root in a multi-ontology instance.
>>>>>>>>> The fact you have the ontologies you say you have loaded means that maybe
>>>>>>>>> you want more than a basic browser.
>>>>>>>>>
>>>>>>>>> Why not use AmiGO 2?
>>>>>>>>>
>>>>>>>>> Examples:
>>>>>>>>>
>>>>>>>>> http://amigo2.geneontology.org/
>>>>>>>>>
>>>>>>>>> Source:
>>>>>>>>>
>>>>>>>>> https://github.com/kltm/amigo
>>>>>>>>>
>>>>>>>>> It's driven by a SOLR index rather than a relational backend. You could
>>>>>>>>> write the browser on top of this API or just use or skin the browser that's
>>>>>>>>> there. E.g.
>>>>>>>>>
>>>>>>>>> http://amigo2.geneontology.org/cgi-bin/amigo2/amigo/term/GO:0022008
>>>>>>>>>
>>>>>>>>> It's fairly generic, in fact we were going to set up a disease/phenotype
>>>>>>>>> instance at some point, and could help you through this process.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On Tue, Jul 16, 2013 at 2:11 PM, Daniel E. Cook
>>>>>>>>> <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I'm having a bit of trouble with the view function 'cv_root' when I load
>>>>>>>>>> in the gene ontology. I have loaded the human disease ontology, human
>>>>>>>>>> phenotype ontolgoy, and a few other ontologies.
>>>>>>>>>>
>>>>>>>>>> Everything is fine until I load the gene ontology (which, incidentally,
>>>>>>>>>> takes a *very* long time. Does anyone know how to speed this process up?).
>>>>>>>>>> Once the gene ontology is loaded, a few views go extremely slow. I know the
>>>>>>>>>> gene ontology dataset is large, but a count of the cv_root view can take a
>>>>>>>>>> couple of minutes. The result I get, by the way, is 118.
>>>>>>>>>>
>>>>>>>>>> I would like the cv_root view to work quicker because I am using it to
>>>>>>>>>> build a simple ontology browser. Does anyone have any ideas? I have tried
>>>>>>>>>> vacuuming. Not exactly sure what else to try.
>>>>>>>>>>
>>>>>>>>>> Thanks so much.
>>>>>>>>>> -DEC
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>>> See everything from the browser to the database with AppDynamics
>>>>>>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>>>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>>>>>>> Start your free trial of AppDynamics Pro today!
>>>>>>>>>>
>>>>>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Gmod-schema mailing list
>>>>>>>>>> [hidden email]
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> See everything from the browser to the database with AppDynamics
>>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>>> Start your free trial of AppDynamics Pro today!
>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>>> _______________________________________________
>>>>>> Gmod-schema mailing list
>>>>>> [hidden email]
>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> See everything from the browser to the database with AppDynamics
>>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>>> Start your free trial of AppDynamics Pro today!
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Gmod-schema mailing list
>>>>> [hidden email]
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>
>>>> ------------------------------------------------------------------------------
>>>> See everything from the browser to the database with AppDynamics
>>>> Get end-to-end visibility with application monitoring from AppDynamics
>>>> Isolate bottlenecks and diagnose root cause in seconds.
>>>> Start your free trial of AppDynamics Pro today!
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Gmod-schema mailing list
>>>> [hidden email]
>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>> ------------------------------------------------------------------------------
>>> See everything from the browser to the database with AppDynamics
>>> Get end-to-end visibility with application monitoring from AppDynamics
>>> Isolate bottlenecks and diagnose root cause in seconds.
>>> Start your free trial of AppDynamics Pro today!
>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>
>> ------------------------------------------------------------------------------
>> See everything from the browser to the database with AppDynamics
>> Get end-to-end visibility with application monitoring from AppDynamics
>> Isolate bottlenecks and diagnose root cause in seconds.
>> Start your free trial of AppDynamics Pro today!
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Gmod-schema mailing list
>> [hidden email]
>> https://lists.sourceforge.net/lists/listinfo/gmod-schema


Re: cv_root view very slow for chado

Daniel E. Cook
I tried fiddling with the RefSeq GFF files and played around with a
few other options, including bp_genbank2gff, but ultimately I did
find a solution.

There is a script in the biotoolbox package
(https://code.google.com/p/biotoolbox/) called ucsc_table2gff3.pl.

This worked for me:

mkdir gff3
cd gff3
wget 'https://biotoolbox.googlecode.com/svn-history/r600/trunk/scripts/ucsc_table2gff3.pl'
perl ucsc_table2gff3.pl --ftp refgene --db hg19 --table refGene --nocds

# Load Chromosomes
gmod_gff3_preprocessor.pl --gfffile 'hg19_chromInfo.gff3'
gmod_bulk_load_gff3.pl --gfffile 'hg19_chromInfo.gff3.sorted' --recreate_cache

# Load Data
# Apparently, you do not need to pre-process this file:
# gmod_gff3_preprocessor.pl --gfffile 'hg19_refGene.gff3'
gmod_bulk_load_gff3.pl --gfffile 'hg19_refGene.gff3' --dbxref

I am still working on linking these with GO terms and other entities
but this is a start...
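
For the remaining step of linking the loaded genes to GO terms, one possible starting point is to pull (gene symbol, GO ID) pairs out of a GAF 2.0 annotation file, as Siddhartha suggested earlier in the thread. This is a rough sketch, not a loader: the function name `gaf_symbol_go_pairs` is hypothetical, it assumes the standard GAF 2.0 column order (symbol in column 3, qualifier in column 4, GO ID in column 5), and it assumes the symbols match the feature names created by the refGene load, which should be verified. Resolving each pair to a feature_id and cvterm_id and inserting into feature_cvterm is left out.

```python
def gaf_symbol_go_pairs(gaf_lines):
    """Return sorted, de-duplicated (gene symbol, GO ID) pairs from GAF 2.0 lines.

    Annotations whose qualifier column contains NOT are skipped, because
    they state that the gene product is not associated with the term.
    """
    pairs = set()
    for line in gaf_lines:
        # Header and comment lines in GAF start with "!".
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # GAF 2.0 has 15 mandatory columns; skip anything shorter.
        if len(cols) < 15:
            continue
        symbol, qualifier, go_id = cols[2], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue
        pairs.add((symbol, go_id))
    return sorted(pairs)
```

Each resulting pair could then be looked up against the feature table (by name) and against dbxref/cvterm (by GO accession) before writing rows to feature_cvterm. Note that feature_cvterm also expects a publication reference, so a pub row has to exist for the annotations to point at.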

Dan
-DEC


