stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Mara Kim
Hello again gmod-ers!

The stag-storenode.pl script overwrites existing cvterms (i.e. it UPDATEs instead of INSERTs on cvterm), despite the terms being in separate namespaces.  This is obvious when inserting a vocabulary term without a definition, which results in a 'new' term with the definition of the original term.  The only thing shared by the original cvterm and the new cvterm seems to be the order in which they appear in the .obo file given to go2chadoxml, which translates into the order in which they appear in the resulting xml file.

Am I misusing ontologies here?  Is it not alright to simultaneously use multiple ontologies in different obo files?  I thought it was alright so long as they were in distinct namespaces.

Thanks again,
Mara

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Scott Cain
Hi Mara,

That shouldn't be happening. For example, I know I've got entries in my cvterm table that have the same name but belong to different cvs. Can you tell us which combination of ontologies caused this so we can try to reproduce it?

Thanks,
Scott


Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Bob MacCallum
We see cvterms move from one cv to another when we load in multiple
ontologies with terms "borrowed" from each other.

For example (off the top of my head), load PATO and a term will belong
to cv.name="quality".
Then load EFO or something and it may switch to cv.name="efo".

It's not really a problem for us because we look up all cvterms via
dbxrefs (rather than cv.name and cvterm.name).  However we did find
that make_cvtermpath.pl (which uses cv.name) was missing terms that
had moved to other cvs.  So we made our own version, lifting heavily
from the GMOD version:
https://github.com/bobular/VBPopBio/blob/master/api/Bio-Chado-VBPopBio/bin/make_cvtermpath.pl

HTH!
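
To illustrate the dbxref-based lookup described above, here is a minimal sketch. It assumes Chado's usual db, dbxref, cv and cvterm tables (only a subset of columns) and uses an in-memory SQLite database as a stand-in for PostgreSQL. The helper name find_cvterm_by_dbxref and the toy rows are made up for illustration; this is not the VBPopBio code.

```python
import sqlite3

# Toy stand-in for the relevant Chado tables (column subset only).
SCHEMA = """
CREATE TABLE db     (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER, accession TEXT);
CREATE TABLE cv     (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT,
                     dbxref_id INTEGER UNIQUE);
"""

def find_cvterm_by_dbxref(conn, db_name, accession):
    """Hypothetical helper: return (cvterm_id, cv name, term name) or None.

    The term is located through db.name + dbxref.accession, so the result
    does not depend on which cv the term currently belongs to.
    """
    return conn.execute(
        """SELECT t.cvterm_id, cv.name, t.name
             FROM cvterm t
             JOIN dbxref x ON x.dbxref_id = t.dbxref_id
             JOIN db     d ON d.db_id     = x.db_id
             JOIN cv       ON cv.cv_id    = t.cv_id
            WHERE d.name = ? AND x.accession = ?""",
        (db_name, accession),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO db VALUES (1, 'PATO');
INSERT INTO dbxref VALUES (10, 1, '0000001');   -- toy row
INSERT INTO cv VALUES (1, 'quality');
INSERT INTO cv VALUES (2, 'efo');
INSERT INTO cvterm VALUES (100, 1, 'quality', 10);
""")

before = find_cvterm_by_dbxref(conn, "PATO", "0000001")
# Simulate the term "moving" to another cv after a second ontology is loaded.
conn.execute("UPDATE cvterm SET cv_id = 2 WHERE cvterm_id = 100")
after = find_cvterm_by_dbxref(conn, "PATO", "0000001")
print(before, after)
```

The same cvterm_id comes back before and after the move, whereas a lookup on cv.name = 'quality' plus cvterm.name would stop finding the term once it sits under 'efo'.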


Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Naama Menda

make_cvtermpath.pl populates cvtermpath based on existing cvterms.

For updating cvterms, or loading new ontologies, you might want to try using gmod_load_cvterms.pl:
http://gmod.svn.sourceforge.net/viewvc/gmod/schema/trunk/chado/bin/gmod_load_cvterms.pl?revision=25286&view=markup

With the -u option it updates existing terms, and it should not overwrite terms in other namespaces.

-Naama


Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Bob MacCallum
Hi Naama,
Just to clarify, the experiences I wrote about were with stag-storenode.pl.

You are correct to clarify that make_cvtermpath has a different
purpose altogether.

We have also used gmod_load_cvterms.pl successfully but can't quite
remember why we made the switch to stag-storenode.
I think it was something to do with performance when loading GAZ.

cheers,
Bob.


Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Mara Kim
Hi Scott,

The problem occurs with these three ontologies.  I use go2chadoxml to create the input to stag-storenode.pl.  If I load organism.chado.xml, I get these cvterms in the organism_properties namespace:

db=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |       name        |                                   definition                                   | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+-------------------+--------------------------------------------------------------------------------+-----------+-------------+---------------------
     39858 |    19 | data_url          | The URL from which the genome was downloaded.                                  |     92671 |           0 |                   0
     39857 |    19 | sequencing_center | The sequencing center that provided this genome.                               |     92670 |           0 |                   0
     39856 |    19 | taxonomy_id       | The NCBI taxonomy id of the organism from http://www.ncbi.nlm.nih.gov/taxonomy |     92669 |           0 |                   0
     39855 |    19 | genome_properties | Properties of a genome                                                         |     92668 |           0 |                   0
     39804 |    19 | forced            | Genome was forced to be imported                                               |     92673 |           0 |                   0
     39859 |    19 | download_date     | The date that the genome was downloaded in yyyy-mm-dd format.                  |     92672 |           0 |                   0
(6 rows)


If I then load crbh.chado.xml, suddenly terms have been commandeered into the crbh namespace:
rokasdb=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |     name      |                          definition                           | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------+---------------------------------------------------------------+-----------+-------------+---------------------
     39804 |    19 | forced        | Genome was forced to be imported                              |     92673 |           0 |                   0
     39859 |    19 | download_date | The date that the genome was downloaded in yyyy-mm-dd format. |     92672 |           0 |                   0
(2 rows)

db=# select * from cvterm where cv_id = 21;
 cvterm_id | cv_id |        name         |                               definition                                | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------------+-------------------------------------------------------------------------+-----------+-------------+---------------------
     39858 |    21 | seqlen_ratio_cutoff | The cutoff ratio for the difference in seqlen for a reciprocal best hit |     92671 |           0 |                   0
     39857 |    21 | properties          | The sequencing center that provided this genome.                        |     92670 |           0 |                   0
     39856 |    21 | cluster             | A group of genes that form a clustering reciprocal best hit cluster.    |     92669 |           0 |                   0
     39855 |    21 | objects             | Properties of a genome                                                  |     92668 |           0 |                   0
(4 rows)

Note how the cvterm_ids are identical.  Also, the terms that have no definition (properties, objects) retain the definition they had as members of the blast ontology.
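
The outputs above also show the same dbxref_id values (92668 to 92671) under both cv_id 19 and cv_id 21. One possible explanation, which nothing in this thread confirms, is that both .obo files end up producing the same db:accession pairs; in Chado, cvterm.dbxref_id is unique, so a loader that matches terms on dbxref would update the existing row instead of inserting a new one. The sketch below is one way to check for that before loading. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the function name, the db name 'local' and the accessions are invented for illustration (the real IDs in the .obo files are not shown here).

```python
import sqlite3

# Column subset of the Chado tables involved; SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE db     (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, db_id INTEGER, accession TEXT);
CREATE TABLE cv     (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT,
                     dbxref_id INTEGER UNIQUE);
"""

def cross_cv_collisions(conn, incoming):
    """Hypothetical pre-load check.

    incoming: iterable of (db_name, accession, target_cv_name) for the terms
    about to be loaded. Returns one tuple per incoming term whose dbxref is
    already attached to a cvterm in a different cv.
    """
    hits = []
    for db_name, accession, target_cv in incoming:
        row = conn.execute(
            """SELECT t.cvterm_id, t.name, cv.name
                 FROM cvterm t
                 JOIN dbxref x ON x.dbxref_id = t.dbxref_id
                 JOIN db     d ON d.db_id     = x.db_id
                 JOIN cv       ON cv.cv_id    = t.cv_id
                WHERE d.name = ? AND x.accession = ?""",
            (db_name, accession),
        ).fetchone()
        if row is not None and row[2] != target_cv:
            hits.append((f"{db_name}:{accession}", row[0], row[1], row[2], target_cv))
    return hits

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO db VALUES (1, 'local');               -- invented db name
INSERT INTO dbxref VALUES (92668, 1, '0000001');  -- invented accession
INSERT INTO cv VALUES (19, 'organism_properties');
INSERT INTO cv VALUES (21, 'crbh');
INSERT INTO cvterm VALUES (39855, 19, 'genome_properties', 92668);
""")

# Terms we pretend the second file is about to load into the crbh cv.
incoming = [("local", "0000001", "crbh"), ("local", "0000002", "crbh")]
hits = cross_cv_collisions(conn, incoming)
print(hits)
```

If a check like this reports hits for the real files, giving each ontology its own ID space would be the first thing to try.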



Attachment: crbh.obo (948 bytes)
Attachment: organism.obo (1K)

Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Mara Kim
So with some more investigation, I realized the problem extends further back!  I discovered that if I reload an ontology with stag-storenode.pl, it somehow reclaims the cvterms it used to have.  Repeating this for each ontology, going backwards in time, reveals that all of my ontologies have been writing over each other D:
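
As a toy model of the behaviour described above (not stag-storenode.pl's actual code), the sketch below assumes a loader that matches incoming terms on their dbxref and updates the matching row when one exists. The table is a flattened stand-in for cvterm, and the 'X:1' identifier and the store_term helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flattened stand-in for cvterm: cv and dbxref are plain text columns here.
conn.execute("""CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY, cv TEXT, name TEXT,
    definition TEXT, dbxref TEXT UNIQUE)""")

def store_term(conn, cv, name, dbxref, definition=None):
    """Toy loader: match on dbxref; UPDATE the row if found, else INSERT."""
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE dbxref = ?", (dbxref,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO cvterm (cv, name, definition, dbxref) VALUES (?, ?, ?, ?)",
            (cv, name, definition, dbxref),
        )
        return
    # Existing row: cv and name are overwritten with the incoming values.
    conn.execute(
        "UPDATE cvterm SET cv = ?, name = ? WHERE cvterm_id = ?",
        (cv, name, row[0]),
    )
    if definition is not None:
        # Only a supplied definition replaces the old one.
        conn.execute(
            "UPDATE cvterm SET definition = ? WHERE cvterm_id = ?",
            (definition, row[0]),
        )

def snapshot(conn):
    return conn.execute(
        "SELECT cvterm_id, cv, name, definition FROM cvterm"
    ).fetchall()

# Load "organism", then "crbh" (term without a definition), then "organism" again.
store_term(conn, "organism_properties", "genome_properties", "X:1", "Properties of a genome")
first = snapshot(conn)
store_term(conn, "crbh", "objects", "X:1")
second = snapshot(conn)
store_term(conn, "organism_properties", "genome_properties", "X:1", "Properties of a genome")
third = snapshot(conn)
print(first, second, third, sep="\n")
```

Under that assumption a single row changes owner on every load: the second load renames it and moves it to crbh while keeping the old definition, and reloading the first ontology takes it back.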



Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Mara Kim
I'm still having this problem.  Any ideas?


On Tue, Apr 9, 2013 at 3:07 PM, Mara Kim <[hidden email]> wrote:
So with some more investigation, I realized the problem extends further back!  I discovered that if I reload an ontology with stag-storenode.pl, it will somehow reclaim the cvterms it used to have.  Doing this backwards in time reveals that all of my ontologies have been writing over each other D:


On Tue, Apr 9, 2013 at 11:46 AM, Mara Kim <[hidden email]> wrote:
Hi Scott,

The problem occurs with these three ontologies.  I use go2chadoxml to create the input to stag-storenode.pl.  If I load organism.chado.xml, I get these cvterms in the organism_properties namespace:

db=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |       name        |                                   definition                                   | dbxref_id | is_obsolet
e | is_relationshiptype
-----------+-------+-------------------+--------------------------------------------------------------------------------+-----------+-----------
--+---------------------
     39858 |    19 | data_url          | The URL from which the genome was downloaded.                                  |     92671 |          
0 |                   0
     39857 |    19 | sequencing_center | The sequencing center that provided this genome.                               |     92670 |          
0 |                   0
     39856 |    19 | taxonomy_id       | The NCBI taxonomy id of the organism from http://www.ncbi.nlm.nih.gov/taxonomy |     92669 |          
0 |                   0
     39855 |    19 | genome_properties | Properties of a genome                                                         |     92668 |          
0 |                   0
     39804 |    19 | forced            | Genome was forced to be imported                                               |     92673 |          
0 |                   0
     39859 |    19 | download_date     | The date that the genome was downloaded in yyyy-mm-dd format.                  |     92672 |          
0 |                   0
(6 rows)


If I then load crbh.chado.xml, suddenly terms have been commandeered into the crbh namespace:
rokasdb=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |     name      |                          definition                           | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------+---------------------------------------------------------------+-----------+-------------+---------------------
     39804 |    19 | forced        | Genome was forced to be imported                              |     92673 |           0 |                   0
     39859 |    19 | download_date | The date that the genome was downloaded in yyyy-mm-dd format. |     92672 |           0 |                   0
(2 rows)

db=# select * from cvterm where cv_id = 21;
 cvterm_id | cv_id |        name         |                               definition                                | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------------+-------------------------------------------------------------------------+-----------+-------------+---------------------
     39858 |    21 | seqlen_ratio_cutoff | The cutoff ratio for the difference in seqlen for a reciprocal best hit |     92671 |           0 |                   0
     39857 |    21 | properties          | The sequencing center that provided this genome.                        |     92670 |           0 |                   0
     39856 |    21 | cluster             | A group of genes that form a clustering reciprocal best hit cluster.    |     92669 |           0 |                   0
     39855 |    21 | objects             | Properties of a genome                                                  |     92668 |           0 |                   0
(4 rows)

Note how the cvterm_ids are identical.  Also, the terms that have no definition (properties, objects) retain the definition they had as members of the blast ontology.
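The reassignment can be made explicit by comparing the two snapshots. The following is only a sketch: the rows are copied by hand from the query output above as (cvterm_id, cv_id, name), and the helper name reassigned_terms is made up for illustration.

```python
# Rows copied from the psql output above: (cvterm_id, cv_id, name).
before = [
    (39858, 19, "data_url"),
    (39857, 19, "sequencing_center"),
    (39856, 19, "taxonomy_id"),
    (39855, 19, "genome_properties"),
    (39804, 19, "forced"),
    (39859, 19, "download_date"),
]
after = [
    (39804, 19, "forced"),
    (39859, 19, "download_date"),
    (39858, 21, "seqlen_ratio_cutoff"),
    (39857, 21, "properties"),
    (39856, 21, "cluster"),
    (39855, 21, "objects"),
]


def reassigned_terms(before, after):
    """Return rows whose cvterm_id survived the load but whose cv_id changed."""
    old = {cvterm_id: (cv_id, name) for cvterm_id, cv_id, name in before}
    moved = []
    for cvterm_id, cv_id, name in after:
        if cvterm_id in old and old[cvterm_id][0] != cv_id:
            old_cv, old_name = old[cvterm_id]
            moved.append((cvterm_id, old_cv, old_name, cv_id, name))
    return sorted(moved)


if __name__ == "__main__":
    for cvterm_id, old_cv, old_name, new_cv, new_name in reassigned_terms(before, after):
        print(f"{cvterm_id}: cv {old_cv} '{old_name}' -> cv {new_cv} '{new_name}'")
```

Four of the six rows keep their cvterm_id and dbxref_id but change cv_id and name, which is what an UPDATE (rather than an INSERT) would leave behind.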

Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Scott Cain
Hi Mara,

This is clearly a bug in stag-storenode.pl; it's doing an update without checking the cv_id of the term.  I have no idea how easy or hard this would be to fix because I've never looked at the source.  It is on my list of things to do, but I need to get a release of GMOD in the Cloud out first (I'm close on that).  I think I'll probably look at this next week.

Scott



On Thu, Apr 25, 2013 at 2:42 PM, Mara Kim <[hidden email]> wrote:
I'm still having this problem.  Any ideas?



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Mara Kim
Hi guys.

Any news on the stag-storenode issue?  I can confirm that it occurs on a fresh install of Chado 1.23.


Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Chris Mungall

Mara,

Part or all of the problem may be the fact that you are using the same ID for different terms in your different ontologies.

organism.obo:

[Term]
id: ID:0000000
name: genome_properties
def: "Properties of a genome" []
creation_date: 2013-01-31T17:19:16Z

crbh.obo:

[Term]
id: ID:0000000
name: objects
creation_date: 2013-04-02T13:37:06Z

When using oboedit, the first thing you should do is set the ID prefix to something unique.
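A rough way to picture why a shared ID would produce these symptoms is the toy model below. It is not stag-storenode.pl's code. It assumes a loader that looks a term up through its dbxref (the "ID:0000000" accession) and updates the row it finds instead of inserting a new one. The two-table schema is cut down from Chado's, the cv column stands in for cv_id, and make_db and store_term are hypothetical names.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified dbxref/cvterm pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dbxref (
            dbxref_id INTEGER PRIMARY KEY,
            db        TEXT NOT NULL,
            accession TEXT NOT NULL,
            UNIQUE (db, accession)
        );
        CREATE TABLE cvterm (
            cvterm_id  INTEGER PRIMARY KEY,
            cv         TEXT NOT NULL,
            name       TEXT NOT NULL,
            definition TEXT,
            dbxref_id  INTEGER NOT NULL UNIQUE REFERENCES dbxref (dbxref_id)
        );
        """
    )
    return conn


def store_term(conn, cv, term_id, name, definition=None):
    """Assumed loader behaviour: find the term by dbxref, update it if present."""
    db, accession = term_id.split(":", 1)
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (db, accession) VALUES (?, ?)",
        (db, accession),
    )
    (dbxref_id,) = conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE db = ? AND accession = ?",
        (db, accession),
    ).fetchone()
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE dbxref_id = ?", (dbxref_id,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO cvterm (cv, name, definition, dbxref_id) VALUES (?, ?, ?, ?)",
            (cv, name, definition, dbxref_id),
        )
        return cur.lastrowid
    # Existing row: overwrite what was supplied; a missing definition keeps the old one.
    conn.execute(
        "UPDATE cvterm SET cv = ?, name = ?, definition = COALESCE(?, definition) "
        "WHERE cvterm_id = ?",
        (cv, name, definition, row[0]),
    )
    return row[0]


if __name__ == "__main__":
    conn = make_db()
    first = store_term(conn, "organism_properties", "ID:0000000",
                       "genome_properties", "Properties of a genome")
    second = store_term(conn, "crbh", "ID:0000000", "objects")
    print(first == second)
    print(conn.execute("SELECT cv, name, definition FROM cvterm").fetchall())
```

Under that assumption the second load reuses the first term's row, moves it to the crbh vocabulary, and leaves the old definition in place, matching the output posted earlier in the thread. With distinct prefixes (for example CRBH:0000000) the dbxref lookup finds nothing and a new row is inserted.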

A quick, possibly dangerous hack:

perl -pi -e 's/ID:/CRBH:/g' crbh.obo
perl -pi -e 's/ID:/ORGPROP:/g' organism.obo

Then try reloading from scratch.
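Before reloading, it may be worth checking that no ID is still shared between the files. The sketch below uses the two stanzas quoted above as inline strings; stanza_ids and shared_ids are made-up helper names, and the parsing only handles plain "id:" lines inside stanzas, not the full OBO format.

```python
ORGANISM_OBO = """\
[Term]
id: ID:0000000
name: genome_properties
def: "Properties of a genome" []
creation_date: 2013-01-31T17:19:16Z
"""

CRBH_OBO = """\
[Term]
id: ID:0000000
name: objects
creation_date: 2013-04-02T13:37:06Z
"""


def stanza_ids(obo_text):
    """Collect the values of 'id:' lines that appear inside [Stanza] blocks."""
    ids = set()
    in_stanza = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_stanza = True
        elif in_stanza and line.startswith("id:"):
            # Drop any trailing '! comment' before storing the ID.
            ids.add(line[len("id:"):].split("!")[0].strip())
    return ids


def shared_ids(obo_a, obo_b):
    """Return the IDs defined in both OBO texts."""
    return stanza_ids(obo_a) & stanza_ids(obo_b)


if __name__ == "__main__":
    print(shared_ids(ORGANISM_OBO, CRBH_OBO))
    # Same blanket substitution as the perl one-liners above.
    print(shared_ids(ORGANISM_OBO.replace("ID:", "ORGPROP:"),
                     CRBH_OBO.replace("ID:", "CRBH:")))
```

For real files, read each one with open(path).read() and pass the text in. Note that the blanket replacement also rewrites any "ID:" that happens to occur inside a definition or comment, which is presumably why the hack is labelled dangerous.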



Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Mara Kim
Good catch, Chris.  That did in fact solve the problem.  Thanks a million!


On Fri, May 31, 2013 at 7:31 PM, Chris Mungall <[hidden email]> wrote:

Mara,

Part or all of the problem may be the fact that you are using the same ID for different terms in your different ontologies.

organism.obo:

[Term]
id: ID:0000000
name: genome_properties
def: "Properties of a genome" []
creation_date: 2013-01-31T17:19:16Z

crbh.obo:

[Term]
id: ID:0000000
name: objects
creation_date: 2013-04-02T13:37:06Z

When using oboedit, the first thing you should do is set the ID prefix to something unique.

A quick possibly dangerous hack:

perl -pi -ne 's/ID:/CRBH:/g' crbh.obo
perl -pi -ne 's/ID:/ORGPROP:/g' organism.obo

Then try reloading from scratch



On May 31, 2013, at 12:14 PM, Mara Kim wrote:

Hi guys.

Any news on the stag-storenode issue?  I can confirm that it occurs on a fresh install of Chado 1.23.


On Thu, Apr 25, 2013 at 3:20 PM, Scott Cain <[hidden email]> wrote:
Hi Mara,

This is clearly a bug in stag-storenode.pl; it's doing an update without checking the cv_id of the term.  I have no idea how easier or hard this would be to fix because I've never looked at the source.  It is on my list of things to but I need to get a release of GMOD in the Cloud out first (I'm close on that).  I think I'll probably look at this next week.

Scott



On Thu, Apr 25, 2013 at 2:42 PM, Mara Kim <[hidden email]> wrote:
I'm still having this problem.  Any ideas?


On Tue, Apr 9, 2013 at 3:07 PM, Mara Kim <[hidden email]> wrote:
So with some more investigation, I realized the problem extends further back!  I discovered that if I reload an ontology with stag-storenode.pl, it will somehow reclaim the cvterms it used to have.  Doing this backwards in time reveals that all of my ontologies have been writing over each other D:


On Tue, Apr 9, 2013 at 11:46 AM, Mara Kim <[hidden email]> wrote:
Hi Scott,

The problem occurs with these three ontologies.  I use go2chadoxml to create the input to stag-storenode.pl.  If I load organism.chado.xml, I get these cvterms in the organism_properties namespace:

db=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |       name        |                                   definition                                   | dbxref_id | is_obsolet
e | is_relationshiptype
-----------+-------+-------------------+--------------------------------------------------------------------------------+-----------+-----------
--+---------------------
     39858 |    19 | data_url          | The URL from which the genome was downloaded.                                  |     92671 |          
0 |                   0
     39857 |    19 | sequencing_center | The sequencing center that provided this genome.                               |     92670 |          
0 |                   0
     39856 |    19 | taxonomy_id       | The NCBI taxonomy id of the organism from http://www.ncbi.nlm.nih.gov/taxonomy |     92669 |          
0 |                   0
     39855 |    19 | genome_properties | Properties of a genome                                                         |     92668 |          
0 |                   0
     39804 |    19 | forced            | Genome was forced to be imported                                               |     92673 |          
0 |                   0
     39859 |    19 | download_date     | The date that the genome was downloaded in yyyy-mm-dd format.                  |     92672 |          
0 |                   0
(6 rows)


If I then load crbh.chado.xml, suddenly terms have been commandeered into the crbh namespace:
rokasdb=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |     name      |                          definition                           | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------+---------------------------------------------------------------+-----------+-------------+---------------------
     39804 |    19 | forced        | Genome was forced to be imported                              |     92673 |           0 |                   0
     39859 |    19 | download_date | The date that the genome was downloaded in yyyy-mm-dd format. |     92672 |           0 |                   0
(2 rows)

db=# select * from cvterm where cv_id = 21;
 cvterm_id | cv_id |        name         |                               definition                                | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------------+-------------------------------------------------------------------------+-----------+-------------+---------------------
     39858 |    21 | seqlen_ratio_cutoff | The cutoff ratio for the difference in seqlen for a reciprocal best hit |     92671 |           0 |                   0
     39857 |    21 | properties          | The sequencing center that provided this genome.                        |     92670 |           0 |                   0
     39856 |    21 | cluster             | A group of genes that form a clustering reciprocal best hit cluster.    |     92669 |           0 |                   0
     39855 |    21 | objects             | Properties of a genome                                                  |     92668 |           0 |                   0
(4 rows)

Note how the cvterm_ids are identical.  Also, the terms that have no definition (properties, objects) retain the definition they had as members of the blast ontology.



On Mon, Apr 8, 2013 at 4:06 AM, Scott Cain <[hidden email]> wrote:
Hi Mara,

That shouldn't be happening. For example, I know I've got entries in my cvterm table that have the same name but belong to different cv. Can you tell us what combinations of ontologies did this so we can try to reproduce it?

Thanks,
Scott


Sent from my iPhone

On Apr 4, 2013, at 7:24 AM, Mara Kim <[hidden email]> wrote:

Hello again gmod-ers!

The stag-storenode.pl script overwrites (i.e., UPDATEs instead of INSERTs on cvterm) existing cvterms, despite the terms being in separate namespaces.  This is obvious when inserting a vocabulary term without a definition, which results in a 'new' term with the definition of the original term.  The only thing shared by the original cvterm and the new cvterm seems to be the order that they appear in the .obo file given to go2chadoxml, which translates into the order they appear in the resulting xml file.

Am I misusing ontologies here?  Is it not alright to simultaneously use multiple ontologies in different obo files?  I thought it was alright so long as they were in distinct namespaces.

Thanks again,
Mara

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research




Re: stag-storenode.pl overwrites existing cvterms in vocabularies different from the one being loaded

Scott Cain
Chris,

I second the thanks--it saved me time searching for a bug that doesn't exist :-)

Scott



On Wed, Jun 5, 2013 at 10:47 AM, Mara Kim <[hidden email]> wrote:
Good catch, Chris.  That did in fact solve the problem.  Thanks a million!


On Fri, May 31, 2013 at 7:31 PM, Chris Mungall <[hidden email]> wrote:

Mara,

Part or all of the problem may be the fact that you are using the same ID for different terms in your different ontologies.

organism.obo:

[Term]
id: ID:0000000
name: genome_properties
def: "Properties of a genome" []
creation_date: 2013-01-31T17:19:16Z

crbh.obo:

[Term]
id: ID:0000000
name: objects
creation_date: 2013-04-02T13:37:06Z
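
A collision like the one in the two stanzas above can be checked for before loading. The following is only a rough sketch in Python, not part of go2chadoxml or stag-storenode.pl; the helper names `collect_ids` and `find_shared_ids` are made up for illustration, and it assumes every stanza declares its identifier on a line starting with `id:`.

```python
import re
from collections import defaultdict

# Matches the "id:" tag line of an OBO stanza, e.g. "id: ID:0000000".
ID_LINE = re.compile(r"^id:\s*(\S+)")


def collect_ids(obo_text):
    """Return the set of stanza IDs declared in one OBO document."""
    ids = set()
    for line in obo_text.splitlines():
        match = ID_LINE.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def find_shared_ids(ontologies):
    """Map each ID declared in more than one ontology to those ontologies.

    `ontologies` maps a label (for example a file name) to its OBO text.
    """
    owners = defaultdict(set)
    for label, text in ontologies.items():
        for term_id in collect_ids(text):
            owners[term_id].add(label)
    return {tid: sorted(labels) for tid, labels in owners.items() if len(labels) > 1}


# The two stanzas quoted above, trimmed to the relevant tags.
organism = '[Term]\nid: ID:0000000\nname: genome_properties\ndef: "Properties of a genome" []\n'
crbh = "[Term]\nid: ID:0000000\nname: objects\n"

print(find_shared_ids({"organism.obo": organism, "crbh.obo": crbh}))
```

Any ID reported here is declared by more than one file and is a candidate for the overwriting described in this thread.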

When using OBO-Edit, the first thing you should do is set the ID prefix to something unique.

A quick, possibly dangerous hack:

perl -pi -e 's/ID:/CRBH:/g' crbh.obo
perl -pi -e 's/ID:/ORGPROP:/g' organism.obo

Then try reloading from scratch.
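
The hack is dangerous mainly because it rewrites every occurrence of "ID:", including ones inside other tokens. A slightly more conservative variant might look like the Python sketch below. The function name `reprefix` is hypothetical, and the sketch assumes the placeholder IDs always have the form `ID:` followed by digits, as in the stanzas above.

```python
import re

# Only rewrite whole tokens of the form "ID:" followed by digits, so text
# that merely contains "ID:" (for example "PMID:12345") is left untouched.
OLD_ID = re.compile(r"\bID:(\d+)\b")


def reprefix(obo_text, new_prefix):
    """Return obo_text with every ID:<digits> token changed to <new_prefix>:<digits>."""
    return OLD_ID.sub(lambda m: f"{new_prefix}:{m.group(1)}", obo_text)


sample = "id: ID:0000000\nis_a: ID:0000001 ! parent\ncomment: see PMID:12345\n"
print(reprefix(sample, "CRBH"))
```

As with the one-liners, the rewritten files would still need to be converted and reloaded into a clean database.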



On May 31, 2013, at 12:14 PM, Mara Kim wrote:

Hi guys.

Any news on the stag-storenode issue?  I can confirm that it occurs on a fresh install of Chado 1.23.


On Thu, Apr 25, 2013 at 3:20 PM, Scott Cain <[hidden email]> wrote:
Hi Mara,

This is clearly a bug in stag-storenode.pl; it's doing an update without checking the cv_id of the term.  I have no idea how easy or hard this would be to fix because I've never looked at the source.  It is on my list of things to do, but I need to get a release of GMOD in the Cloud out first (I'm close on that).  I think I'll probably look at this next week.

Scott



On Thu, Apr 25, 2013 at 2:42 PM, Mara Kim <[hidden email]> wrote:
I'm still having this problem.  Any ideas?


On Tue, Apr 9, 2013 at 3:07 PM, Mara Kim <[hidden email]> wrote:
So with some more investigation, I realized the problem extends further back!  I discovered that if I reload an ontology with stag-storenode.pl, it will somehow reclaim the cvterms it used to have.  Doing this backwards in time reveals that all of my ontologies have been writing over each other D:


On Tue, Apr 9, 2013 at 11:46 AM, Mara Kim <[hidden email]> wrote:
Hi Scott,

The problem occurs with these three ontologies.  I use go2chadoxml to create the input to stag-storenode.pl.  If I load organism.chado.xml, I get these cvterms in the organism_properties namespace:

db=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |       name        |                                   definition                                   | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+-------------------+--------------------------------------------------------------------------------+-----------+-------------+---------------------
     39858 |    19 | data_url          | The URL from which the genome was downloaded.                                  |     92671 |           0 |                   0
     39857 |    19 | sequencing_center | The sequencing center that provided this genome.                               |     92670 |           0 |                   0
     39856 |    19 | taxonomy_id       | The NCBI taxonomy id of the organism from http://www.ncbi.nlm.nih.gov/taxonomy |     92669 |           0 |                   0
     39855 |    19 | genome_properties | Properties of a genome                                                         |     92668 |           0 |                   0
     39804 |    19 | forced            | Genome was forced to be imported                                               |     92673 |           0 |                   0
     39859 |    19 | download_date     | The date that the genome was downloaded in yyyy-mm-dd format.                  |     92672 |           0 |                   0
(6 rows)


If I then load crbh.chado.xml, suddenly terms have been commandeered into the crbh namespace:
rokasdb=# select * from cvterm where cv_id = 19;
 cvterm_id | cv_id |     name      |                          definition                           | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------+---------------------------------------------------------------+-----------+-------------+---------------------
     39804 |    19 | forced        | Genome was forced to be imported                              |     92673 |           0 |                   0
     39859 |    19 | download_date | The date that the genome was downloaded in yyyy-mm-dd format. |     92672 |           0 |                   0
(2 rows)

db=# select * from cvterm where cv_id = 21;
 cvterm_id | cv_id |        name         |                               definition                                | dbxref_id | is_obsolete | is_relationshiptype
-----------+-------+---------------------+-------------------------------------------------------------------------+-----------+-------------+---------------------
     39858 |    21 | seqlen_ratio_cutoff | The cutoff ratio for the difference in seqlen for a reciprocal best hit |     92671 |           0 |                   0
     39857 |    21 | properties          | The sequencing center that provided this genome.                        |     92670 |           0 |                   0
     39856 |    21 | cluster             | A group of genes that form a clustering reciprocal best hit cluster.    |     92669 |           0 |                   0
     39855 |    21 | objects             | Properties of a genome                                                  |     92668 |           0 |                   0
(4 rows)

Note how the cvterm_ids are identical.  Also, the terms that have no definition (properties, objects) retain the definition they had as members of the blast ontology.



On Mon, Apr 8, 2013 at 4:06 AM, Scott Cain <[hidden email]> wrote:
Hi Mara,

That shouldn't be happening. For example, I know I've got entries in my cvterm table that have the same name but belong to different cv. Can you tell us what combinations of ontologies did this so we can try to reproduce it?

Thanks,
Scott


Sent from my iPhone

On Apr 4, 2013, at 7:24 AM, Mara Kim <[hidden email]> wrote:

Hello again gmod-ers!

The stag-storenode.pl script overwrites (i.e., UPDATEs instead of INSERTs on cvterm) existing cvterms, despite the terms being in separate namespaces.  This is obvious when inserting a vocabulary term without a definition, which results in a 'new' term with the definition of the original term.  The only thing shared by the original cvterm and the new cvterm seems to be the order that they appear in the .obo file given to go2chadoxml, which translates into the order they appear in the resulting xml file.

Am I misusing ontologies here?  Is it not alright to simultaneously use multiple ontologies in different obo files?  I thought it was alright so long as they were in distinct namespaces.

Thanks again,
Mara


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
