Missing cvterms

Andrew Oberlin
Hello all,

When using the bulk_uploader I get the error that there are missing cvterms in my database. What is the best way to handle this?

Currently I have a script that inserts the cvterm into the database under a cv that I created for "unclassified cvterms". However, I am not sure whether this is best practice. On top of that, I have found that the bulk uploader script still claims the cvterm is missing even though it is in the cvterm table. Does it check specific cv_ids when inserting?

Thanks,

Andrew Oberlin

--
Andrew Oberlin
Miami University 2013
Computer Science & Mathematics
[hidden email]  (330) 998-1603


_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Missing cvterms

Scott Cain
Hi Andrew,

Is this related to terms in the ninth column, like a GO accession that
isn't recognized?  It seems that people frequently use "alt ids" for
GO terms and the bulk loader doesn't (yet) know how to deal with them.
There are a few options:

1. Ignore the warnings; you lose the data (not a great option, but
some people may not care about the lost terms).

2. Do what you're doing already (Unknown term) and perhaps move the
unrecognized term to a feature note so it isn't lost.

3. Since there are usually just a few terms per GFF file that cause
this problem, you could identify the canonical GO id that corresponds
to alt ids that occur in the GFF file and change them, and possibly
also add a note to the feature indicating what happened.

4. Patch Bio::GMOD::DB::Adapter so that it knows how to look for alt
ids and does the right thing (where the right thing is probably what I
described in 3) (and then commit that patch back to svn).

5. Bug me until I do 4.  I've thought about it for a long time but
haven't put the time into actually doing it.  Probably not a good
short term solution :-)

Of course, if that's not what you're talking about, 1-5 may not apply at all :-)
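
Option 3 above could be scripted roughly as follows. This is only a minimal sketch, not part of the bulk loader: it assumes the alt ids are listed as alt_id lines in an OBO-format copy of the Gene Ontology, and the function names (parse_alt_ids, canonicalize_line) and the wording of the note are made up for illustration. It rewrites any GO id found in column 9, so restrict it to the relevant attribute if other attributes may contain GO ids.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")


def parse_alt_ids(obo_text):
    """Map every alt_id in OBO text to the primary id of its stanza."""
    alt_to_primary = {}
    primary = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza begins; forget the previous term id.
            primary = None
        elif line.startswith("id:"):
            primary = line.split(":", 1)[1].strip()
        elif line.startswith("alt_id:") and primary:
            alt_to_primary[line.split(":", 1)[1].strip()] = primary
    return alt_to_primary


def canonicalize_line(gff_line, alt_to_primary):
    """Replace alt ids in column 9 and record each change in a Note."""
    text = gff_line.rstrip("\n")
    cols = text.split("\t")
    if text.startswith("#") or len(cols) != 9:
        return text  # comments, directives, FASTA lines are left alone
    replaced = []

    def swap(match):
        go_id = match.group(0)
        primary = alt_to_primary.get(go_id, go_id)
        if primary != go_id:
            replaced.append(f"{go_id} replaced by {primary}")
        return primary

    cols[8] = GO_ID.sub(swap, cols[8])
    if replaced:
        # Assumption: the feature has no Note attribute yet; merge the
        # values by hand if it does.
        cols[8] += ";Note=" + ",".join(replaced)
    return "\t".join(cols)
```

Running every line of the GFF file through canonicalize_line before loading keeps the original id visible in the feature note, so nothing is silently lost.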

Scott


--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: Missing cvterms

Andrew Oberlin
It actually gives me an "abnormal termination" error, cvterm "pseudotRNA" not found, so I don't think it is even finishing the upload.

Andrew Oberlin


Re: Missing cvterms

Scott Cain
Oh, when that happens you pretty much have to fix the GFF by finding
the appropriate term (which in this case is pseudogenic_tRNA).
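
Fixing the GFF can be as simple as rewriting column 3 before loading. The following is a minimal sketch under that assumption; TYPE_FIXES and the function names are hypothetical, and the mapping has to be filled in by hand for whatever non-SO names a given file uses.

```python
# Hypothetical mapping from type names found in the GFF file to the
# Sequence Ontology term that should be used instead.
TYPE_FIXES = {"pseudotRNA": "pseudogenic_tRNA"}


def fix_type_column(gff_line, fixes=TYPE_FIXES):
    """Return the line with column 3 replaced when a fix is known."""
    if gff_line.startswith("#"):
        return gff_line  # comments and directives pass through
    cols = gff_line.split("\t")
    if len(cols) == 9:
        cols[2] = fixes.get(cols[2], cols[2])
    return "\t".join(cols)


def fix_gff(lines, fixes=TYPE_FIXES):
    """Apply fix_type_column to every line of a GFF file."""
    return [fix_type_column(line, fixes) for line in lines]
```

Lines that do not have nine tab-separated columns (for example a FASTA section) are returned unchanged.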

Scott



Re: Missing cvterms

Scott Cain
Hi Andrew,

Uggh.  That's ugly.  Not that this is something that I can really
advocate, but if you are going to do this, the only way it would work
is if you add the term to the "sequence" cv.

But to answer your first question: no, there aren't really various
things that can be used for column 3; those terms have to come from
the Sequence Ontology.  Of course, that doesn't mean that people won't
go using the wrong or outdated terms.  Generally, I'd say that it
would be better to fix the data at some point.  The typical place to
fix it is before loading (to make loading go smoothly), but if you
have a tool that will let you add terms, it would still be better to
fix those features after the fact.
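
Why a term stored under another cv is still reported as missing can be illustrated with a toy lookup. This is a sketch only: it uses an in-memory SQLite database with a heavily simplified subset of the Chado cv and cvterm tables (real Chado runs on PostgreSQL and cvterm has more required columns, such as dbxref_id), and the query is an assumption about how a cv-scoped lookup behaves, not the loader's actual code.

```python
import sqlite3


def build_demo_db():
    """Create a toy database with one SO term and one 'unclassified' term."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
            name TEXT NOT NULL
        );
        INSERT INTO cv (cv_id, name) VALUES (1, 'sequence');
        INSERT INTO cv (cv_id, name) VALUES (2, 'unclassified cvterms');
        INSERT INTO cvterm (cv_id, name) VALUES (1, 'pseudogenic_tRNA');
        INSERT INTO cvterm (cv_id, name) VALUES (2, 'pseudotRNA');
    """)
    return conn


def find_type(conn, term_name, cv_name="sequence"):
    """Look up a term by name, restricted to a single cv."""
    row = conn.execute(
        "SELECT cvterm.cvterm_id FROM cvterm "
        "JOIN cv ON cv.cv_id = cvterm.cv_id "
        "WHERE cvterm.name = ? AND cv.name = ?",
        (term_name, cv_name),
    ).fetchone()
    return row[0] if row else None
```

With this data, find_type(conn, "pseudotRNA") returns None even though a row with that name exists in cvterm, because the row belongs to the "unclassified cvterms" cv rather than "sequence".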

Scott


On Mon, Apr 2, 2012 at 2:20 PM, Andrew Oberlin <[hidden email]> wrote:

> So are there various terms like this that are named different things in the
> gff files? I'm trying to make it so that I can upload a gff and not worry
> about terms like these.
>
> I have a script that will insert these terms into the cvterm list if they do
> not exist. Where would I put these terms so that the bulk uploader will find
> them?
>
> Andrew Oberlin

Re: Missing cvterms

Andrew Oberlin
Ok thanks! I will try that. The idea isn't to bypass correcting the information, but to streamline the process of editing it without encountering errors and backtracking. We plan on issuing a report that "unknown" cvterms were added to the database and that we should look at them. However, having the rest of the information there is more valuable than none of it.
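
The report could be produced by a pre-load scan along these lines. This is a rough sketch, not an existing tool: known_terms is assumed to be a set of term names already present in the "sequence" cv (for example, fetched from the database beforehand), and the function names are hypothetical.

```python
from collections import Counter


def unknown_types(gff_lines, known_terms):
    """Count column-3 types that are not in the set of known terms."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] not in known_terms:
            counts[cols[2]] += 1
    return counts


def format_report(counts):
    """Return one report line per unknown term, sorted by name."""
    return [f"{term}\t{n} feature(s)" for term, n in sorted(counts.items())]
```

The counts show at a glance which terms were added as "unknown" and how many features need to be corrected afterwards.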

Andrew Oberlin
