Gene Names

Daniel Quest
All,

I have run into what should be an easy-to-answer question.  We are
loading COG and InterPro entries into Chado (the db and dbxref
tables).  In the process, we are having some trouble figuring out
where to place the following concepts:

1) GeneName.  This is the name we should give the gene, as indexed by
a database such as TIGRFAMs (e.g. dnaA).  It is not something
associated with a feature.  How is this represented in Chado?

2) Sanitized Descriptions.  GenBank does not accept the gene
descriptions from some databases, so we have to change the
descriptions from the source databases before we can make product
assignments.  How is this represented?

3) Short Names.  Some tools have an abbreviated name that is used to
describe the database entry.  Again, what is the Chado way?

4) Type.  Databases have thresholds and type classifications, e.g.
Family, Domain, Repeat.  Sometimes we may wish to make a product
assignment based on the type classification used in the reference
database (e.g. some products can be assigned with high confidence
based on computational analysis and others cannot).

5) Category.  COGs have categories, e.g. A, B, J, H ...

I could change the tables and add CV terms and so on in an arbitrary
fashion, but it would be so much better to know what everyone else
tends to do.  Thanks so much for your thoughts.

-Daniel

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Gene Names

Don Gilbert
Daniel,

See my suggestions below.  This was once standard for Chado storage,
but others may differ.

See also http://gmod.org/wiki/Sample_Chado_SQL
http://gmod.org/wiki/Chado_Tables

- Don Gilbert

|From: Daniel Quest <[hidden email]>
|
|I have run into what should be an easy-to-answer question.  We are
|loading COG and InterPro entries into Chado (the db and dbxref
|tables).  In the process, we are having some trouble figuring out
|where to place the following concepts:
|
|1) GeneName.  This is the name we should give the gene, as indexed by
|a database such as TIGRFAMs (e.g. dnaA).  It is not something
|associated with a feature.  How is this represented in Chado?

It depends on whether this is the primary name of a gene feature.
 If yes, feature.name = 'dnaA'.
 Otherwise, make it a synonym:
feature_synonym fs, synonym s ('dnaA')
  with fs.pub_id = TIGRFAMs publication link
  or s.type_id (cvterm) = TIGRFAMs dbxref link
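
The synonym route above can be sketched roughly as follows. This is a minimal illustration using Python's sqlite3 with heavily simplified stand-ins for the Chado feature, synonym, and feature_synonym tables; real Chado runs on PostgreSQL and these tables carry more columns and constraints. The uniquename 'gene0001', the primary name 'locus_0001', and the integer type_id/pub_id values are made-up placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    name       TEXT
);
CREATE TABLE synonym (
    synonym_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    type_id    INTEGER NOT NULL   -- would reference cvterm in Chado
);
CREATE TABLE feature_synonym (
    feature_synonym_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    synonym_id INTEGER NOT NULL REFERENCES synonym(synonym_id),
    pub_id     INTEGER NOT NULL   -- would reference pub in Chado
);
""")

# Hypothetical gene whose primary name is something other than dnaA.
cur = conn.execute(
    "INSERT INTO feature (uniquename, name) VALUES (?, ?)",
    ("gene0001", "locus_0001"),
)
feature_id = cur.lastrowid

# Store 'dnaA' as a synonym and link it to the feature.
cur = conn.execute("INSERT INTO synonym (name, type_id) VALUES (?, ?)", ("dnaA", 1))
synonym_id = cur.lastrowid
conn.execute(
    "INSERT INTO feature_synonym (feature_id, synonym_id, pub_id) VALUES (?, ?, ?)",
    (feature_id, synonym_id, 1),
)

def synonyms_for(uniquename):
    """Return all synonym names linked to the feature with this uniquename."""
    rows = conn.execute(
        """SELECT s.name
             FROM feature f
             JOIN feature_synonym fs ON fs.feature_id = f.feature_id
             JOIN synonym s ON s.synonym_id = fs.synonym_id
            WHERE f.uniquename = ?
            ORDER BY s.name""",
        (uniquename,),
    ).fetchall()
    return [r[0] for r in rows]

print(synonyms_for("gene0001"))
```

In a real load, type_id and pub_id would point at actual cvterm and pub rows describing where the name came from.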

|2) Sanitized Descriptions.  GenBank does not accept the gene
|descriptions from some databases, so we have to change the
|descriptions from the source databases before we can make product
|assignments.  How is this represented?

featureprop fp, cvterm
  fp holds the value string, plus a cvterm link (fp.type_id) for
  origin details

|3) Short Names.  Some tools have an abbreviated name that is used to
|describe the database entry.  Again, the Chado way?

feature_synonym fs, synonym s
  s has the values; fs is the feature <> synonym link

|4) Type.  Databases have thresholds and type classifications, e.g.
|Family, Domain, Repeat.  Sometimes we may wish to make a product
|assignment based on the type classification used in the reference
|database (e.g. some products can be assigned with high confidence
|based on computational analysis and others cannot).

feature f, cvterm for the primary type of a feature
featureprop fp, cvterm for secondary types

|5) Category.  COGs have categories, e.g. A, B, J, H ...

featureprop fp, cvterm for secondary types
  where COGS is the cv that the cvterm belongs to (cvterm.cv_id)
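
The featureprop pattern used in the answers above (a value string on the feature, typed by a cvterm that lives in a source-specific cv) might look like the sketch below. It uses Python's sqlite3 with cut-down versions of the cv, cvterm, feature, and featureprop tables, not the full Chado schema. Naming the cvterm 'category' and storing the letter as the value is only one possible reading of the suggestion; 'gene0001' and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
    name  TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value      TEXT,
    rank       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, type_id, rank)
);
""")

def add_prop(uniquename, cv_name, term_name, value):
    """Attach a property to a feature, creating the cv and cvterm if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute("SELECT cv_id FROM cv WHERE name = ?", (cv_name,)).fetchone()[0]
    conn.execute("INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)", (cv_id, term_name))
    type_id = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?", (cv_id, term_name)
    ).fetchone()[0]
    feature_id = conn.execute(
        "SELECT feature_id FROM feature WHERE uniquename = ?", (uniquename,)
    ).fetchone()[0]
    # Use the next free rank so one feature can carry several values of the same type.
    rank = conn.execute(
        "SELECT COUNT(*) FROM featureprop WHERE feature_id = ? AND type_id = ?",
        (feature_id, type_id),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO featureprop (feature_id, type_id, value, rank) VALUES (?, ?, ?, ?)",
        (feature_id, type_id, value, rank),
    )

def props(uniquename):
    """Return (cv name, term name, value) tuples for a feature, in insertion order."""
    return conn.execute(
        """SELECT cv.name, t.name, fp.value
             FROM featureprop fp
             JOIN feature f ON f.feature_id = fp.feature_id
             JOIN cvterm t ON t.cvterm_id = fp.type_id
             JOIN cv ON cv.cv_id = t.cv_id
            WHERE f.uniquename = ?
            ORDER BY fp.featureprop_id""",
        (uniquename,),
    ).fetchall()

conn.execute("INSERT INTO feature (uniquename) VALUES ('gene0001')")
add_prop("gene0001", "COGS", "category", "J")
add_prop("gene0001", "COGS", "category", "H")
print(props("gene0001"))
```

The same helper would serve for a sanitized description or a secondary type: only the cv name, term name, and value change.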

|I could change the tables and add CV terms and so on in an arbitrary
|fashion, but it would be so much better to know what everyone else
|tends to do.  Thanks so much for your thoughts.

Those who use Chado databases extensively will find that making your
own CV and CVTERM sets as needed is the way to go.  E.g. for each
public database from which you import terms, types, or IDs, you would
likely want a new CV/CVTERM set with DBXREF links to the source (and
pub links for source details, e.g. "published as ...").  So COGs,
TIGRFAMs, etc. all become DBXREF entries with appropriately linked CV
and CVTERM lists, which you use in identifying and updating gene
feature and analysis properties, synonyms, etc.
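
One way to picture the per-source CV/CVTERM set with DBXREF links described above is the sketch below, again in Python's sqlite3 with simplified db, dbxref, cv, and cvterm tables. The register_term helper, the choice to give the db and cv the same name, and the accessions ('category:J', 'type:Domain') are assumptions made for illustration, not an established Chado convention.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db(db_id),
    accession TEXT NOT NULL,
    UNIQUE (db_id, accession)
);
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL,
    dbxref_id INTEGER NOT NULL UNIQUE REFERENCES dbxref(dbxref_id),
    UNIQUE (cv_id, name)
);
""")

def _id(select_sql, insert_sql, params):
    """Return the id of a matching row, inserting the row first if it is missing."""
    row = conn.execute(select_sql, params).fetchone()
    if row:
        return row[0]
    return conn.execute(insert_sql, params).lastrowid

def register_term(source, accession, term_name):
    """Ensure db, cv, dbxref, and cvterm rows exist for one term from a source database."""
    db_id = _id("SELECT db_id FROM db WHERE name = ?",
                "INSERT INTO db (name) VALUES (?)", (source,))
    cv_id = _id("SELECT cv_id FROM cv WHERE name = ?",
                "INSERT INTO cv (name) VALUES (?)", (source,))
    dbxref_id = _id(
        "SELECT dbxref_id FROM dbxref WHERE db_id = ? AND accession = ?",
        "INSERT INTO dbxref (db_id, accession) VALUES (?, ?)",
        (db_id, accession),
    )
    return _id(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ? AND dbxref_id = ?",
        "INSERT INTO cvterm (cv_id, name, dbxref_id) VALUES (?, ?, ?)",
        (cv_id, term_name, dbxref_id),
    )

# Placeholder accessions; a real load would use identifiers from the source database.
cog_j = register_term("COGS", "category:J", "J")
cog_j_again = register_term("COGS", "category:J", "J")   # second call reuses the rows
ipr_domain = register_term("InterPro", "type:Domain", "Domain")
```

The returned cvterm_id is what would then be used as type_id in featureprop, synonym, or dbxrefprop rows.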


Re: Gene Names

Scott Cain
Hi Daniel,

While I don't know for sure what others do in this case, it seems
intuitive to me that all of these items would go in dbxrefprop, with
the possible exception of (1), if the GeneName would be properly
represented as an accession instead (though I'm guessing there is
already a proper accession in your case).  You'll want to either find
(preferable) or create (less ideal) a controlled vocabulary for these
items and then store them all as tag-value pairs in dbxrefprop for
each entry.
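
As a rough sketch of the dbxrefprop approach, the following uses Python's sqlite3 with simplified db, dbxref, cvterm, and dbxrefprop tables (the cv table is left out for brevity, and real Chado is PostgreSQL with more columns). The tag names, the load_entry and entry_props helpers, and the accession 'COG_EXAMPLE' with its values are all placeholders, not real COG data or an agreed vocabulary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db(db_id),
    accession TEXT NOT NULL,
    UNIQUE (db_id, accession)
);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dbxrefprop (
    dbxrefprop_id INTEGER PRIMARY KEY,
    dbxref_id INTEGER NOT NULL REFERENCES dbxref(dbxref_id),
    type_id   INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value     TEXT NOT NULL DEFAULT '',
    rank      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (dbxref_id, type_id, rank)
);
""")

# Hypothetical tag names; a real load would take these from an existing CV if one fits.
TAGS = ["gene_name", "sanitized_description", "short_name", "entry_type", "category"]
conn.executemany("INSERT INTO cvterm (name) VALUES (?)", [(t,) for t in TAGS])

def load_entry(db_name, accession, props):
    """Insert one dbxref and store each tag/value pair in dbxrefprop."""
    conn.execute("INSERT OR IGNORE INTO db (name) VALUES (?)", (db_name,))
    db_id = conn.execute("SELECT db_id FROM db WHERE name = ?", (db_name,)).fetchone()[0]
    dbxref_id = conn.execute(
        "INSERT INTO dbxref (db_id, accession) VALUES (?, ?)", (db_id, accession)
    ).lastrowid
    for tag, value in props.items():
        type_id = conn.execute(
            "SELECT cvterm_id FROM cvterm WHERE name = ?", (tag,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO dbxrefprop (dbxref_id, type_id, value) VALUES (?, ?, ?)",
            (dbxref_id, type_id, value),
        )
    return dbxref_id

def entry_props(db_name, accession):
    """Return the stored tag/value pairs for one dbxref as a dict."""
    rows = conn.execute(
        """SELECT t.name, p.value
             FROM dbxrefprop p
             JOIN dbxref x ON x.dbxref_id = p.dbxref_id
             JOIN db d ON d.db_id = x.db_id
             JOIN cvterm t ON t.cvterm_id = p.type_id
            WHERE d.name = ? AND x.accession = ?""",
        (db_name, accession),
    ).fetchall()
    return dict(rows)

# Placeholder accession and values, not real COG data.
load_entry("COG", "COG_EXAMPLE", {"gene_name": "dnaA", "category": "J"})
print(entry_props("COG", "COG_EXAMPLE"))
```

Compared with featureprop, this keeps the metadata on the database entry itself, so it is available even before any feature is linked to that dbxref.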

Scott


On Fri, Aug 6, 2010 at 1:41 PM, Daniel Quest <[hidden email]> wrote:

> [...]

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


Re: Gene Names

Daniel Quest
Thanks, everyone.  Very helpful!
D

On Aug 6, 2010, at 3:03 PM, Scott Cain <[hidden email]> wrote:

> [...]