[Chado-schema] loading GFF3 file

[Chado-schema] loading GFF3 file

Jieyu Wang
Dear Colleagues:
I am a graduate assistant in Dr. Hu Jim's lab. I am new to Chado and
have a few questions.
The core script for loading a GFF file is gmod_bulk_load_gff3.pl, and I have
some questions after reading the help documentation for this script.
1. Is there any reference showing how Chado generates its
IDs? For example, the first column of the first row of my GFF file is
'NC_001417', and it is read into the featureloc table as
srcfeature_id (it is 4 when I checked).
I believe Chado has its own way of ensuring unique IDs in each table; I
just don't know how.
2. For custom tags in column 9, the documentation says that a custom
tag can only be read into the featureprop table. Is that true? And is the
list of tags in the documentation complete? I see only the ID, Alias,
Dbxref, Gap, Note, Ontology_term, and Target tags. But in column 9 I found
tags like "locus_tag", "product", "gene", and so on, which I
think are common properties. How do I know how Chado handles these
tags?
3. When Ontology_term tags are used, will items from the SO be
processed automatically? How are they "processed"? Which tables is the
information written to? I ask because I thought that when the ontologies
are loaded during Chado installation, they are read into the cv and
cvterm tables.

Since I am new to Chado, any hint or good reference would be of great
help to me. Thank you.

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: [Chado-schema] loading GFF3 file

Scott Cain
Hi Jieyu,

I'll try to answer all of these questions inline, but let me know if I
assume too much! :-)

Scott


On Tue, Jul 27, 2010 at 11:14 AM, Jieyu Wang <[hidden email]> wrote:

> Dear Colleagues:
> I am a graduate assistant in Dr. Hu Jim's lab. I am new to Chado and
> have a few questions.
> The core script for loading a GFF file is gmod_bulk_load_gff3.pl, and I have
> some questions after reading the help documentation for this script.
> 1. Is there any reference showing how Chado generates its
> IDs? For example, the first column of the first row of my GFF file is
> 'NC_001417', and it is read into the featureloc table as
> srcfeature_id (it is 4 when I checked).
> I believe Chado has its own way of ensuring unique IDs in each table; I
> just don't know how.

I think you might be confusing primary keys with the GFF ID; the two are
completely unrelated.  Chado uses PostgreSQL sequences to create
primary keys.  featureloc.srcfeature_id, which you reference above, is
a foreign key referencing NC_001417's entry in the feature table.
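
To make the primary-key/foreign-key distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The two tables are heavily simplified stand-ins for Chado's feature and featureloc tables (the real ones have many more columns), SQLite's AUTOINCREMENT only imitates what a PostgreSQL sequence does, and the name 'gene0001' is made up for illustration.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with two simplified, Chado-like tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database
            uniquename TEXT NOT NULL UNIQUE
        );
        CREATE TABLE featureloc (
            featureloc_id INTEGER PRIMARY KEY AUTOINCREMENT,
            feature_id    INTEGER NOT NULL REFERENCES feature (feature_id),
            srcfeature_id INTEGER REFERENCES feature (feature_id),
            fmin INTEGER,
            fmax INTEGER
        );
        """
    )
    return conn


conn = build_demo()

# The database, not the GFF file, hands out the feature_id values.
src_id = conn.execute(
    "INSERT INTO feature (uniquename) VALUES (?)", ("NC_001417",)
).lastrowid
gene_id = conn.execute(
    "INSERT INTO feature (uniquename) VALUES (?)", ("gene0001",)  # hypothetical feature
).lastrowid

# The gene's location row points back at the reference sequence by its key.
conn.execute(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax) VALUES (?, ?, ?, ?)",
    (gene_id, src_id, 0, 100),
)

# Following srcfeature_id leads back to the NC_001417 row in feature.
row = conn.execute(
    """
    SELECT src.uniquename
    FROM featureloc AS fl
    JOIN feature AS src ON src.feature_id = fl.srcfeature_id
    WHERE fl.feature_id = ?
    """,
    (gene_id,),
).fetchone()
print(row[0])
```

The integer stored in srcfeature_id carries no meaning of its own; it is simply whatever key the database assigned to the reference sequence's row when that row was inserted.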

On the other hand, the feature table has a uniquename column, for
which the loader tries to use the GFF ID if it can; otherwise it will
"uniqify" it by appending the feature_id to the GFF ID.
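
The fallback described above can be sketched roughly as follows. This is not the loader's actual code: the function name choose_uniquename, the in-memory set of taken names, and the "-" separator are all assumptions made for illustration, and the real loader may format the result differently.

```python
def choose_uniquename(gff_id, feature_id, taken, sep="-"):
    """Return gff_id if it is still free, else gff_id with feature_id appended.

    `taken` is a set of uniquenames already in use; `sep` is an assumed
    separator, not necessarily what the real loader uses.
    """
    if gff_id not in taken:
        name = gff_id
    else:
        name = f"{gff_id}{sep}{feature_id}"
    taken.add(name)
    return name


taken = set()
first = choose_uniquename("gene0001", 10, taken)   # GFF ID is free, used as-is
second = choose_uniquename("gene0001", 11, taken)  # collision, feature_id appended
print(first, second)
```

Because feature_id is itself unique within the feature table, appending it is enough to make the resulting name unique.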

> 2. For custom tags in column 9, the documentation says that a custom
> tag can only be read into the featureprop table. Is that true? And is the
> list of tags in the documentation complete? I see only the ID, Alias,
> Dbxref, Gap, Note, Ontology_term, and Target tags. But in column 9 I found
> tags like "locus_tag", "product", "gene", and so on, which I
> think are common properties. How do I know how Chado handles these
> tags?

Please see the GFF3 spec:

  http://www.sequenceontology.org/gff3.shtml

The capitalized tags you listed (ID, Alias, Dbxref, and so on) are
reserved in the GFF3 spec (reserved tags all start with an uppercase
letter), and some of them get special treatment when being loaded into
Chado.  For example, Alias tags result in entries in the synonym and
feature_synonym tables.  The other tags (which start with a lowercase
letter, like locus_tag, product, and gene) can be anything the author
of the GFF3 wants, and they result in entries in the featureprop
table, where the tag is a foreign key out to the cvterm table, stored
in featureprop.type_id (if the term doesn't exist yet, the loader will
create one for it), and the value is put in featureprop.value.
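
As a rough illustration of that split, the sketch below parses a column 9 string, separates reserved tags from custom ones by the case of the first letter, and turns the custom tags into featureprop-like rows. It is not the loader's code: the function names, the example attribute values, and the way rank is numbered are assumptions. The tag name is kept as plain text in each row for readability, whereas in Chado it would be a foreign key to a cvterm.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 column 9 string into {tag: [values]}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # GFF3 percent-encodes special characters; split on "," before decoding.
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs


def split_tags(attrs):
    """Separate reserved (uppercase-initial) tags from custom ones."""
    reserved = {t: v for t, v in attrs.items() if t[:1].isupper()}
    custom = {t: v for t, v in attrs.items() if not t[:1].isupper()}
    return reserved, custom


def featureprop_rows(feature_id, custom):
    """Build featureprop-like rows, one per value of each custom tag."""
    rows = []
    for tag, values in custom.items():
        for rank, value in enumerate(values):  # rank numbering is illustrative
            rows.append(
                {"feature_id": feature_id, "type": tag, "value": value, "rank": rank}
            )
    return rows


# Hypothetical attribute string; the values are invented for this example.
column9 = "ID=gene0001;Alias=abc;locus_tag=XYZ_0001;product=coat protein;gene=cp"
reserved, custom = split_tags(parse_attributes(column9))
rows = featureprop_rows(7, custom)
for r in rows:
    print(r)
```

Here ID and Alias end up in the reserved group, while locus_tag, product, and gene each become one property row.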

> 3. When Ontology_term tags are used, will items from the SO be
> processed automatically? How are they "processed"? Which tables is the
> information written to? I ask because I thought that when the ontologies
> are loaded during Chado installation, they are read into the cv and
> cvterm tables.

When someone uses Ontology_term tags in GFF, they are stating
something about the feature being referenced, like its association
with a Gene Ontology term.  If the ontology is already loaded into
Chado (meaning there are entries in the db, dbxref, cv, and cvterm
tables for them), the loader will take an entry like
"Ontology_term=GO:0012345" and create an entry in the feature_cvterm
table linking the feature and the referenced ontology term (cvterm).
If the ontology is not already loaded, the loader will issue a warning
and do nothing with that entry.
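
The lookup-or-warn behaviour described above could be sketched like this. It only models the logic: the dictionary loaded_terms stands in for the db, dbxref, cv, and cvterm tables, the function name and the cvterm_id value 501 are hypothetical, and the wording of the warning is not the loader's.

```python
import warnings


def link_ontology_terms(feature_id, accessions, loaded_terms):
    """Return feature_cvterm-like rows for accessions that are already loaded.

    `loaded_terms` maps an accession such as "GO:0012345" to a cvterm_id.
    Accessions that are missing trigger a warning and are skipped.
    """
    links = []
    for accession in accessions:
        cvterm_id = loaded_terms.get(accession)
        if cvterm_id is None:
            warnings.warn(f"{accession} is not loaded; skipping")
            continue
        links.append({"feature_id": feature_id, "cvterm_id": cvterm_id})
    return links


loaded_terms = {"GO:0012345": 501}  # hypothetical cvterm_id

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    links = link_ontology_terms(7, ["GO:0012345", "GO:9999999"], loaded_terms)

print(links)
print(len(caught), "warning(s)")
```

Only the accession that is present in loaded_terms produces a link; the other one is reported and otherwise ignored, so nothing is written for it.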

>
> Since I am new to Chado, any hint or good reference would be of great
> help to me. Thank you.
>



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
