Suggested revisions to Chado

Andrew McArthur
Hello all,

We've been using Chado for over 2 years now for development of the Comprehensive Antibiotic Resistance Database (http://arpcard.mcmaster.ca).  Chado and GMOD have been an excellent resource for building this database.  Based on our experience, I would like to suggest a few tweaks to the Chado schema, outlined below.

Organisms and Genetic Codes

One of the challenges of our effort is that our database includes over 200 species, subspecies, and strains of prokaryotes.  Currently Chado has the GENCODE, GENCODE_STARTCODON, and GENCODE_CODON_AA tables for storing genetic codes.  The prokaryotes we've been loading follow the "Bacterial, Archaeal, Plant Plastid Code".  This code includes alternate stop codons, so we have added a GENCODE_STOPCODON table to Chado that mirrors the format of the GENCODE_STARTCODON table.
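A minimal sketch of the GENCODE_STOPCODON idea, using Python's built-in sqlite3 module as an in-memory stand-in for PostgreSQL. The (gencode_id, codon) column layout is an assumption modelled on how we understand GENCODE_STARTCODON; check it against the actual Chado DDL before reusing it. The sample rows load TAA, TAG and TGA as stop codons for gencode_id 11.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL. Column names are assumptions
# modelled on a (gencode_id, codon) layout; check them against your Chado DDL.
DDL = """
CREATE TABLE gencode (
    gencode_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE gencode_startcodon (
    gencode_id INTEGER NOT NULL REFERENCES gencode (gencode_id),
    codon      CHAR(3) NOT NULL,
    UNIQUE (gencode_id, codon)
);
-- Proposed table: same shape as gencode_startcodon, but listing stop codons.
CREATE TABLE gencode_stopcodon (
    gencode_id INTEGER NOT NULL REFERENCES gencode (gencode_id),
    codon      CHAR(3) NOT NULL,
    UNIQUE (gencode_id, codon)
);
"""


def is_stop_codon(conn, gencode_id, codon):
    """Return True if `codon` is stored as a stop codon for `gencode_id`."""
    row = conn.execute(
        "SELECT 1 FROM gencode_stopcodon WHERE gencode_id = ? AND codon = ?",
        (gencode_id, codon.upper()),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute(
    "INSERT INTO gencode VALUES (11, 'Bacterial, Archaeal and Plant Plastid Code')"
)
conn.executemany(
    "INSERT INTO gencode_stopcodon VALUES (11, ?)",
    [("TAA",), ("TAG",), ("TGA",)],
)
print(is_stop_codon(conn, 11, "tga"))  # True
print(is_stop_codon(conn, 11, "TGG"))  # False
```

Keeping stop codons in their own table, rather than as a flag on GENCODE_CODON_AA, keeps the layout parallel to GENCODE_STARTCODON, which is the design the message describes.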

We've been using ontologies to track the relationships between the various species, subspecies, and strains in our database and thus have not been making proper use of the ORGANISM table, something we plan to address in the future.  However, we have added an ORGANISM_GENCODE table to Chado, with the following fields, so that different organisms can be linked to different genetic codes:

organism_gencode_id
organism_id
gencode_id

Presumably this could be fleshed out with supporting tables storing properties, links to publications, links to dbxrefs, etc.
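A sketch of ORGANISM_GENCODE under the same sqlite3 stand-in. The organism and gencode tables are reduced to a few columns, and the UNIQUE (organism_id, gencode_id) key is an assumption borrowed from the usual Chado linker-table pattern; the helper gencodes_for_organism is hypothetical.

```python
import sqlite3

# Simplified subsets of Chado's organism table and a gencode table, plus the
# proposed organism_gencode linker. The UNIQUE pair follows the usual Chado
# linker-table convention; that choice is our assumption.
DDL = """
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    genus       TEXT NOT NULL,
    species     TEXT NOT NULL
);
CREATE TABLE gencode (
    gencode_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE organism_gencode (
    organism_gencode_id INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism (organism_id),
    gencode_id  INTEGER NOT NULL REFERENCES gencode (gencode_id),
    UNIQUE (organism_id, gencode_id)
);
"""


def gencodes_for_organism(conn, organism_id):
    """Return the sorted gencode_ids linked to one organism."""
    rows = conn.execute(
        "SELECT gencode_id FROM organism_gencode WHERE organism_id = ? "
        "ORDER BY gencode_id",
        (organism_id,),
    )
    return [gencode_id for (gencode_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO organism VALUES (1, 'Escherichia', 'coli')")
conn.execute(
    "INSERT INTO gencode VALUES (11, 'Bacterial, Archaeal and Plant Plastid Code')"
)
conn.execute("INSERT INTO organism_gencode (organism_id, gencode_id) VALUES (1, 11)")
print(gencodes_for_organism(conn, 1))  # [11]
```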

Since each feature_id in Chado is joined to an organism_id via the feature.organism_id field, adding ORGANISM_GENCODE makes it possible for Chado to associate different genetic codes with features that come from different organisms.  This is not entirely satisfactory since a single animal species would contain both chromosomal and mitochondrial DNA with different genetic codes.  Perhaps a feature.gencode_id field is needed for this case?
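One way the feature.gencode_id question could play out, sketched as a lookup: a nullable feature.gencode_id (a hypothetical column, as floated above) overrides the organism-level code, and the ORGANISM_GENCODE link is used only when exactly one code is recorded for the organism. The function resolve_gencode and the sample features are illustrative, not part of Chado.

```python
import sqlite3

# Hypothetical: feature.gencode_id is the nullable column suggested above;
# feature and organism_gencode are cut-down versions of the tables discussed.
DDL = """
CREATE TABLE organism_gencode (
    organism_id INTEGER NOT NULL,
    gencode_id  INTEGER NOT NULL,
    UNIQUE (organism_id, gencode_id)
);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL,
    uniquename  TEXT NOT NULL,
    gencode_id  INTEGER  -- NULL means "inherit from the organism"
);
"""


def resolve_gencode(conn, feature_id):
    """Prefer a feature-level code; otherwise use the organism's single code."""
    row = conn.execute(
        "SELECT gencode_id, organism_id FROM feature WHERE feature_id = ?",
        (feature_id,),
    ).fetchone()
    if row is None:
        raise KeyError(feature_id)
    gencode_id, organism_id = row
    if gencode_id is not None:
        return gencode_id
    codes = [
        code
        for (code,) in conn.execute(
            "SELECT gencode_id FROM organism_gencode WHERE organism_id = ?",
            (organism_id,),
        )
    ]
    if len(codes) != 1:
        # Ambiguous or missing organism-level code: the caller must decide.
        raise ValueError(f"organism {organism_id} has {len(codes)} genetic codes")
    return codes[0]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
# Organism 1 uses the standard code (NCBI table 1) for nuclear genes.
conn.execute("INSERT INTO organism_gencode VALUES (1, 1)")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?, ?)",
    [
        (10, 1, "nuclear_gene", None),
        (20, 1, "mitochondrial_gene", 2),  # vertebrate mitochondrial code (table 2)
    ],
)
print(resolve_gencode(conn, 10))  # 1
print(resolve_gencode(conn, 20))  # 2
```

Raising an error when an organism has several linked codes, rather than picking one, makes the nuclear-versus-mitochondrial ambiguity explicit instead of silently returning a possibly wrong code.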

Expression Module

We've been using the Expression module to store crystal structure information for expressed proteins.  These data come from the PDB, so we had to add an EXPRESSION_DBXREF table to the Expression module to store PDB accessions.
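A sketch of EXPRESSION_DBXREF following the familiar Chado *_dbxref linker pattern, again with sqlite3 standing in for PostgreSQL. Table columns are trimmed, the helper pdb_accessions is hypothetical, and the accessions ACC1, ACC2 and ACC9 are placeholders rather than real PDB or UniProt entries.

```python
import sqlite3

# Cut-down Chado db, dbxref and expression tables plus the added
# expression_dbxref linker. Accessions below are placeholders, not real PDB IDs.
DDL = """
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db (db_id),
    accession TEXT NOT NULL,
    UNIQUE (db_id, accession)
);
CREATE TABLE expression (
    expression_id INTEGER PRIMARY KEY,
    uniquename    TEXT UNIQUE NOT NULL
);
CREATE TABLE expression_dbxref (
    expression_dbxref_id INTEGER PRIMARY KEY,
    expression_id INTEGER NOT NULL REFERENCES expression (expression_id),
    dbxref_id     INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    UNIQUE (expression_id, dbxref_id)
);
"""


def pdb_accessions(conn, expression_id):
    """List the PDB accessions attached to one expression record."""
    rows = conn.execute(
        """
        SELECT d.accession
        FROM expression_dbxref ed
        JOIN dbxref d ON d.dbxref_id = ed.dbxref_id
        JOIN db ON db.db_id = d.db_id
        WHERE ed.expression_id = ? AND db.name = 'PDB'
        ORDER BY d.accession
        """,
        (expression_id,),
    )
    return [acc for (acc,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO db VALUES (?, ?)", [(1, "PDB"), (2, "UniProt")])
conn.executemany(
    "INSERT INTO dbxref VALUES (?, ?, ?)",
    [(1, 1, "ACC2"), (2, 1, "ACC1"), (3, 2, "ACC9")],
)
conn.execute("INSERT INTO expression VALUES (1, 'example_protein')")
conn.executemany(
    "INSERT INTO expression_dbxref (expression_id, dbxref_id) VALUES (1, ?)",
    [(1,), (2,), (3,)],
)
print(pdb_accessions(conn, 1))  # ['ACC1', 'ACC2']
```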

CV Module

Our database includes a custom ontology relating to antibiotic resistance and a cvterm details page outlining all the information known about, and associated with, a cvterm, including key publications.  As such, we had to add a CVTERM_PUB table to Chado.  We also noticed that the initial load of the Gene Ontology and Sequence Ontology included PubMed IDs (PMIDs) associated with cvterms via CVTERM_DBXREF.  We wrote scripts to load those publications into the Publication module and CVTERM_PUB so they would show up when examining an SO or GO cvterm.
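The PMID loading step described above might look roughly like the sketch below. It assumes PubMed references sit under a db row named 'PMID', uses pub.uniquename values of the form 'PMID:<accession>' (our own convention, not a Chado rule), and trims every table to the columns it touches; the PMIDs are placeholders. UNIQUE keys plus SQLite's INSERT OR IGNORE make reruns harmless; in PostgreSQL the equivalent would be ON CONFLICT DO NOTHING.

```python
import sqlite3

# Cut-down Chado tables plus the added cvterm_pub linker. We assume PubMed
# dbxrefs sit under a db named 'PMID'; the IDs below are placeholders.
DDL = """
CREATE TABLE db (db_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY, db_id INTEGER NOT NULL,
    accession TEXT NOT NULL, UNIQUE (db_id, accession)
);
CREATE TABLE cvterm_dbxref (
    cvterm_id INTEGER NOT NULL, dbxref_id INTEGER NOT NULL,
    UNIQUE (cvterm_id, dbxref_id)
);
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE pub_dbxref (
    pub_id INTEGER NOT NULL, dbxref_id INTEGER NOT NULL,
    UNIQUE (pub_id, dbxref_id)
);
CREATE TABLE cvterm_pub (
    cvterm_id INTEGER NOT NULL, pub_id INTEGER NOT NULL,
    UNIQUE (cvterm_id, pub_id)
);
"""


def load_cvterm_pubs(conn):
    """Create one pub per PMID dbxref and link it to every cvterm citing it."""
    rows = conn.execute(
        """
        SELECT cd.cvterm_id, d.dbxref_id, d.accession
        FROM cvterm_dbxref cd
        JOIN dbxref d ON d.dbxref_id = cd.dbxref_id
        JOIN db ON db.db_id = d.db_id
        WHERE db.name = 'PMID'
        """
    ).fetchall()
    for cvterm_id, dbxref_id, accession in rows:
        uniquename = f"PMID:{accession}"
        # INSERT OR IGNORE keeps reruns idempotent thanks to the UNIQUE keys.
        conn.execute("INSERT OR IGNORE INTO pub (uniquename) VALUES (?)", (uniquename,))
        (pub_id,) = conn.execute(
            "SELECT pub_id FROM pub WHERE uniquename = ?", (uniquename,)
        ).fetchone()
        conn.execute("INSERT OR IGNORE INTO pub_dbxref VALUES (?, ?)", (pub_id, dbxref_id))
        conn.execute("INSERT OR IGNORE INTO cvterm_pub VALUES (?, ?)", (cvterm_id, pub_id))
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO db VALUES (?, ?)", [(1, "PMID"), (2, "GO_REF")])
conn.executemany(
    "INSERT INTO dbxref VALUES (?, ?, ?)",
    [(1, 1, "111"), (2, 1, "222"), (3, 2, "333")],
)
# cvterms 1 and 2 share PMID 111; the GO_REF dbxref is ignored.
conn.executemany(
    "INSERT INTO cvterm_dbxref VALUES (?, ?)", [(1, 1), (2, 1), (2, 2), (2, 3)]
)
load_cvterm_pubs(conn)
print(conn.execute("SELECT COUNT(*) FROM pub").fetchone()[0])         # 2
print(conn.execute("SELECT COUNT(*) FROM cvterm_pub").fetchone()[0])  # 3
```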

Publication Module

Outside of the CVTERM_PUB table above, we wanted to store citation abstracts in the Publication module.  In retrospect, perhaps we were meant to store these via the PUBPROP table, but instead we opted to add the field pub.abstract.  Is there a Chado standard practice for storing abstracts in the Publication module that we should instead adopt?
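For comparison, storing the abstract in PUBPROP rather than a new pub.abstract column could look like the sketch below. The cvterm named 'abstract' is hypothetical; we are not aware of which term, if any, is the Chado convention. The upsert syntax (INSERT ... ON CONFLICT ... DO UPDATE) needs SQLite 3.24 or later and is also accepted by PostgreSQL 9.5 and later.

```python
import sqlite3

# Cut-down pub, cvterm and pubprop tables. Storing the abstract as a pubprop
# typed by a cvterm named 'abstract' is our assumption, not a documented rule.
DDL = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE pubprop (
    pubprop_id INTEGER PRIMARY KEY,
    pub_id     INTEGER NOT NULL REFERENCES pub (pub_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value      TEXT,
    rank       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (pub_id, type_id, rank)
);
"""


def abstract_type_id(conn):
    """Look up the (hypothetical) 'abstract' property type."""
    (type_id,) = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = 'abstract'"
    ).fetchone()
    return type_id


def set_abstract(conn, pub_id, text):
    """Insert or overwrite the rank-0 abstract for one pub."""
    conn.execute(
        """
        INSERT INTO pubprop (pub_id, type_id, value, rank) VALUES (?, ?, ?, 0)
        ON CONFLICT (pub_id, type_id, rank) DO UPDATE SET value = excluded.value
        """,
        (pub_id, abstract_type_id(conn), text),
    )


def get_abstract(conn, pub_id):
    """Return the rank-0 abstract for one pub, or None if there is none."""
    row = conn.execute(
        "SELECT value FROM pubprop WHERE pub_id = ? AND type_id = ? AND rank = 0",
        (pub_id, abstract_type_id(conn)),
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO cvterm VALUES (1, 'abstract')")
conn.execute("INSERT INTO pub VALUES (1, 'PMID:111')")
set_abstract(conn, 1, "First draft.")
set_abstract(conn, 1, "Revised abstract.")
print(get_abstract(conn, 1))  # Revised abstract.
```

The trade-off is the usual one for prop tables: no schema change, at the cost of one extra join and a cvterm lookup whenever the abstract is read.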

Images

We noted the EIMAGE table in the Expression module documentation, which stores uuencoded images.  The EXPRESSION_IMAGE table associates Expression module data with images.  We have considered building an Image module by adding analogous tables, PUB_IMAGE and CVTERM_IMAGE.  Presumably the Organism, Phenotype, and Natural Diversity modules could use such tables to store images?
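A sketch of the PUB_IMAGE and CVTERM_IMAGE linkers pointing at EIMAGE, in the same sqlite3 stand-in. The eimage columns reflect our reading of the Expression module and are trimmed; the helper images_for_cvterm is hypothetical and the URIs are example.org placeholders.

```python
import sqlite3

# Cut-down eimage table from the Expression module, plus the two linker tables
# floated above (pub_image, cvterm_image). URIs are example.org placeholders.
DDL = """
CREATE TABLE eimage (
    eimage_id   INTEGER PRIMARY KEY,
    eimage_data TEXT,
    eimage_type TEXT NOT NULL,
    image_uri   TEXT
);
CREATE TABLE pub_image (
    pub_id    INTEGER NOT NULL,
    eimage_id INTEGER NOT NULL REFERENCES eimage (eimage_id),
    UNIQUE (pub_id, eimage_id)
);
CREATE TABLE cvterm_image (
    cvterm_id INTEGER NOT NULL,
    eimage_id INTEGER NOT NULL REFERENCES eimage (eimage_id),
    UNIQUE (cvterm_id, eimage_id)
);
"""


def images_for_cvterm(conn, cvterm_id):
    """Return (eimage_type, image_uri) pairs linked to a cvterm."""
    return conn.execute(
        """
        SELECT e.eimage_type, e.image_uri
        FROM cvterm_image ci JOIN eimage e ON e.eimage_id = ci.eimage_id
        WHERE ci.cvterm_id = ?
        ORDER BY e.eimage_id
        """,
        (cvterm_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO eimage VALUES (?, NULL, ?, ?)",
    [(1, "png", "https://example.org/a.png"), (2, "jpeg", "https://example.org/b.jpg")],
)
conn.execute("INSERT INTO cvterm_image VALUES (42, 2)")
conn.execute("INSERT INTO pub_image VALUES (7, 1)")
print(images_for_cvterm(conn, 42))  # [('jpeg', 'https://example.org/b.jpg')]
```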

As usual, thanks to all the GMOD and Chado developers for this excellent resource.

Sincerely,
Andrew McArthur

------
Andrew G. McArthur, Ph.D.
Bioinformatics Consulting Services
Web: http://mcarthurbioinformatics.ca/
Phone: 905.296.3252, Mobile: 905.745.2794, Fax: 647.439.0829
Skype: agmcarthur

------
Based in Gothenburg, Sweden July 2012 through August 2013.
Gothenburg is 6 hours ahead of the Eastern Time Zone (e.g. Toronto/Boston) and 9 hours ahead of the Pacific Time Zone (e.g. Los Angeles).
_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Suggested revisions to Chado

Sook Jung
Hello,

We use Chado to develop several crop genetic, genomic and breeding
databases. An Image module would be really helpful; in fact, we are
currently debating whether to add a custom Image module ourselves. We
would need two tables, image and imageprop, plus linker tables that link
image to feature, stock, organism, genotype, etc. (feature_image,
stock_image, etc.).

image (image_id, name, description, contact_id, type_id)
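The image/imageprop design above, sketched with one linker (feature_image) and sqlite3 standing in for PostgreSQL. Column types, the UNIQUE keys and the 'image_uri' property term are assumptions; contact, cvterm and feature are cut down to the columns the query needs, and images_for_feature is a hypothetical helper.

```python
import sqlite3

# Sketch of the image/imageprop/feature_image tables described above.
# Column types, the UNIQUE keys and the 'image_uri' cvterm are assumptions;
# contact, cvterm and feature are reduced to the columns used here.
DDL = """
CREATE TABLE cvterm  (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE image (
    image_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    contact_id  INTEGER REFERENCES contact (contact_id),
    type_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
CREATE TABLE imageprop (
    image_id INTEGER NOT NULL REFERENCES image (image_id),
    type_id  INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value    TEXT,
    rank     INTEGER NOT NULL DEFAULT 0,
    UNIQUE (image_id, type_id, rank)
);
CREATE TABLE feature_image (
    feature_id INTEGER NOT NULL REFERENCES feature (feature_id),
    image_id   INTEGER NOT NULL REFERENCES image (image_id),
    UNIQUE (feature_id, image_id)
);
"""


def images_for_feature(conn, feature_id):
    """Return (image name, image type, image_uri) rows for one feature."""
    return conn.execute(
        """
        SELECT i.name, t.name, p.value
        FROM feature_image fi
        JOIN image i  ON i.image_id = fi.image_id
        JOIN cvterm t ON t.cvterm_id = i.type_id
        LEFT JOIN imageprop p
               ON p.image_id = i.image_id
              AND p.type_id = (SELECT cvterm_id FROM cvterm WHERE name = 'image_uri')
        WHERE fi.feature_id = ?
        ORDER BY i.name
        """,
        (feature_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany("INSERT INTO cvterm VALUES (?, ?)", [(1, "photograph"), (2, "image_uri")])
conn.execute("INSERT INTO contact VALUES (1, 'Example Lab')")
conn.execute("INSERT INTO feature VALUES (5, 'example_gene')")
conn.execute("INSERT INTO image VALUES (1, 'fruit_section', NULL, 1, 1)")
conn.execute("INSERT INTO imageprop VALUES (1, 2, 'https://example.org/fruit.jpg', 0)")
conn.execute("INSERT INTO feature_image VALUES (5, 1)")
print(images_for_feature(conn, 5))
# [('fruit_section', 'photograph', 'https://example.org/fruit.jpg')]
```

Adding stock_image, organism_image and so on would only mean repeating the feature_image shape with a different foreign key.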

The other option, which requires no change to the current schema, is
simply to use the existing prop tables (featureprop, stockprop, etc.) to
store these data, with prop.type_id set to the cvterm_id of terms such as
image_uri, image_contact, image_title, etc.
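The prop-table option needs no DDL changes, but reading images back means regrouping rows. This sketch assumes that properties belonging to the same image share a featureprop.rank; that convention is hypothetical, as are the helper images_from_featureprop and the sample URIs.

```python
import sqlite3

# Prop-table alternative: no schema change, just featureprop rows typed by
# cvterms such as image_uri and image_title. Using rank to group the props of
# one image is our assumption; the URIs are placeholders.
DDL = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value      TEXT,
    rank       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, type_id, rank)
);
"""
IMAGE_PROPS = ("image_uri", "image_title", "image_contact")


def images_from_featureprop(conn, feature_id):
    """Rebuild one dict per image by grouping featureprop rows on rank."""
    placeholders = ", ".join("?" for _ in IMAGE_PROPS)
    rows = conn.execute(
        f"""
        SELECT fp.rank, t.name, fp.value
        FROM featureprop fp JOIN cvterm t ON t.cvterm_id = fp.type_id
        WHERE fp.feature_id = ? AND t.name IN ({placeholders})
        ORDER BY fp.rank, t.name
        """,
        (feature_id, *IMAGE_PROPS),
    )
    images = {}
    for rank, prop, value in rows:
        images.setdefault(rank, {})[prop] = value
    return [images[rank] for rank in sorted(images)]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO cvterm VALUES (?, ?)",
    [(1, "image_uri"), (2, "image_title"), (3, "image_contact")],
)
conn.executemany(
    "INSERT INTO featureprop (feature_id, type_id, value, rank) VALUES (7, ?, ?, ?)",
    [
        (1, "https://example.org/leaf.png", 0),
        (2, "Leaf", 0),
        (1, "https://example.org/root.png", 1),
        (2, "Root", 1),
    ],
)
print(images_from_featureprop(conn, 7))
```

The weak point is that nothing in the schema enforces the rank convention, so a missing image_title or a mismatched rank quietly yields an incomplete image record; dedicated image tables would avoid that.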

Looking forward to hearing what other people think.

Sook

--
Sook Jung, PhD
Assistant Research Professor of Bioinformatics
Dept of Horticulture and Landscape Architecture
Washington State University
45 Johnson Hall, Pullman, WA 99164-6414
Email:[hidden email]