[Gmod-phendiver] Chado phenotype table

Dan Bolser
Let's do it.

Could it just be an 'owl store'?



On 8 February 2012 14:15, Naama Menda <[hidden email]> wrote:

> this may be a good opportunity to look again at the phenotype module,
> which we constantly delayed due to lack of agreement on how phenotypes should
> be stored.
> The only thing we could agree on is that the phenotype module is outdated
> and needs revising.
>
> http://gmod.org/wiki/Talk:Chado_Natural_Diversity_Module/natdiv_schema_changes_call
>
>
> -Naama
>
>
> On Wed, Feb 8, 2012 at 5:44 AM, seth redmond <[hidden email]>
> wrote:
>>
>> I think expanding this to entity-quality style postcomposed terms would
>> solve a lot of problems, perhaps something like this?:
>> ##phenotype-description
>> Entity=GO:0035011;Ontology=http://purl.obolibrary.org/obo/go.obo
>> Quality=PATO:0001650;Ontology=http://purl.obolibrary.org/obo/pato.obo
>>
>> Probably not a substitute for those who are recording individual-level
>> data (i.e. without having calculated associations / QTLs) or who require
>> details of the assay used, but I'm sure it would prove useful for curated
>> data.
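
The entity-quality idea above can be sketched in Python. This is only an illustration, assuming each Entity or Quality term sits on its own line of semicolon-separated Key=value pairs; that line layout and the function names parse_term_line and compose_eq are assumptions, not part of any GVF or VCF specification.

```python
def parse_term_line(line):
    """Split one 'Key=value;Key=value' line into a dict."""
    fields = {}
    for part in line.strip().split(";"):
        if part:
            # Split on the first '=' only, so values keep any later '='.
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def compose_eq(entity_line, quality_line):
    """Pair an Entity line with a Quality line as one post-composed term."""
    entity = parse_term_line(entity_line)
    quality = parse_term_line(quality_line)
    return {
        "entity": entity["Entity"],
        "entity_ontology": entity["Ontology"],
        "quality": quality["Quality"],
        "quality_ontology": quality["Ontology"],
    }


# The two terms quoted in the message above.
eq = compose_eq(
    "Entity=GO:0035011;Ontology=http://purl.obolibrary.org/obo/go.obo",
    "Quality=PATO:0001650;Ontology=http://purl.obolibrary.org/obo/pato.obo",
)
print(eq["entity"], eq["quality"])
```

Keeping the entity and the quality as separate fields, each with its own ontology, is what makes the term "post-composed": neither ontology needs a pre-built term for the combination.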
>>
>>
>>
>> On 8 Feb 2012, at 00:52, Chris Mungall wrote:
>>
>> >
>> > I put together some recommendations for including phenotype annotations
>> > in GVF files:
>> >
>> >
>> > http://www.sequenceontology.org/wiki/index.php/Using_Phenotype_Ontologies_in_GVF
>> >
>> > VCF seems similar enough that the same syntax and recommendations should
>> > apply.
>> >
>> > It's designed to use simple pre-existing phenotype terms from ontologies
>> > like the human phenotype ontology. In principle it could be expanded to
>> > cover more expressive modes of describing the phenotypes, but this might be
>> > better done in a separate OWL file.
>> >
>> > On Feb 7, 2012, at 1:24 PM, Dan Bolser wrote:
>> >
>> >> Cheers Seth,
>> >>
>> >> I'm still trying to get my head round how all the ontologies fit
>> >> together practically and technologically, however, it won't be my
>> >> responsibility to work on this directly, so it's only for fun.
>> >>
>> >> An interesting idea came up on the VCF list to define a 'large scale
>> >> phenotype format', so I jumped in with a suggestion, but it seems the
>> >> thread has gone cold, nobody replied.
>> >>
>> >> I suspect I need to read more about how annotation is done in
>> >> practice, and about the various 'tab' formats.
>> >>
>> >>
>> >> Thanks all for help,
>> >> Dan.
>> >>
>> >> On 7 February 2012 20:54, seth redmond <[hidden email]> wrote:
>> >>> Dan,
>> >>>
>> >>> When arranging the VB system we largely sidestepped the problem of
>> >>> genotypes since we were already committed to supporting Ensembl
>> >>> variation. When I last looked into it (around five months ago now),
>> >>> ens-var was more than capable of handling a large number of
>> >>> individuals, but at the cost of some curation if you want to make them
>> >>> readable. For example, it's trivial enough to import VCFs, but there's
>> >>> as yet no facility for rearranging these into different population
>> >>> groupings without just reloading from different VCFs. Paul Derwent
>> >>> could tell you more about this (I've now left VB, but I could also dig
>> >>> around for my notes if this would be useful).
>> >>>
>> >>> The genotype module attached to phendiver is somewhat underdeveloped
>> >>> for our purposes and I think this would be a good target for those
>> >>> interested to aim at next. Personally the idea of tracking a full
>> >>> SNP-chip or Illumina run's worth of genotypes in phendiver unnerves me,
>> >>> but I'd be interested to hear how Naama got on with it, i.e. how many
>> >>> variant loci your SOL deals with?
>> >>>
>> >>> Finally, you may well have encountered these already, but looking
>> >>> through the list of things you need terms for, there are some general
>> >>> ontologies that a number of us are using for these: most obviously
>> >>> PATO for the phenotypes, UO for the measurement/units and GAZ
>> >>> (gazetteer) for the locations.
>> >>>
>> >>> Bob can fill you in better than I can on the ISA-tab loaders, but if
>> >>> there's anything I can help with let me know.
>> >>>
>> >>> -s
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Seth Redmond
>> >>>  Unité Génetique et Génomique des Insectes Vecteurs
>> >>>  Institut Pasteur
>> >>>  28,rue du Dr Roux
>> >>>  75724 PARIS
>> >>> [hidden email]
>> >>>
>> >>> On 7 Feb 2012, at 14:25, Dan Bolser wrote:
>> >>>
>> >>> Hi Bob,
>> >>>
>> >>> We're looking to build (like everyone else) a scalable data archive
>> >>> for large-scale genotyping / phenotyping projects. Currently there is
>> >>> no 'phenotype archive' or even phenotype file format (that I know of),
>> >>> and it seems to make sense to marry phenotype and genotype data in one
>> >>> place (and to do that in a 'standard' way wherever possible).
>> >>>
>> >>> I think it's important to look at all available solutions before
>> >>> making a decision about what to do and how to do it, Ensembl variation
>> >>> included.
>> >>>
>> >>> I've just been hearing about ISA-tab in the context of the new
>> >>> BioSamples database at the EBI, which is part of the solution, but I
>> >>> don't know much about those formats yet TBH.
>> >>>
>> >>> Currently, I'm thinking that we need to bring in ontologies, terms, or
>> >>> URIs to explain:
>> >>>
>> >>> 1) experiment
>> >>> 2) measurement
>> >>> 3) phenotype
>> >>> 4) attribute
>> >>> 5) environment
>> >>> 6) individual
>> >>>
>> >>> and combine one term for every 'value' recorded in the phenotyping
>> >>> database, and then link those to SNPs via the individual. Although the
>> >>> SNPs could be stored in Ensembl variation or VCF or GVF, the main
>> >>> issue is to keep track of the 'individual' via some, probably
>> >>> external, accession number. How many samples can you cram into Ensembl
>> >>> variation?
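
The idea above, annotating each recorded value with one term per category and reaching the SNPs through the individual, might look roughly like this in Python. It is a sketch under stated assumptions: the class and field names are made up for illustration, and the "EX:" identifiers and accession strings are placeholders, not real ontology terms or accessions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeValue:
    """One recorded value plus one term (or URI) per descriptive category."""
    value: float
    individual: str   # external accession; the join key to genotype data
    experiment: str
    measurement: str
    phenotype: str
    attribute: str
    environment: str


def values_by_individual(records):
    """Group values by individual accession, the link to SNP calls."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.individual].append(record)
    return dict(grouped)


# Placeholder terms and accessions, for illustration only.
records = [
    PhenotypeValue(12.5, "ACC_0001", "EX:exp1", "EX:length_cm",
                   "EX:leaf_length", "EX:length", "EX:greenhouse"),
    PhenotypeValue(3.1, "ACC_0001", "EX:exp1", "EX:width_cm",
                   "EX:leaf_width", "EX:width", "EX:greenhouse"),
    PhenotypeValue(11.0, "ACC_0002", "EX:exp1", "EX:length_cm",
                   "EX:leaf_length", "EX:length", "EX:greenhouse"),
]

grouped = values_by_individual(records)
print(sorted(grouped))  # accessions to look up in the variation store
```

Because only the accession is shared, the genotypes themselves can stay in Ensembl variation, VCF or GVF; the phenotype side needs nothing more than a stable identifier per individual.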
>> >>>
>> >>> If this is being done in phendiver / chado, I'm keen to learn a) how,
>> >>> b) who, and c) performance.
>> >>>
>> >>> I'm not part of the VectorBase project, but I guess Paul K is? It
>> >>> would perhaps be good to get in a room with Paul and you at some point
>> >>> (I'm free at most times, but have some specific appointments).
>> >>>
>> >>> This is work that we're hoping to do in the near future, but we're not
>> >>> actively working on it yet. I'm trying to get as much background
>> >>> information as I can in the meantime.
>> >>>
>> >>>
>> >>> Cheers,
>> >>> Dan.
>> >>>
>> >>> On 7 February 2012 11:34, Bob MacCallum <[hidden email]>
>> >>> wrote:
>> >>>
>> >>> Hi Dan,
>> >>>
>> >>> I guess you considered Ensembl variation for the SNPs.  It would be
>> >>> interesting to hear why you're looking into other options.  At
>> >>> VectorBase we're hoping Ensembl will handle our genomic needs while
>> >>> Chado handles the more unpredictable experimental data, phenotypes and
>> >>> sample collection meta-data more flexibly.  Obviously we have to
>> >>> bridge the two but think (hope) that is doable.
>> >>>
>> >>> I'll be up at EBI Weds-Fri this week with the VectorBase all-hands
>> >>> meeting, so maybe we could meet?
>> >>> (I'll summarise for the mailing list as appropriate.)
>> >>>
>> >>> For the record, we have written a prototype ISA-Tab ->
>> >>> Bio::Chado::Schema -> Chado loader (and in the other direction a web
>> >>> service and AJAX-heavy web interface
>> >>> http://funcgen.vectorbase.org/PopulationBETA/), but it's likely to go
>> >>> through some changes and is regrettably quite VectorBase specific at
>> >>> the moment.
>> >>>
>> >>> cheers,
>> >>> Bob.
>> >>>
>> >>>
>> >>> On Sat, Feb 4, 2012 at 3:13 PM, Dan Bolser <[hidden email]> wrote:
>> >>>
>> >>> Thanks Naama,
>> >>>
>> >>> Looks like I need to read a lot! :-)
>> >>>
>> >>> I think a VCF or GVF loader would be a good project to help
>> >>> standardize usage. There was some talk on the VCF mailing list about
>> >>> including phenotype information per #SAMPLE in that format, which I
>> >>> think would be a big help too.
>> >>>
>> >>> Thanks again for the info and links,
>> >>> Dan.
>> >>>
>> >>>
>> >>> On 4 February 2012 02:40, Naama Menda <[hidden email]> wrote:
>> >>>
>> >>> Hi Dan (I'm CC'ing the gmod-schema and phendiver lists),
>> >>>
>> >>> The phenotype module has not been modified yet, except for adding a
>> >>> nullable 'name' field.
>> >>> I think it has been working out for most people, but the idea is to
>> >>> make it more normalized, like the rest of Chado's modules, eliminating
>> >>> the multiple columns linking to 'cvterm' and replacing them with
>> >>> linking tables that would provide a more structured way of storing
>> >>> phenotypes, keeping only the phenotype measurement in the phenotype
>> >>> table and factoring out the semantics.
>> >>> Some people probably disagree with this approach, and I think the main
>> >>> problem is the broad definition of a phenotype, and the multiple ways
>> >>> of storing a phenotype in Chado.
>> >>>
>> >>>
>> >>> We had long, long discussions about post-composing. You can read some
>> >>> notes here
>> >>> http://gmod.org/wiki/Chado_Natural_Diversity_Module/natdiv_schema_changes_call
>> >>> and here
>> >>> http://sourceforge.net/mailarchive/message.php?msg_id=27597482
>> >>>
>> >>>
>> >>> There are no formal loaders for the Natural Diversity schema, simply
>> >>> because there are many different ways to load your custom data.
>> >>> You can see some examples from the data loaded in SGN here
>> >>>
>> >>> https://github.com/solgenomics/Phenome/blob/master/bin/loading_scripts/solcap/load_solcap_TA_phenotypes.pl
>> >>>
>> >>> As you can see this is very data-specific, and makes many assumptions
>> >>> about your experiment design and metadata.
>> >>>
>> >>>
>> >>> Likewise, there is no formal way of storing QTL data. In general, you
>> >>> want to store your accessions in the stock table, create a new
>> >>> nd_experiment for each measurement, and link it to the stock via
>> >>> nd_experiment_stock and to the genotype via nd_experiment_genotype.
>> >>> The genotype table is mostly a placeholder for a genotype name, and
>> >>> links to the feature table where you can load a marker, or whatever
>> >>> feature you need.
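
The linking pattern described above can be tried out with a toy schema. The sketch below uses Python's sqlite3 and a heavily simplified stand-in for the Chado tables named in the message: the table names follow the message, but the columns are cut down to bare keys (real Chado tables carry further required columns such as type and organism references), and the helper functions and sample names are hypothetical.

```python
import sqlite3

# Simplified stand-ins for the Chado tables; NOT the real DDL.
SCHEMA = """
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_stock (
    nd_experiment_id INTEGER REFERENCES nd_experiment(nd_experiment_id),
    stock_id INTEGER REFERENCES stock(stock_id));
CREATE TABLE nd_experiment_genotype (
    nd_experiment_id INTEGER REFERENCES nd_experiment(nd_experiment_id),
    genotype_id INTEGER REFERENCES genotype(genotype_id));
"""


def record_genotyping(conn, stock_name, genotype_name):
    """Create one nd_experiment and link it to a stock and a genotype."""
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO stock (uniquename) VALUES (?)",
                (stock_name,))
    stock_id = cur.execute("SELECT stock_id FROM stock WHERE uniquename = ?",
                           (stock_name,)).fetchone()[0]
    genotype_id = cur.execute("INSERT INTO genotype (name) VALUES (?)",
                              (genotype_name,)).lastrowid
    experiment_id = cur.execute(
        "INSERT INTO nd_experiment DEFAULT VALUES").lastrowid
    cur.execute("INSERT INTO nd_experiment_stock VALUES (?, ?)",
                (experiment_id, stock_id))
    cur.execute("INSERT INTO nd_experiment_genotype VALUES (?, ?)",
                (experiment_id, genotype_id))
    conn.commit()
    return experiment_id


def genotypes_for_stock(conn, stock_name):
    """Walk stock -> nd_experiment -> genotype and return genotype names."""
    rows = conn.execute(
        """SELECT g.name
           FROM stock s
           JOIN nd_experiment_stock es ON es.stock_id = s.stock_id
           JOIN nd_experiment_genotype eg
                ON eg.nd_experiment_id = es.nd_experiment_id
           JOIN genotype g ON g.genotype_id = eg.genotype_id
           WHERE s.uniquename = ?
           ORDER BY g.name""",
        (stock_name,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_genotyping(conn, "accession_1", "marker_1:AA")  # hypothetical names
record_genotyping(conn, "accession_1", "marker_2:AG")
print(genotypes_for_stock(conn, "accession_1"))
```

The nd_experiment row carries no data of its own here; it exists only as the hub that both linking tables point at, which is what lets the same pattern serve phenotyping experiments as well.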
>> >>>
>> >>>
>> >>> The Natural Diversity paper does not elaborate on storing genotypes
>> >>> since it describes how to store experimental data (genotyping or
>> >>> phenotyping) in a generic and re-usable way. Genotypes and phenotypes
>> >>> are stored in different modules, which link back to the ND schema.
>> >>> Its power is in the ability to go back and forth from a 'stock' to its
>> >>> genotyping and phenotyping data, or generate new stocks from
>> >>> pheno/genotyping experiments.
>> >>> The examples in the paper talk mostly about phenotypes, since these
>> >>> are much more complex to handle than genotypes.
>> >>>
>> >>>
>> >>> The schema is a bit complicated, but after a year of discussions the
>> >>> working group came up with this model, which seems to be generic enough
>> >>> to accommodate all the use cases we brought up.
>> >>>
>> >>> You can add your use case to the ND module wiki page, see from there
>> >>> how it fits in
>> >>> (http://gmod.org/wiki/Chado_Natural_Diversity_Module#Use_Cases),
>> >>> and get more feedback from the mailing list.
>> >>>
>> >>>
>> >>> Hope this makes things a bit clearer!
>> >>>
>> >>> -Naama
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Feb 3, 2012 at 6:18 PM, Dan Bolser <[hidden email]> wrote:
>> >>>
>> >>>
>> >>> Hi,
>> >>>
>> >>> I'm not clear whether the update to the phenotype module has been done
>> >>> or not? In general, does the old phenotype stuff do what's needed?
>> >>>
>> >>> BTW, where can I read more about 'post-composed' terms? I tried
>> >>> Google, but couldn't find any good reference material... I'm
>> >>> interested in the examples of combining different ontologies into the
>> >>> description of a single value.
>> >>>
>> >>> Do you have VCF / GVF loaders written for the schema?
>> >>>
>> >>> I'm still not clear how 'QTL data' is stored... are the results of QTL
>> >>> algorithms just that wiggly line? i.e. trivial to store? (Sorry for
>> >>> the confusion).
>> >>>
>> >>> I made some updates on the wiki page. I didn't realize it until I
>> >>> started to try to re-work it, but the abstract of the paper doesn't
>> >>> mention storing genotype data in the database. It focuses on
>> >>> describing 'experiments'. Perhaps you could take a look and improve
>> >>> what I wrote there:
>> >>>
>> >>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>> >>>
>> >>> In the meantime I'll join the mailing list for the working group.
>> >>>
>> >>> Thanks again for the info,
>> >>>
>> >>> Dan.
>> >>>
>> >>>
>> >>> On 3 February 2012 12:19, Isaak Yosief Tecle <[hidden email]> wrote:
>> >>>
>> >>> Hi Dan,
>> >>>
>> >>> SGN is now using a db schema called Natural Diversity to store genetic
>> >>> and phenotypic variation data. It is a GMOD/Chado schema developed by
>> >>> SGN and collaborators. It is also used by multiple other databases,
>> >>> which you will find listed in the publication below.
>> >>>
>> >>> Documentation of the schema:
>> >>> http://database.oxfordjournals.org/content/2011/bar051.full
>> >>>
>> >>> Some info on the working group:
>> >>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>> >>>
>> >>> Mailing lists:
>> >>> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
>> >>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>> >>>
>> >>> Cheers,
>> >>> Isaak
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 2/3/12 12:06 PM, Dan Bolser wrote:
>> >>>
>> >>> Hi guys,
>> >>>
>> >>> I'm now working at 'Ensembl Genomes' as project leader of the plants
>> >>> division. We're looking to develop plant bifx infrastructure as part
>> >>> of a grant called transPLANT. Part of the work involves building a
>> >>> 'variation archive' for plants, which is obviously related to strain
>> >>> phenotyping information, population studies, and therefore, derived
>> >>> QTL data...
>> >>>
>> >>> I was wondering how you store your QTL data, and if there are any
>> >>> standards, emerging or defined, that we should be thinking about?
>> >>>
>> >>> I think it will be really good if everything we develop from the
>> >>> Ensembl side is coordinated with equivalent developments coming from
>> >>> the Chado/GMoD side, so it would be really great to work with you (as
>> >>> Chado/GMoD users) to ensure that that happens.
>> >>>
>> >>> Cheers,
>> >>> Dan.
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> -------------------------------------
>> >>> Isaak Yosief Tecle, PhD
>> >>>
>> >>> Bioinformatics Consultant
>> >>> to Sol Genomics Network
>> >>>
>> >>> Boyce Thompson Institute
>> >>> Cornell University
>> >>>
>> >>> http://sgn.cornell.edu
>> >>> -------------------------------------
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>>
>> >>> Cxgn-devel mailing list
>> >>>
>> >>> [hidden email]
>> >>>
>> >>> http://rubisco.sgn.cornell.edu/cgi-bin/mailman/listinfo/cxgn-devel
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> ------------------------------------------------------------------------------
>> >>>
>> >>> Try before you buy = See our experts in action!
>> >>>
>> >>> The most comprehensive online learning library for Microsoft
>> >>> developers
>> >>>
>> >>> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3,
>> >>> MVC3,
>> >>>
>> >>> Metro Style Apps, more. Free future releases when you subscribe now!
>> >>>
>> >>> http://p.sf.net/sfu/learndevnow-dev2
>> >>>
>> >>> _______________________________________________
>> >>>
>> >>> Gmod-phendiver mailing list
>> >>>
>> >>> [hidden email]
>> >>>
>> >>> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Gmod-schema mailing list
>> >>> [hidden email]
>> >>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>>
>>
>>
>>
>
>
>
>
