Re: [Gmod-phendiver] [Cxgn-devel] Information about QTL formats?


Naama Menda
Hi Dan (I'm CC'ing the gmod-schema and gmod-phendiver lists).

The phenotype module has not been modified yet, except for the addition of a nullable 'name' field.
I think it has been working for most people, but the idea is to normalize it like the rest of Chado's modules: eliminate the multiple columns linking to 'cvterm' and replace them with linking tables that give a more structured way of storing phenotypes, keeping only the phenotype measurement in the phenotype table and factoring out the semantics.
Some people probably disagree with this approach. I think the main problem is the broad definition of a phenotype, and the multiple ways of storing a phenotype in Chado.
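To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3. The columns of phenotype (observable_id, attr_id, value, cvalue_id, assay_id) follow my reading of the Chado phenotype module, with types and constraints simplified; phenotype_norm and phenotype_term_link are hypothetical names illustrating the linking-table idea, not the design the working group has adopted. The data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Current shape (simplified): several columns each point at cvterm.
CREATE TABLE phenotype (
    phenotype_id  INTEGER PRIMARY KEY,
    uniquename    TEXT NOT NULL UNIQUE,
    name          TEXT,  -- the recently added nullable column
    observable_id INTEGER REFERENCES cvterm(cvterm_id),
    attr_id       INTEGER REFERENCES cvterm(cvterm_id),
    value         TEXT,
    cvalue_id     INTEGER REFERENCES cvterm(cvterm_id),
    assay_id      INTEGER REFERENCES cvterm(cvterm_id)
);

-- Hypothetical normalized shape: only the measurement stays in the row,
-- and every term is attached through a linking table with a role term.
CREATE TABLE phenotype_norm (
    phenotype_id INTEGER PRIMARY KEY,
    uniquename   TEXT NOT NULL UNIQUE,
    value        TEXT
);
CREATE TABLE phenotype_term_link (
    phenotype_id INTEGER NOT NULL REFERENCES phenotype_norm(phenotype_id),
    role_id      INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    cvterm_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    PRIMARY KEY (phenotype_id, role_id, cvterm_id)
);

INSERT INTO cvterm VALUES (1, 'fruit'), (2, 'weight'), (3, 'observable'), (4, 'attribute');
-- The same made-up measurement stored in both shapes.
INSERT INTO phenotype (phenotype_id, uniquename, observable_id, attr_id, value)
    VALUES (1, 'plot12_fruit_weight', 1, 2, '85.3');
INSERT INTO phenotype_norm VALUES (1, 'plot12_fruit_weight', '85.3');
INSERT INTO phenotype_term_link VALUES (1, 3, 1), (1, 4, 2);
""")

def terms_for(phenotype_id):
    """Rebuild {role: term} for one phenotype from the linking table."""
    rows = conn.execute("""
        SELECT role.name, term.name
        FROM phenotype_term_link AS l
        JOIN cvterm AS role ON role.cvterm_id = l.role_id
        JOIN cvterm AS term ON term.cvterm_id = l.cvterm_id
        WHERE l.phenotype_id = ?
        ORDER BY l.role_id
    """, (phenotype_id,)).fetchall()
    return dict(rows)  # assumes one term per role, as in this example

print(terms_for(1))  # {'observable': 'fruit', 'attribute': 'weight'}
```

The trade-off is visible in the sketch: the flat table fixes the number and meaning of term slots in its columns, while the linking table lets a phenotype carry any number of terms, at the cost of an extra join to reassemble them.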

We had long discussions about post-composing terms. You can read some notes here: http://gmod.org/wiki/Chado_Natural_Diversity_Module/natdiv_schema_changes_call
and here: http://sourceforge.net/mailarchive/message.php?msg_id=27597482
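Post-composition here means describing one recorded value by combining terms from separate ontologies (for example an anatomy entity plus a quality) at annotation time, instead of choosing a single pre-composed term. A minimal Python sketch of that idea; the Term and PostComposed types are hypothetical, and the ontology prefixes and term names are illustrative rather than checked accessions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Term:
    ontology: str  # ontology prefix, e.g. "PO" or "PATO" (illustrative only)
    name: str

@dataclass(frozen=True)
class PostComposed:
    """One recorded value described by terms drawn from several ontologies."""
    entity: Term                 # what was observed (e.g. an anatomy term)
    quality: Term                # which property of it was measured
    value: str                   # the measurement itself
    unit: Optional[Term] = None  # optional unit term

    def label(self) -> str:
        parts = [f"{self.entity.name} {self.quality.name}", self.value]
        if self.unit is not None:
            parts.append(self.unit.name)
        return " ".join(parts)

obs = PostComposed(
    entity=Term("PO", "fruit"),
    quality=Term("PATO", "weight"),
    value="85.3",
    unit=Term("UO", "gram"),
)
print(obs.label())  # fruit weight 85.3 gram
```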

There are no formal loaders for the Natural Diversity schema, simply because there are many different ways to load custom data.
You can see an example of how data were loaded into SGN here:
https://github.com/solgenomics/Phenome/blob/master/bin/loading_scripts/solcap/load_solcap_TA_phenotypes.pl

As you can see, this is very data-specific, and makes many assumptions about your experimental design and metadata.

Likewise, there is no formal way of storing QTL data. In general, you want to store your accessions in the stock table, create a new nd_experiment for each measurement, and link it to the stock via nd_experiment_stock and to the genotype via nd_experiment_genotype.
The genotype table is mostly a placeholder for a genotype name, and links to the feature table, where you can load a marker or whatever other feature you need.
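The recipe above, sketched with Python's built-in sqlite3 as an illustration of the join path rather than a schema to deploy. Table names follow my reading of the Chado natural diversity and genetic modules (genotype reaches feature through feature_genotype), but the columns are heavily simplified: required columns such as type_id and nd_geolocation_id are omitted, and the helper functions and the stock, genotype and marker names are hypothetical.

```python
import sqlite3

# Simplified, hypothetical subset of the Chado tables involved.
SCHEMA = """
CREATE TABLE stock    (stock_id    INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT NOT NULL UNIQUE);
-- genotype -> marker (or any other feature)
CREATE TABLE feature_genotype (
    feature_id  INTEGER REFERENCES feature(feature_id),
    genotype_id INTEGER REFERENCES genotype(genotype_id),
    UNIQUE (feature_id, genotype_id)
);
-- one nd_experiment row per measurement
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_stock (
    nd_experiment_id INTEGER REFERENCES nd_experiment(nd_experiment_id),
    stock_id         INTEGER REFERENCES stock(stock_id)
);
CREATE TABLE nd_experiment_genotype (
    nd_experiment_id INTEGER REFERENCES nd_experiment(nd_experiment_id),
    genotype_id      INTEGER REFERENCES genotype(genotype_id)
);
"""

def get_or_create(cur, table, name):
    """Return the id for `name` in `table` (a trusted constant), inserting if needed."""
    cur.execute(f"INSERT OR IGNORE INTO {table} (uniquename) VALUES (?)", (name,))
    cur.execute(f"SELECT {table}_id FROM {table} WHERE uniquename = ?", (name,))
    return cur.fetchone()[0]

def record_genotyping(conn, stock, genotype, marker):
    """Store one measurement: a new nd_experiment linked to a stock and a genotype."""
    cur = conn.cursor()
    stock_id = get_or_create(cur, "stock", stock)
    genotype_id = get_or_create(cur, "genotype", genotype)
    feature_id = get_or_create(cur, "feature", marker)
    cur.execute("INSERT OR IGNORE INTO feature_genotype VALUES (?, ?)",
                (feature_id, genotype_id))
    cur.execute("INSERT INTO nd_experiment DEFAULT VALUES")
    experiment_id = cur.lastrowid
    cur.execute("INSERT INTO nd_experiment_stock VALUES (?, ?)", (experiment_id, stock_id))
    cur.execute("INSERT INTO nd_experiment_genotype VALUES (?, ?)", (experiment_id, genotype_id))
    conn.commit()
    return experiment_id

def markers_for_stock(conn, stock):
    """Walk stock -> nd_experiment -> genotype -> feature."""
    return conn.execute("""
        SELECT DISTINCT g.uniquename, f.uniquename
        FROM stock AS s
        JOIN nd_experiment_stock AS es    ON es.stock_id = s.stock_id
        JOIN nd_experiment_genotype AS eg ON eg.nd_experiment_id = es.nd_experiment_id
        JOIN genotype AS g                ON g.genotype_id = eg.genotype_id
        JOIN feature_genotype AS fg       ON fg.genotype_id = g.genotype_id
        JOIN feature AS f                 ON f.feature_id = fg.feature_id
        WHERE s.uniquename = ?
        ORDER BY f.uniquename
    """, (stock,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_genotyping(conn, "accession_1", "accession_1:SNP001:AA", "SNP001")
record_genotyping(conn, "accession_1", "accession_1:SNP002:AG", "SNP002")
print(markers_for_stock(conn, "accession_1"))
# [('accession_1:SNP001:AA', 'SNP001'), ('accession_1:SNP002:AG', 'SNP002')]
```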

The Natural Diversity paper does not elaborate on storing genotypes, since it describes how to store experimental data (genotyping or phenotyping) in a generic and reusable way. Genotypes and phenotypes are stored in different modules, which link back to the ND schema.
Its power is in the ability to go back and forth from a 'stock' to its genotyping and phenotyping data, or to generate new stocks from pheno/genotyping experiments.
The examples in the paper are mostly about phenotypes, since these are much more complex to handle than genotypes.
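The 'back and forth' comes from every module hanging off nd_experiment, so the same join path answers questions in either direction. A minimal sqlite3 sketch, assuming simplified tables (the real Chado tables have more required columns) and made-up stocks and Brix values: one query goes from a stock to its phenotypes, the other from a phenotype threshold back to stocks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE, value TEXT);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_stock (nd_experiment_id INTEGER, stock_id INTEGER);
CREATE TABLE nd_experiment_phenotype (nd_experiment_id INTEGER, phenotype_id INTEGER);

INSERT INTO stock VALUES (1, 'accession_1'), (2, 'accession_2');
INSERT INTO phenotype VALUES (10, 'brix_plot3', '5.1'), (11, 'brix_plot4', '6.4');
INSERT INTO nd_experiment VALUES (100), (101);
INSERT INTO nd_experiment_stock VALUES (100, 1), (101, 2);
INSERT INTO nd_experiment_phenotype VALUES (100, 10), (101, 11);
""")

# Both directions share one join path through nd_experiment.
JOIN_PATH = """
    FROM stock s
    JOIN nd_experiment_stock es     ON es.stock_id = s.stock_id
    JOIN nd_experiment_phenotype ep ON ep.nd_experiment_id = es.nd_experiment_id
    JOIN phenotype p                ON p.phenotype_id = ep.phenotype_id
"""

def phenotypes_of(stock):
    """Stock -> its phenotype measurements."""
    sql = "SELECT p.uniquename, p.value" + JOIN_PATH + "WHERE s.uniquename = ?"
    return conn.execute(sql, (stock,)).fetchall()

def stocks_with(min_value):
    """Phenotype threshold -> the stocks that reached it."""
    sql = ("SELECT s.uniquename" + JOIN_PATH +
           "WHERE CAST(p.value AS REAL) >= ? ORDER BY s.uniquename")
    return [row[0] for row in conn.execute(sql, (min_value,)).fetchall()]

print(phenotypes_of("accession_1"))  # [('brix_plot3', '5.1')]
print(stocks_with(6.0))              # ['accession_2']
```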

The schema is a bit complicated, but after a year of discussions the working group came up with this model, which seems generic enough to accommodate all the use cases we brought up.

You can add your use case to the ND module wiki page (http://gmod.org/wiki/Chado_Natural_Diversity_Module#Use_Cases), see from there how it fits in,
and get more feedback from the mailing list.

Hope this makes things a bit clearer!  
-Naama

 

On Fri, Feb 3, 2012 at 6:18 PM, Dan Bolser <[hidden email]> wrote:
Hi,

I'm not clear whether the update to the phenotype module has been done
or not. In general, does the old phenotype stuff do what's needed?

BTW, where can I read more about 'post-composed' terms? I tried
Google, but couldn't find any good reference material... I'm
interested in examples of combining different ontologies to
describe a single value.

Do you have VCF / GVF loaders written for the schema?

I'm still not clear how 'QTL data' is stored... are the results of QTL
algorithms just that wiggly line, i.e. trivial to store? (Sorry for the
confusion.)
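One way to picture the 'wiggly line' is a LOD profile: a score at each map position along a chromosome. A sketch, assuming results are kept as (position in cM, LOD) pairs, of how such a profile might be reduced to a peak and a 1.5-LOD drop support interval, a common convention; the interval is taken at grid points without interpolation, and the LodProfile type and profile values are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LodProfile:
    chromosome: str
    points: List[Tuple[float, float]]  # (position_cM, LOD), sorted by position

    def peak(self) -> Tuple[float, float]:
        """Position and LOD of the highest point (first one if tied)."""
        return max(self.points, key=lambda p: p[1])

    def support_interval(self, drop: float = 1.5) -> Tuple[float, float]:
        """Contiguous span around the peak where LOD >= peak LOD - drop."""
        i = self.points.index(self.peak())
        cutoff = self.points[i][1] - drop
        lo = hi = i
        while lo > 0 and self.points[lo - 1][1] >= cutoff:
            lo -= 1
        while hi < len(self.points) - 1 and self.points[hi + 1][1] >= cutoff:
            hi += 1
        return self.points[lo][0], self.points[hi][0]

profile = LodProfile("chr2", [(0.0, 0.4), (10.0, 1.2), (20.0, 3.1),
                              (30.0, 4.0), (40.0, 2.9), (50.0, 1.1)])
print(profile.peak())              # (30.0, 4.0)
print(profile.support_interval())  # (20.0, 40.0)
```

Storing the full profile keeps the curve; storing only the peak and interval is the compact summary, which is one reason the answer to "trivial to store?" depends on what downstream users need.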

I made some updates on the wiki page. I didn't realize it until I
started to try to re-work it, but the abstract of the paper doesn't
mention storing genotype data in the database. It focuses on
describing 'experiments'. Perhaps you could take a look and improve
what I wrote there:

http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group


In the meantime, I'll join the mailing list for the working group.


Thanks again for the info,

Dan.

On 3 February 2012 12:19, Isaak Yosief Tecle <[hidden email]> wrote:
> Hi Dan,
>
> SGN is now using a db schema called Natural Diversity to store genetic
> and phenotypic variation data. It is a GMOD/Chado schema developed by
> SGN and collaborators. It is also used by several other databases,
> which are listed in the publication below.
>
> Documentation of the schema:
> http://database.oxfordjournals.org/content/2011/bar051.full
>
> Some info on the working group:
> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>
> Mailing lists:
> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
>
> Cheers,
> Isaak
>
>
>
> On 2/3/12 12:06 PM, Dan Bolser wrote:
>> Hi guys,
>>
>> I'm now working at 'Ensembl Genomes' as a project leader of the plants
>> division. We're looking to develop plant bioinformatics infrastructure as
>> part of a grant called transPLANT. Part of the work involves building a
>> 'variation archive' for plants, which is obviously related to strain
>> phenotyping information, population studies, and therefore derived
>> QTL data...
>>
>> I was wondering how you store your QTL data, and if there are any
>> standards, emerging or defined, that we should be thinking about?
>>
>> I think it would be really good if everything we develop on the
>> Ensembl side were coordinated with equivalent developments coming from
>> the Chado/GMOD side, so it would be really great to work with you (as
>> Chado/GMOD users) to ensure that happens.
>>
>>
>> Cheers,
>> Dan.
>>
>
> --
> -------------------------------------
> Isaak Yosief Tecle, PhD
>
> Bioinformatics Consultant
> to Sol Genomics Network
>
> Boyce Thompson Institute
> Cornell University
>
> http://sgn.cornell.edu
> -------------------------------------
>
>
>
>
> _______________________________________________
> Cxgn-devel mailing list
> [hidden email]
> http://rubisco.sgn.cornell.edu/cgi-bin/mailman/listinfo/cxgn-devel


_______________________________________________
Gmod-phendiver mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-phendiver

Re: [Gmod-phendiver] [Cxgn-devel] Information about QTL formats?

Dan Bolser
Thanks Naama,

Looks like I need to read a lot! :-)

I think a VCF or GVF loader would be a good project to help
standardize usage. There was some talk on the VCF mailing list about
including phenotype information per #SAMPLE in that format, which I
think would be a big help too.
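For reference, in VCF the sample IDs are the columns after FORMAT on the #CHROM header line, so those IDs are the natural key for attaching phenotypes. A minimal plain-Python sketch, assuming an uncompressed VCF whose FORMAT includes GT and a separate phenotype dictionary keyed by sample ID (the phenotype input, both function names and the example records are hypothetical):

```python
from typing import Dict, List, Tuple

def genotypes_by_sample(vcf_lines: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each sample ID to a list of (variant ID, GT) pairs."""
    samples: List[str] = []
    calls: Dict[str, List[Tuple[str, str]]] = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # columns after FORMAT are sample IDs
            calls = {s: [] for s in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, data in zip(samples, fields[9:]):
            calls[sample].append((fields[2], data.split(":")[gt_index]))
    return calls

def join_phenotypes(calls, phenotypes: Dict[str, float]):
    """Pair each sample's phenotype value with its calls; skip unphenotyped samples."""
    return {s: (phenotypes[s], calls[s]) for s in calls if s in phenotypes}

VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tplant_A\tplant_B",
    "1\t1000\tsnp1\tA\tG\t.\tPASS\t.\tGT:DP\t0/1:12\t1/1:9",
    "1\t2000\tsnp2\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1",
]
calls = genotypes_by_sample(VCF)
print(join_phenotypes(calls, {"plant_A": 85.3}))
# {'plant_A': (85.3, [('snp1', '0/1'), ('snp2', '0/0')])}
```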


Thanks again for the info and links,
Dan.

On 4 February 2012 02:40, Naama Menda <[hidden email]> wrote:


Re: [Gmod-phendiver] [Cxgn-devel] Information about QTL formats?

Bob MacCallum
Hi Dan,
I guess you considered Ensembl variation for the SNPs.  It would be
interesting to hear why you're looking into other options.  At
VectorBase we're hoping Ensembl will handle our genomic needs while
Chado handles the more unpredictable experimental data, phenotypes and
sample-collection metadata more flexibly. Obviously we have to
bridge the two, but we think (hope) that is doable.
I'll be up at EBI Weds-Fri this week with the VectorBase all-hands
meeting, so maybe we could meet?
(I'll summarise for the mailing list as appropriate.)
For the record, we have written a prototype ISA-Tab ->
Bio::Chado::Schema -> Chado loader (and in the other direction a web
service and AJAX-heavy web interface
http://funcgen.vectorbase.org/PopulationBETA/), but it's likely to go
through some changes and is regrettably quite VectorBase-specific at
the moment.
cheers,
Bob.
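A rough idea of the first step of such a loader: read an ISA-Tab style study sample table (tab-separated, with Source Name, Sample Name and Characteristics[...] columns) and turn each row into a stock-like record with properties. This is a Python sketch, not the VectorBase loader and not Bio::Chado::Schema (which is Perl); StockRecord and read_study_samples are hypothetical, repeated column names such as Protocol REF are collapsed, and the example row is made up.

```python
import csv
import io
import re
from dataclasses import dataclass, field
from typing import Dict, List

CHARACTERISTIC = re.compile(r"^Characteristics\[(.+)\]$")

@dataclass
class StockRecord:
    name: str    # from "Sample Name"
    source: str  # from "Source Name"
    props: Dict[str, str] = field(default_factory=dict)  # from Characteristics[...]

def read_study_samples(text: str) -> List[StockRecord]:
    """Parse a tab-separated ISA-Tab-style sample table into StockRecords."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    records = []
    for row in reader:
        if not row:
            continue
        cells = dict(zip(header, row))  # repeated column names collapse here
        rec = StockRecord(name=cells["Sample Name"], source=cells["Source Name"])
        for column, value in cells.items():
            match = CHARACTERISTIC.match(column)
            if match and value:
                rec.props[match.group(1)] = value
        records.append(rec)
    return records

STUDY = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[collection site]\tSample Name\n"
    "src1\tAnopheles gambiae\tsite_X\tsample1\n"
)
recs = read_study_samples(STUDY)
print(recs[0].name, recs[0].props)
# sample1 {'organism': 'Anopheles gambiae', 'collection site': 'site_X'}
```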

On Sat, Feb 4, 2012 at 3:13 PM, Dan Bolser <[hidden email]> wrote:


Re: [Gmod-phendiver] [Cxgn-devel] Information about QTL formats?

Dan Bolser
Hi Bob,

We're looking to build (like everyone else) a scalable data archive
for large-scale genotyping / phenotyping projects. Currently there is
no 'phenotype archive' or even a phenotype file format (that I know of),
and it seems to make sense to marry phenotype and genotype data in one
place (and to do that in a 'standard' way wherever possible).

I think it's important to look at all available solutions before
making a decision about what to do and how to do it, Ensembl variation
included.

I've just been hearing about ISA-Tab in the context of the new
BioSamples database at the EBI, which is part of the solution, but I
don't know much about those formats yet TBH.

Currently, I'm thinking that we need to bring in ontologies, terms, or
URIs to explain:

1) experiment
2) measurement
3) phenotype
4) attribute
5) environment
6) individual

and combine one term from each of these for every 'value' recorded in
the phenotyping database, and then link those values to SNPs via the
individual. Although the SNPs could be stored in Ensembl variation,
VCF or GVF, the main issue is to keep track of the 'individual' via
some, probably external, accession number. How many samples can you
cram into Ensembl variation?
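The model above could be sketched as a record type where every recorded value carries one term (or URI) per facet, plus an individual accession used as the join key to SNP calls kept elsewhere (Ensembl variation, VCF or GVF). A minimal Python sketch; the field names mirror the list, while the term strings, accession format and SNP-call dictionary are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class PhenotypeValue:
    experiment: str   # term/URI for the experiment
    measurement: str  # term/URI for how the value was measured
    phenotype: str    # term/URI for the trait
    attribute: str    # term/URI for the attribute of the trait
    environment: str  # term/URI for the environment
    individual: str   # external accession number: the join key
    value: str

def link_to_snps(values: List[PhenotypeValue],
                 snp_calls: Dict[str, Dict[str, str]]
                 ) -> List[Tuple[PhenotypeValue, Dict[str, str]]]:
    """Pair each value with the SNP calls of the same individual, if any."""
    return [(v, snp_calls[v.individual]) for v in values if v.individual in snp_calls]

values = [
    PhenotypeValue("trial_2011", "scale_weighing", "fruit weight", "mass",
                   "greenhouse", "ACC:0001", "85.3"),
    PhenotypeValue("trial_2011", "scale_weighing", "fruit weight", "mass",
                   "greenhouse", "ACC:0002", "61.0"),
]
snp_calls = {"ACC:0001": {"snp1": "0/1", "snp2": "1/1"}}
linked = link_to_snps(values, snp_calls)
print(len(linked), linked[0][1])  # 1 {'snp1': '0/1', 'snp2': '1/1'}
```

Because the individual accession is the only link, the SNP side can live in any store that is keyed on the same accessions.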

If this is being done in phendiver / Chado, I'm keen to learn a) how,
b) who is doing it, and c) how it performs.

I'm not part of the VectorBase project, but I guess Paul K is? It
would perhaps be good to get in a room with Paul and you at some point
(I'm free at most times, but have some specific appointments).

This is work that we're hoping to do in the near future, but we're not
actively working on it yet. I'm trying to gather as much background
information as I can in the meantime.


Cheers,
Dan.

On 7 February 2012 11:34, Bob MacCallum <[hidden email]> wrote:

> Hi Dan,
> I guess you considered Ensembl variation for the SNPs.  It would be
> interesting to hear why you're looking into other options.  At
> VectorBase we're hoping Ensembl will handle our genomic needs while
> Chado handles the more unpredictable experimental data, phenotypes and
> sample collection meta-data more flexibly.  Obviously we have to
> bridge the two but think (hope) that is doable.
> I'll be up at EBI Weds-Fri this week with the VectorBase all-hands
> meeting, so maybe we could meet?
> (I'll summarise for the mailing list as appropriate.)
> For the record, we have written a prototype ISA-Tab ->
> Bio::Chado::Schema -> Chado loader (and in the other direction a web
> service and AJAX-heavy web interface
> http://funcgen.vectorbase.org/PopulationBETA/), but it's likely to go
> through some changes and is regrettably quite VectorBase specific at
> the moment.
> cheers,
> Bob.
>
> On Sat, Feb 4, 2012 at 3:13 PM, Dan Bolser <[hidden email]> wrote:
>> Thanks Naama,
>>
>> Looks like I need to read a lot! :-)
>>
>> I think a VCF or GVF loader would be a good project to help
>> standardize usage. There was some talk on the VCF mailing list about
>> including phenotype information per #SAMPLE in that format, which I
>> think would be a big help too.
>>
>>
>> Thanks again for the info and links,
>> Dan.
>>
>> On 4 February 2012 02:40, Naama Menda <[hidden email]> wrote:
>>> hi Dan (Im CC'ing the gmod schema and phendiver. lists)
>>>
>>> the phenotype module has not been modified yet, except for adding a nullable
>>> 'name' field.
>>> I think it has been working out for most people, but the idea is to make it
>>> more normalized, like the rest of Chado's modules, eliminating the multiple
>>> columns linking to 'cvterm' ,
>>> and replacing with other linking tables that would provide more structured
>>> way for storing phenotypes, having only the phenotype measurement in the
>>> phenotype table, and factoring out the semantics.
>>> Some people probably disagree with this approach, and I think the main
>>> problem is the broad definition of a phenotype, and the multiple ways for
>>> storing a phenotype in Chado.
>>>
>>> We had long long discussions about post-composing  . You can read some notes
>>> here http://gmod.org/wiki/Chado_Natural_Diversity_Module/natdiv_schema_changes_call
>>> and here http://sourceforge.net/mailarchive/message.php?msg_id=27597482
>>>
>>> There are no formal loaders for Natural Diversity schema, simply because
>>> there are many different ways to load your custom data.
>>> You can see some examples from the data loaded in SGN here
>>> https://github.com/solgenomics/Phenome/blob/master/bin/loading_scripts/solcap/load_solcap_TA_phenotypes.pl
>>>
>>> As you can see this is very data-specific, and makes many assumptions on
>>> your experiment design and metadata.
>>>
>>> Likewise, there is no formal way for storing QTL data. In general, you want
>>> to store your accessions in the stock table, create a new nd_experiment for
>>> each measurement, and link it with the stock via nd_experiment_stock and to
>>> the genotype via nd_experiment_genotype.
>>> The genotype table is mostly a spaceholder for a genotype name, and links to
>>> the feature table where you can load a marker, or whatever feature you
>>> need.
>>>
>>> The Natural Diversity paper does not elaborate on storing genotypes since it
>>> describes how to store experimental data (genotyping or phenotyping) in a
>>> generic and re-usable way. Genotypes and phenotypes are stored in different
>>> modules, which link back to the ND schema.
>>> It's power is in the ability to go back and forth from a 'stock' to its
>>> genotyping and phenotyping data, or generate new stocks from
>>> pheno/genotyping experiments.
>>> The examples in the paper talk mostly about phenotypes, since these are much
>>> more complex to handle than genotypes.
>>>
>>> The schema is a bit complicated, but after a year of discussions the working
>>> group came up with this model which seems to be generic enough to accomodate
>>> all the use cases we brought up.
>>>
>>> You can add your use case to the ND module wiki page, and see from there how
>>> it fits in (http://gmod.org/wiki/Chado_Natural_Diversity_Module#Use_Cases)
>>> and get more feedback from the mailing list.
>>>
>>> Hope this makes things a bit clearer!
>>> -Naama
>>>
>>>
>>>
>>> On Fri, Feb 3, 2012 at 6:18 PM, Dan Bolser <[hidden email]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm not clear weather the update to the phenotype module has been done
>>>> or not? In general, does the old phenotype stuff do what's needed?
>>>>
>>>> BTW, where can I read more about 'post-composed' terms? I tried
>>>> Goggle, but couldn't find any good reference material... I'm
>>>> interested in the examples of combining different ontologies into the
>>>> description of a single value.
>>>>
>>>> Do you have VCF / GVF loaders written for the schema?
>>>>
>>>> I'm still not clear how 'QTL data' is stored... are the results of QTL
>>>> algorithms just that wiggly line? i.e. trivial to store? (Sorry for
>>>> confusion).
>>>>
>>>> I made some updates on the wiki page. I didn't realize it until I
>>>> started to try to re-work it, but the abstract of the paper doesn't
>>>> mention storing genotype data in the database. It focuses on
>>>> describing 'experiments'. Perhaps you could take a look and improve
>>>> what I wrote there:
>>>>
>>>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>>>>
>>>>
>>>> In the mean time I'll join the mailing list for the working group.
>>>>
>>>>
>>>> Thanks again for the infos.,
>>>>
>>>> Dan.
>>>>
>>>> On 3 February 2012 12:19, Isaak Yosief Tecle <[hidden email]> wrote:
>>>> > Hi Dan,
>>>> >
>>>> > SGN is now using a db schema called Natural Diversity to store genetic
>>>> > and phenotypic variation data. It is a GMOD/Chado schema developed by
>>>> > SGN and collaborators. It is also used by other multiple databases which
>>>> > you will find listed in the publication below.
>>>> >
>>>> > A documentation of the schema:
>>>> > http://database.oxfordjournals.org/content/2011/bar051.full
>>>> >
>>>> > Some info on the working group:
>>>> > http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>>>> >
>>>> > Mailing lists:
>>>> > https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
>>>> > https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>> >
>>>> >
>>>> > Cheers,
>>>> > Isaak
>>>> >
>>>> >
>>>> >
>>>> > On 2/3/12 12:06 PM, Dan Bolser wrote:
>>>> >> Hi guys,
>>>> >>
>>>> >> I'm now working at 'Ensembl Genomes' as project leader of the plants
>>>> >> division. We're looking to develop plant bifx infrastructure as part
>>>> >> of a grant called transPLANT. Part of the work involves building a
>>>> >> 'variation archive' for plants, which is obviously related to strain
>>>> >> phenotyping information, population studies, and therefore, derived
>>>> >> QTL data...
>>>> >>
>>>> >> I was wondering how you store your QTL data, and if there are any
>>>> >> standards, emerging or defined, that we should be thinking about?
>>>> >>
>>>> >> I think it will be really good if everything we develop from the
>>>> >> Ensembl side is coordinated with equivalent developments coming from
>>>> >> the Chado/GMOD side, so it would be really great to work with you (as
>>>> >> Chado/GMOD users) to ensure that that happens.
>>>> >>
>>>> >>
>>>> >> Cheers,
>>>> >> Dan.
>>>> >>
>>>> >
>>>> > --
>>>> > -------------------------------------
>>>> > Isaak Yosief Tecle, PhD
>>>> >
>>>> > Bioinformatics Consultant
>>>> > to Sol Genomics Network
>>>> >
>>>> > Boyce Thompson Institute
>>>> > Cornell University
>>>> >
>>>> > http://sgn.cornell.edu
>>>> > -------------------------------------

Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

Collett, James R-2
I'd just like to add that we, too, would be interested in an integrated genotype/phenotype data management solution.

We're doing mutagenesis, genome resequencing, RNA-Seq, and phenotype profiling of filamentous fungi for biofuel and bioproduct development. We'd like to move to more automated data generation and management at high-throughput scale as sequencing costs continue to drop.

In my limited experience with Chado so far, I've found its data loaders to be slow. Will the speed of data I/O in Chado become a bottleneck as we move from managing data for individual model organisms to high-throughput resequencing, expression, and phenotyping?

Thanks,

Jim

__________________________________________________
James R. Collett, Ph.D.
Senior Scientist
Chemical and Biological Process Development Group
Energy and Environment Directorate

Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN P8-60
Richland, WA  99352 USA




> -----Original Message-----
> From: Dan Bolser [mailto:[hidden email]]
> Sent: Tuesday, February 07, 2012 5:26 AM
> To: Bob MacCallum
> Cc: cxgn-devel; gmod schema; gmod-phendiver
> Subject: Re: [Gmod-schema] [Gmod-phendiver] [Cxgn-devel] Information
> about QTL formats?
>
> Hi Bob,
>
> We're looking to build (like everyone else) a scalable data archive
> for large-scale genotyping / phenotyping projects. Currently there is
> no 'phenotype archive' or even phenotype file format (that I know of),
> and it seems to make sense to marry phenotype and genotype data in one
> place (and to do that in a 'standard' way wherever possible).
>
> I think it's important to look at all available solutions before making
> a decision about what to do and how to do it, Ensembl variation
> included.
>
> I've just been hearing about ISA-tab in the context of the new
> BioSamples database at the EBI, which is part of the solution, but I
> don't know much about those formats yet TBH.
>
> Currently, I'm thinking that we need to bring in ontologies, terms, or
> URIs to explain:
>
> 1) experiment
> 2) measurement
> 3) phenotype
> 4) attribute
> 5) environment
> 6) individual
>
> and combine one term for every 'value' recorded in the phenotyping
> database, and then link those to SNPs via the individual. Although the
> SNPs could be stored in Ensembl variation or VCF or GVF, the main issue
> is to keep track of the 'individual' via some, probably external,
> accession number. How many samples can you cram into Ensembl variation?
>
> If this is being done in phendiver / chado, I'm keen to learn a) how,
> b) who, and c) performance.
>
> I'm not part of the VectorBase project, but I guess Paul K is? It would
> perhaps be good to get in a room with Paul and you at some point (I'm
> free at most times, but have some specific appointments).
>
> This is work that we're hoping to do in the near future, but we're not
> actively working on it yet. I'm trying to get as much background
> information as I can in the meantime.
>
>
> Cheers,
> Dan.
> _______________________________________________
> Gmod-schema mailing list
> [hidden email]
> https://lists.sourceforge.net/lists/listinfo/gmod-schema


Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

seth redmond-2
In reply to this post by Dan Bolser
Dan, 

When arranging the VB system we largely sidestepped the problem of genotypes since we were already committed to supporting Ensembl Variation. When I last looked into it (around five months ago now), ens-var was more than capable of handling a large number of individuals, but at the cost of some curation if you want to make them readable: e.g. it's trivial enough to import VCFs, but there's as yet no facility for rearranging these into different population groupings without just reloading from different VCFs. Paul Derwent could tell you more about this (I've now left VB, but I could also dig around for my notes if this would be useful).
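For illustration only, here is a minimal Python sketch of regrouping the samples of an already-loaded multi-sample VCF into populations without reloading it. It assumes a plain-text VCF carrying a GT field and a hypothetical sample-to-population mapping (`sample_to_pop`), and simply reports an alternate-allele frequency per population for each record; it says nothing about how Ensembl Variation stores or groups samples internally.

```python
from collections import defaultdict


def parse_gt(gt):
    """Return allele indices from a VCF GT string such as '0/1' or '1|1'; missing ('.') alleles are skipped."""
    return [int(a) for a in gt.replace("|", "/").split("/") if a != "."]


def population_alt_freqs(vcf_lines, sample_to_pop):
    """Yield (CHROM, POS, {population: alt_allele_frequency}) for each data line.

    sample_to_pop is a hypothetical mapping from VCF sample name to population
    label; samples missing from the mapping are ignored.
    """
    samples = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        gt_index = fields[8].split(":").index("GT")
        alt = defaultdict(int)
        total = defaultdict(int)
        for sample, data in zip(samples, fields[9:]):
            pop = sample_to_pop.get(sample)
            if pop is None:
                continue
            alleles = parse_gt(data.split(":")[gt_index])
            total[pop] += len(alleles)
            alt[pop] += sum(1 for a in alleles if a > 0)
        freqs = {pop: alt[pop] / total[pop] for pop in total if total[pop]}
        yield fields[0], int(fields[1]), freqs


# Toy input: three samples, regrouped into two hypothetical populations.
vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0",
]
pops = {"S1": "popA", "S2": "popA", "S3": "popB"}
print(list(population_alt_freqs(vcf, pops)))  # [('1', 100, {'popA': 0.75, 'popB': 0.0})]
```

Changing the grouping then only means passing a different mapping, rather than reloading from differently split VCFs.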

The genotype module attached to phendiver is somewhat underdeveloped for our purposes, and I think this would be a good target for those interested to aim at next. Personally, the idea of tracking a full SNP-chip or Illumina run's worth of genotypes in phendiver unnerves me, but I'd be interested to hear how Naama got on with it, i.e. how many variant loci does SOL deal with?

Finally, you may well have encountered these already, but looking through the list of things you need terms for, there are some general ontologies that a number of us are using for these: most obviously PATO for the phenotypes, UO for the measurements/units and GAZ (Gazetteer) for the locations.
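As a sketch of how those ontologies might combine to describe a single recorded value, the Python below attaches a PATO term to the phenotype, a UO term to the unit and a GAZ term to the location of one measurement. The `Term` and `PhenotypeValue` classes and their field names are hypothetical, and the accessions are deliberately written as `PREFIX:example` placeholders rather than real term IDs; the only check performed is that each slot draws from the expected ontology.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology term reference; 'curie' is a placeholder like 'PATO:example' in this sketch."""
    curie: str
    label: str

    @property
    def prefix(self) -> str:
        return self.curie.split(":", 1)[0]


# Which ontology each slot is expected to use (an assumption based on the suggestion above).
EXPECTED_PREFIX = {"phenotype": "PATO", "unit": "UO", "location": "GAZ"}


@dataclass(frozen=True)
class PhenotypeValue:
    """One measured value, post-composed from several ontology terms (hypothetical layout)."""
    individual: str  # accession of the measured individual
    value: float
    phenotype: Term
    unit: Term
    location: Term

    def check_prefixes(self) -> list[str]:
        """Return a message for every slot whose term is not from the expected ontology."""
        problems = []
        for slot, prefix in EXPECTED_PREFIX.items():
            term = getattr(self, slot)
            if term.prefix != prefix:
                problems.append(f"{slot}: expected {prefix}, got {term.prefix}")
        return problems


v = PhenotypeValue(
    individual="ACC-001",  # hypothetical accession
    value=42.0,
    phenotype=Term("PATO:example", "height"),
    unit=Term("UO:example", "centimeter"),
    location=Term("GAZ:example", "example field site"),
)
print(v.check_prefixes())  # []
```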

Bob can fill you in better than I can on the ISA-tab loaders, but if there's anything I can help with let me know.

-s



--
Seth Redmond
  Unité Génétique et Génomique des Insectes Vecteurs
  Institut Pasteur
  28, rue du Dr Roux
  75724 PARIS


Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

Dan Bolser
Cheers Seth,

I'm still trying to get my head round how all the ontologies fit
together practically and technologically; however, it won't be my
responsibility to work on this directly, so it's only for fun.

An interesting idea came up on the VCF list to define a 'large-scale
phenotype format', so I jumped in with a suggestion, but it seems the
thread has gone cold; nobody replied.
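No such format existed at the time, so the following is only a sketch under stated assumptions: a hypothetical tab-separated phenotype table with one row per individual and trait (columns `individual`, `trait`, `value`, `unit`), linked to genotypes by matching `individual` against the sample names on the VCF `#CHROM` header line. It illustrates the idea, raised in this thread, of tying phenotype values to SNPs via the individual; the column layout and function names do not come from any published specification.

```python
import csv
import io


def vcf_samples(vcf_text):
    """Return the sample names listed after FORMAT on the VCF #CHROM header line."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def phenotypes_by_sample(pheno_tsv, samples):
    """Group rows of a hypothetical phenotype TSV by individual.

    Returns (matched, unmatched): matched maps each VCF sample to a list of
    (trait, value, unit) tuples; unmatched is a sorted list of individuals
    that have phenotypes but no genotype column in the VCF.
    """
    wanted = set(samples)
    matched = {s: [] for s in samples}
    unmatched = set()
    for row in csv.DictReader(io.StringIO(pheno_tsv), delimiter="\t"):
        ind = row["individual"]
        if ind in wanted:
            matched[ind].append((row["trait"], float(row["value"]), row["unit"]))
        else:
            unmatched.add(ind)
    return matched, sorted(unmatched)


vcf = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tACC1\tACC2\n"
pheno = (
    "individual\ttrait\tvalue\tunit\n"
    "ACC1\tplant height\t101.5\tcm\n"
    "ACC3\tplant height\t88.0\tcm\n"
)
matched, missing = phenotypes_by_sample(pheno, vcf_samples(vcf))
print(matched)  # {'ACC1': [('plant height', 101.5, 'cm')], 'ACC2': []}
print(missing)  # ['ACC3']
```

Reporting unmatched individuals separately is a deliberate choice: keeping accessions consistent between the phenotype and genotype sources is the fragile part, so mismatches are surfaced rather than silently dropped.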

I suspect I need to read more about how annotation is done in
practice, and about the various 'tab' formats.


Thanks all for the help,
Dan.

On 7 February 2012 20:54, seth redmond <[hidden email]> wrote:

> Dan,
>
> When arranging the VB system we largely sidestepped the problem of genotypes
> since we were already committed to supporting ensembl variation. When I last
> looked into it (around five months ago now), ens-var was more than capable
> of handling a large number of individuals  but at the cost of some curation
> if you want to make them readable - e.g. it's trivial enough to import VCFs,
> but there's as yet no facility for rearranging these into different
> population groupings without just reloading from different VCFs. Paul
> Derwent could tell you more about this (I've now left VB, but I could also
> dig around for my notes if this would be useful).
>
> The genotype module attached to phendiver is somewhat underdeveloped for our
> purposes and I think this would be a good target for those interested to aim
> at next. Personally the idea of tracking a full snp-chip or illumina run's
> worth of genotypes in phendiver unnerves me, but I'd be interested to hear
> how Naama got on with it, i.e. how many variant loci your SOL deals with?
>
> Finally you may well have encountered these already, but looking through the
> list of things you need terms for, there are some general ontologies that a
> number of us are using for these: most obviously PATO for the phenotypes, UO
> for the measurement/units and GAZ (gazetteer) for the locations.
>
> Bob can fill you in better than I can on the ISA-tab loaders, but if there's
> anything I can help with let me know.
>
> -s
>
>
>
> --
> Seth Redmond
>   Unité Génétique et Génomique des Insectes Vecteurs
>   Institut Pasteur
>   28,rue du Dr Roux
>   75724 PARIS
> [hidden email]
>
> On 7 Feb 2012, at 14:25, Dan Bolser wrote:
>
> Hi Bob,
>
> We're looking to build (like everyone else) a scalable data archive
> for large-scale genotyping / phenotyping projects. Currently there is
> no 'phenotype archive' or even phenotype file format (that I know of),
> and it seems to make sense to marry phenotype and genotype data in one
> place (and to do that in a 'standard' way wherever possible).
>
> I think it's important to look at all available solutions before
> making a decision about what to do and how to do it, Ensembl variation
> included.
>
> I've just been hearing about ISA-tab in the context of the new
> BioSamples database at the EBI, which is part of the solution, but I
> don't know much about those formats yet TBH.
>
> Currently, I'm thinking that we need to bring in ontologies, terms, or
> URIs to explain:
>
> 1) experiment
> 2) measurement
> 3) phenotype
> 4) attribute
> 5) environment
> 6) individual
>
> and combine one term for every 'value' recorded in the phenotyping
> database, and then link those to SNPs via the individual. Although the
> SNPs could be stored in Ensembl variation or VCF or GVF, the main
> issue is to keep track of the 'individual' via some, probably
> external, accession number. How many samples can you cram into Ensembl
> variation?
>
> If this is being done in phendiver / chado, I'm keen to learn a) how,
> b) who, and c) performance.
>
> I'm not part of the VectorBase project, but I guess Paul K is? It
> would perhaps be good to get in a room with Paul and you at some point
> (I'm free at most times, but have some specific appointments).
>
> This is work that we're hoping to do in the near future, but we're not
> actively working on it yet. I'm trying to get as much background
> information as I can in the meantime.
>
>
> Cheers,
> Dan.
>
> On 7 February 2012 11:34, Bob MacCallum <[hidden email]> wrote:
>
> Hi Dan,
>
> I guess you considered Ensembl variation for the SNPs.  It would be
> interesting to hear why you're looking into other options.  At
> VectorBase we're hoping Ensembl will handle our genomic needs while
> Chado handles the more unpredictable experimental data, phenotypes and
> sample collection meta-data more flexibly.  Obviously we have to
> bridge the two but think (hope) that is doable.
>
> I'll be up at EBI Weds-Fri this week with the VectorBase all-hands
> meeting, so maybe we could meet?
>
> (I'll summarise for the mailing list as appropriate.)
>
> For the record, we have written a prototype ISA-Tab ->
> Bio::Chado::Schema -> Chado loader (and in the other direction a web
> service and AJAX-heavy web interface
> http://funcgen.vectorbase.org/PopulationBETA/), but it's likely to go
> through some changes and is regrettably quite VectorBase specific at
> the moment.
>
> cheers,
> Bob.
>
>
> On Sat, Feb 4, 2012 at 3:13 PM, Dan Bolser <[hidden email]> wrote:
>
> Thanks Naama,
>
> Looks like I need to read a lot! :-)
>
> I think a VCF or GVF loader would be a good project to help
> standardize usage. There was some talk on the VCF mailing list about
> including phenotype information per #SAMPLE in that format, which I
> think would be a big help too.
>
> Thanks again for the info and links,
> Dan.
>
>
> On 4 February 2012 02:40, Naama Menda <[hidden email]> wrote:
>
> hi Dan (I'm CC'ing the gmod-schema and gmod-phendiver lists)
>
> The phenotype module has not been modified yet, except for the addition of
> a nullable 'name' field. I think it has been working out for most people,
> but the idea is to make it more normalized, like the rest of Chado's
> modules: eliminate the multiple columns linking to 'cvterm' and replace
> them with linking tables that provide a more structured way of storing
> phenotypes, keeping only the phenotype measurement in the phenotype table
> and factoring out the semantics.
> Some people probably disagree with this approach, and I think the main
> problem is the broad definition of a phenotype, and the multiple ways of
> storing a phenotype in Chado.
>
> We had long discussions about post-composing. You can read some notes
> here http://gmod.org/wiki/Chado_Natural_Diversity_Module/natdiv_schema_changes_call
> and here http://sourceforge.net/mailarchive/message.php?msg_id=27597482
>
> There are no formal loaders for the Natural Diversity schema, simply because
> there are many different ways to load your custom data.
> You can see some examples from the data loaded in SGN here
> https://github.com/solgenomics/Phenome/blob/master/bin/loading_scripts/solcap/load_solcap_TA_phenotypes.pl
>
> As you can see, this is very data-specific, and makes many assumptions about
> your experiment design and metadata.
>
> Likewise, there is no formal way of storing QTL data. In general, you want
> to store your accessions in the stock table, create a new nd_experiment for
> each measurement, and link it to the stock via nd_experiment_stock and to
> the genotype via nd_experiment_genotype.
> The genotype table is mostly a placeholder for a genotype name, and links to
> the feature table, where you can load a marker or whatever feature you
> need.
>
> The Natural Diversity paper does not elaborate on storing genotypes, since it
> describes how to store experimental data (genotyping or phenotyping) in a
> generic and reusable way. Genotypes and phenotypes are stored in different
> modules, which link back to the ND schema.
> Its power is in the ability to go back and forth from a 'stock' to its
> genotyping and phenotyping data, or to generate new stocks from
> pheno/genotyping experiments.
> The examples in the paper talk mostly about phenotypes, since these are much
> more complex to handle than genotypes.
>
> The schema is a bit complicated, but after a year of discussions the working
> group came up with this model, which seems generic enough to accommodate
> all the use cases we brought up.
>
> You can add your use case to the ND module wiki page, see from there how
> it fits in (http://gmod.org/wiki/Chado_Natural_Diversity_Module#Use_Cases),
> and get more feedback from the mailing list.
>
> Hope this makes things a bit clearer!
> -Naama
>
>
>
>
> On Fri, Feb 3, 2012 at 6:18 PM, Dan Bolser <[hidden email]> wrote:
>
>
> Hi,
>
> I'm not clear whether the update to the phenotype module has been done
> or not. In general, does the old phenotype stuff do what's needed?
>
> BTW, where can I read more about 'post-composed' terms? I tried
> Google, but couldn't find any good reference material... I'm
> interested in examples of combining different ontologies into the
> description of a single value.
>
> Do you have VCF / GVF loaders written for the schema?
>
> I'm still not clear how 'QTL data' is stored... are the results of QTL
> algorithms just that wiggly line? i.e. trivial to store? (Sorry for the
> confusion.)
>
> I made some updates on the wiki page. I didn't realize it until I
> started to try to rework it, but the abstract of the paper doesn't
> mention storing genotype data in the database. It focuses on
> describing 'experiments'. Perhaps you could take a look and improve
> what I wrote there:
>
> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>
> In the meantime I'll join the mailing list for the working group.
>
> Thanks again for the info,
> Dan.
>
>
> On 3 February 2012 12:19, Isaak Yosief Tecle <[hidden email]> wrote:
>
> Hi Dan,
>
> SGN is now using a db schema called Natural Diversity to store genetic
> and phenotypic variation data. It is a GMOD/Chado schema developed by
> SGN and collaborators. It is also used by multiple other databases, which
> you will find listed in the publication below.
>
> Documentation of the schema:
> http://database.oxfordjournals.org/content/2011/bar051.full
>
> Some info on the working group:
> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>
> Mailing lists:
> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>
> Cheers,
> Isaak
>
> On 2/3/12 12:06 PM, Dan Bolser wrote:
>
> Hi guys,
>
> I'm now working at 'Ensembl Genomes' as a project leader of the plants
> division. We're looking to develop plant bioinformatics infrastructure as
> part of a grant called transPLANT. Part of the work involves building a
> 'variation archive' for plants, which is obviously related to strain
> phenotyping information, population studies, and therefore derived
> QTL data...
>
> I was wondering how you store your QTL data, and whether there are any
> standards, emerging or defined, that we should be thinking about?
>
> I think it would be really good if everything we develop on the
> Ensembl side were coordinated with equivalent developments coming from
> the Chado/GMOD side, so it would be great to work with you (as
> Chado/GMOD users) to ensure that happens.
>
> Cheers,
> Dan.
>
> --
> -------------------------------------
> Isaak Yosief Tecle, PhD
>
> Bioinformatics Consultant
> to Sol Genomics Network
>
> Boyce Thompson Institute
> Cornell University
>
> http://sgn.cornell.edu
> -------------------------------------
>
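Dan's quoted plan (attach one term for each of experiment, measurement, phenotype, attribute, environment and individual to every recorded value, then reach the SNPs through the individual) can be sketched as a minimal Python data structure. The field names, the placeholder term IDs and the dict standing in for a genotype store are assumptions for illustration only; they are not part of any existing phenotype format.

```python
from dataclasses import dataclass


# Hypothetical record: each recorded value carries one term (ontology ID or
# URI) per axis listed in the quoted message. Field names and all term IDs
# used below are placeholders, not an existing format.
@dataclass(frozen=True)
class PhenotypeValue:
    individual: str   # external accession number shared with the genotype store
    experiment: str
    measurement: str
    phenotype: str
    attribute: str
    environment: str
    value: float


def link_to_snps(values, genotype_calls):
    """Join phenotype values to SNP calls through the individual accession.

    genotype_calls maps accession -> {snp_id: call}; it stands in for
    whichever store (Ensembl variation, VCF, GVF) actually holds the calls.
    Values whose individual has no genotype calls are skipped.
    """
    joined = []
    for v in values:
        calls = genotype_calls.get(v.individual)
        if calls is None:
            continue
        for snp_id, call in sorted(calls.items()):
            joined.append((v.individual, v.phenotype, v.value, snp_id, call))
    return joined


if __name__ == "__main__":
    values = [
        PhenotypeValue("ACC-001", "TERM:exp", "TERM:meas", "TERM:height",
                       "TERM:attr", "TERM:env", 12.5),
        PhenotypeValue("ACC-002", "TERM:exp", "TERM:meas", "TERM:height",
                       "TERM:attr", "TERM:env", 9.0),
    ]
    calls = {"ACC-001": {"snp_2": "A/G", "snp_1": "A/A"}}
    for row in link_to_snps(values, calls):
        print(row)
```

The only thing the two sides share is the accession string, which is why the quoted message stresses tracking the individual through a stable, probably external, accession number.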

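Naama's quoted recipe for QTL-style data (accessions in `stock`, one `nd_experiment` per measurement, linked to the stock through `nd_experiment_stock` and to the genotype through `nd_experiment_genotype`, with the genotype pointing at a marker feature) can be sketched in Python with an in-memory SQLite database. This is a simplified illustration, not a loader for a real Chado instance: the table definitions keep only a few columns, the phenotype link via `nd_experiment_phenotype` and the genotype-to-marker link via `feature_genotype` are taken from the wider Chado schema, and the helper functions and example names (`ACC-001`, `marker_1`) are hypothetical.

```python
import sqlite3

# Trimmed-down versions of the Chado tables named in the quoted message.
# Real Chado defines more columns (type_id, organism_id, nd_geolocation_id,
# ...) and constraints; these definitions are simplified for illustration.
SCHEMA = """
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature_genotype (
    feature_id INTEGER REFERENCES feature,
    genotype_id INTEGER REFERENCES genotype,
    UNIQUE (feature_id, genotype_id));
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE, value TEXT);
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_stock (
    nd_experiment_id INTEGER REFERENCES nd_experiment,
    stock_id INTEGER REFERENCES stock);
CREATE TABLE nd_experiment_genotype (
    nd_experiment_id INTEGER REFERENCES nd_experiment,
    genotype_id INTEGER REFERENCES genotype);
CREATE TABLE nd_experiment_phenotype (
    nd_experiment_id INTEGER REFERENCES nd_experiment,
    phenotype_id INTEGER REFERENCES phenotype);
"""


def get_or_create(db, table, uniquename):
    """Return the primary key of the row with this uniquename, inserting it if needed."""
    db.execute(f"INSERT OR IGNORE INTO {table} (uniquename) VALUES (?)", (uniquename,))
    return db.execute(
        f"SELECT {table}_id FROM {table} WHERE uniquename = ?", (uniquename,)
    ).fetchone()[0]


def record_measurement(db, accession, genotype_name, marker, trait, value):
    """Store one measurement as its own nd_experiment, linked to the stock,
    the genotype (which points at a marker feature) and a phenotype row."""
    stock_id = get_or_create(db, "stock", accession)
    genotype_id = get_or_create(db, "genotype", genotype_name)
    feature_id = get_or_create(db, "feature", marker)
    db.execute("INSERT OR IGNORE INTO feature_genotype VALUES (?, ?)",
               (feature_id, genotype_id))
    exp_id = db.execute("INSERT INTO nd_experiment DEFAULT VALUES").lastrowid
    phenotype_id = db.execute(
        "INSERT INTO phenotype (uniquename, value) VALUES (?, ?)",
        (f"{accession}:{trait}:{exp_id}", str(value)),
    ).lastrowid
    db.execute("INSERT INTO nd_experiment_stock VALUES (?, ?)", (exp_id, stock_id))
    db.execute("INSERT INTO nd_experiment_genotype VALUES (?, ?)", (exp_id, genotype_id))
    db.execute("INSERT INTO nd_experiment_phenotype VALUES (?, ?)", (exp_id, phenotype_id))
    return exp_id


def stock_profile(db, accession):
    """Go from a stock to the marker and phenotype value of each of its experiments."""
    return db.execute("""
        SELECT f.uniquename, p.value
        FROM stock s
        JOIN nd_experiment_stock es ON es.stock_id = s.stock_id
        JOIN nd_experiment_genotype eg ON eg.nd_experiment_id = es.nd_experiment_id
        JOIN feature_genotype fg ON fg.genotype_id = eg.genotype_id
        JOIN feature f ON f.feature_id = fg.feature_id
        JOIN nd_experiment_phenotype ep ON ep.nd_experiment_id = es.nd_experiment_id
        JOIN phenotype p ON p.phenotype_id = ep.phenotype_id
        WHERE s.uniquename = ?
        ORDER BY es.nd_experiment_id
    """, (accession,)).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for weight in (12.5, 13.0):
        record_measurement(db, "ACC-001", "ACC-001_geno", "marker_1",
                           "fruit_weight", weight)
    print(stock_profile(db, "ACC-001"))
```

The `stock_profile` query illustrates the "go back and forth from a 'stock'" point in the quoted message: because each measurement gets its own `nd_experiment` row shared by the stock, genotype and phenotype links, one join path from an accession reaches both the marker and every recorded value.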

Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

Chris Mungall

I put together some recommendations for including phenotype annotations in GVF files:

        http://www.sequenceontology.org/wiki/index.php/Using_Phenotype_Ontologies_in_GVF

VCF seems similar enough that the same syntax and recommendations should apply.

It's designed to use simple pre-existing phenotype terms from ontologies like the human phenotype ontology. In principle it could be expanded to cover more expressive modes of describing the phenotypes, but this might be better done in a separate OWL file.




Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

seth redmond-2
I think expanding this to entity-quality style post-composed terms would solve a lot of problems, perhaps something like this:
##phenotype-description Entity=GO:0035011;Ontology=http://purl.obolibrary.org/obo/go.obo;Quality=PATO:0001650;Ontology=http://purl.obolibrary.org/obo/pato.obo

Probably not a substitute for those who are recording individual-level data (i.e. without having calculated associations / QTLs) or who require details of the assay used, but I'm sure it would prove useful for curated data.
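To make the entity-quality idea concrete, here is a hypothetical Python reader for the directive sketched above. The entity/quality form of the line is only a proposal in this thread, and the parsing rule used here (semicolon-separated `key=value` pairs, with each `Ontology=` pair qualifying the `Entity=` or `Quality=` term immediately before it) is an assumption, not a published grammar.

```python
PREFIX = "##phenotype-description"


def parse_eq_directive(line):
    """Parse a proposed entity-quality directive into (role, term, ontology) tuples.

    Assumes semicolon-separated key=value pairs in which each Ontology= pair
    qualifies the Entity= or Quality= term that immediately precedes it.
    """
    if not line.startswith(PREFIX):
        raise ValueError("not a phenotype-description directive")
    terms = []
    for pair in line[len(PREFIX):].strip().split(";"):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        key = key.strip()
        if key in ("Entity", "Quality"):
            terms.append([key, value.strip(), None])
        elif key == "Ontology":
            # An ontology must attach to a preceding term that has none yet.
            if not terms or terms[-1][2] is not None:
                raise ValueError("Ontology= must follow an Entity= or Quality= term")
            terms[-1][2] = value.strip()
        else:
            raise ValueError(f"unexpected key: {key}")
    return [tuple(t) for t in terms]


if __name__ == "__main__":
    example = ("##phenotype-description Entity=GO:0035011;"
               "Ontology=http://purl.obolibrary.org/obo/go.obo;"
               "Quality=PATO:0001650;"
               "Ontology=http://purl.obolibrary.org/obo/pato.obo")
    for role, term, ontology in parse_eq_directive(example):
        print(role, term, ontology)
```

Binding each `Ontology=` to the preceding term is what keeps the GO entity and the PATO quality distinguishable even though the same key appears twice; a line with a missing `;` separator ends up with two ontologies for one term and is rejected rather than silently misread.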



On 8 Feb 2012, at 00:52, Chris Mungall wrote:

>
> I put together some recommendations for including phenotype annotations in GVF files:
>
> http://www.sequenceontology.org/wiki/index.php/Using_Phenotype_Ontologies_in_GVF
>
> VCF seems similar enough that the same syntax and recommendations should apply.
>
> It's designed to use simple pre-existing phenotype terms from ontologies like the human phenotype ontology. In principle it could be expanded to cover more expressive modes of describing the phenotypes, but this might be better done in a separate OWL file.
>
> On Feb 7, 2012, at 1:24 PM, Dan Bolser wrote:
>
>> Cheers Seth,
>>
>> I'm still trying to get my head round how all the ontologies fit
>> together practically and technologically, however, it won't be my
>> responsibility to work on this directly, so it's only for fun.
>>
>> An interesting idea came up on the VCF list to define a 'large scale
>> phenotype format', so I jumped in with a suggestion, but it seems the
>> thread has gone cold; nobody replied.
>>
>> I suspect I need to read more about how annotation is done in
>> practice, and about the various 'tab' formats.
>>
>>
>> Thanks all for help,
>> Dan.
>>
>> On 7 February 2012 20:54, seth redmond <[hidden email]> wrote:
>>> Dan,
>>>
>>> When arranging the VB system we largely sidestepped the problem of genotypes
>>> since we were already committed to supporting Ensembl Variation. When I last
>>> looked into it (around five months ago now), ens-var was more than capable
>>> of handling a large number of individuals, but at the cost of some curation
>>> if you want to make them readable; e.g. it's trivial enough to import VCFs,
>>> but there's as yet no facility for rearranging these into different
>>> population groupings without just reloading from different VCFs. Paul
>>> Derwent could tell you more about this (I've now left VB, but I could also
>>> dig around for my notes if this would be useful).
>>>
>>> The genotype module attached to phendiver is somewhat underdeveloped for our
>>> purposes, and I think this would be a good target for those interested to aim
>>> at next. Personally, the idea of tracking a full SNP-chip or Illumina run's
>>> worth of genotypes in phendiver unnerves me, but I'd be interested to hear
>>> how Naama got on with it, i.e. how many variant loci your SOL deals with.
>>>
>>> Finally, you may well have encountered these already, but looking through the
>>> list of things you need terms for, there are some general ontologies that a
>>> number of us are using for these: most obviously PATO for the phenotypes, UO
>>> for the measurement units and GAZ (gazetteer) for the locations.
>>>
>>> Bob can fill you in better than I can on the ISA-tab loaders, but if there's
>>> anything I can help with let me know.
>>>
>>> -s
>>>
>>>
>>>
>>> --
>>> Seth Redmond
>>>  Unité Génétique et Génomique des Insectes Vecteurs
>>>  Institut Pasteur
>>>  28, rue du Dr Roux
>>>  75724 PARIS
>>> [hidden email]
>>>
>>> On 7 Feb 2012, at 14:25, Dan Bolser wrote:
>>>
>>> Hi Bob,
>>>
>>> We're looking to build (like everyone else) a scalable data archive
>>> for large-scale genotyping / phenotyping projects. Currently there is
>>> no 'phenotype archive' or even phenotype file format (that I know of),
>>> and it seems to make sense to marry phenotype and genotype data in one
>>> place (and to do that in a 'standard' way wherever possible).
>>>
>>> I think it's important to look at all available solutions before
>>> making a decision about what to do and how to do it, Ensembl variation
>>> included.
>>>
>>> I've just been hearing about ISA-tab in the context of the new
>>> BioSamples database at the EBI, which is part of the solution, but I
>>> don't know much about those formats yet TBH.
>>>
>>> Currently, I'm thinking that we need to bring in ontologies, terms, or
>>> URIs to explain:
>>>
>>> 1) experiment
>>> 2) measurement
>>> 3) phenotype
>>> 4) attribute
>>> 5) environment
>>> 6) individual
>>>
>>> and combine one term for every 'value' recorded in the phenotyping
>>> database, and then link those to SNPs via the individual. Although the
>>> SNPs could be stored in Ensembl variation or VCF or GVF, the main
>>> issue is to keep track of the 'individual' via some, probably
>>> external, accession number. How many samples can you cram into Ensembl
>>> variation?
>>>
>>> If this is being done in phendiver / chado, I'm keen to learn a) how,
>>> b) who, and c) performance.
>>>
>>> I'm not part of the VectorBase project, but I guess Paul K is? It
>>> would perhaps be good to get in a room with Paul and you at some point
>>> (I'm free at most times, but have some specific appointments).
>>>
>>> This is work that we're hoping to do in the near future, but we're not
>>> actively working on it yet. I'm trying to get as much background
>>> information as I can in the meantime.
>>>
>>>
>>> Cheers,
>>> Dan.
>>>
>>> On 7 February 2012 11:34, Bob MacCallum <[hidden email]> wrote:
>>>
>>> Hi Dan,
>>>
>>> I guess you considered Ensembl variation for the SNPs.  It would be
>>> interesting to hear why you're looking into other options.  At
>>> VectorBase we're hoping Ensembl will handle our genomic needs while
>>> Chado handles the more unpredictable experimental data, phenotypes and
>>> sample collection meta-data more flexibly.  Obviously we have to
>>> bridge the two but think (hope) that is doable.
>>>
>>> I'll be up at EBI Weds-Fri this week with the VectorBase all-hands
>>> meeting, so maybe we could meet?
>>>
>>> (I'll summarise for the mailing list as appropriate.)
>>>
>>> For the record, we have written a prototype ISA-Tab ->
>>> Bio::Chado::Schema -> Chado loader (and in the other direction a web
>>> service and AJAX-heavy web interface
>>> http://funcgen.vectorbase.org/PopulationBETA/), but it's likely to go
>>> through some changes and is regrettably quite VectorBase specific at
>>> the moment.
>>>
>>> cheers,
>>> Bob.
>>>
>>>
>>> On Sat, Feb 4, 2012 at 3:13 PM, Dan Bolser <[hidden email]> wrote:
>>>
>>> Thanks Naama,
>>>
>>> Looks like I need to read a lot! :-)
>>>
>>> I think a VCF or GVF loader would be a good project to help
>>> standardize usage. There was some talk on the VCF mailing list about
>>> including phenotype information per #SAMPLE in that format, which I
>>> think would be a big help too.
>>>
>>> Thanks again for the info and links,
>>> Dan.
>>>
>>>
>>> On 4 February 2012 02:40, Naama Menda <[hidden email]> wrote:
>>>
>>> Hi Dan (I'm CC'ing the gmod-schema and gmod-phendiver lists)
>>>
>>> The phenotype module has not been modified yet, except for adding a nullable
>>> 'name' field.
>>> I think it has been working out for most people, but the idea is to make it
>>> more normalized, like the rest of Chado's modules, by eliminating the multiple
>>> columns linking to 'cvterm' and replacing them with linking tables that would
>>> provide a more structured way of storing phenotypes, keeping only the
>>> phenotype measurement in the phenotype table and factoring out the semantics.
>>> Some people probably disagree with this approach, and I think the main
>>> problem is the broad definition of a phenotype, and the multiple ways of
>>> storing a phenotype in Chado.
>>>
>>> We had long, long discussions about post-composing. You can read some notes
>>> here http://gmod.org/wiki/Chado_Natural_Diversity_Module/natdiv_schema_changes_call
>>> and here http://sourceforge.net/mailarchive/message.php?msg_id=27597482
>>>
>>> There are no formal loaders for the Natural Diversity schema, simply because
>>> there are many different ways to load your custom data.
>>> You can see some examples from the data loaded in SGN here
>>> https://github.com/solgenomics/Phenome/blob/master/bin/loading_scripts/solcap/load_solcap_TA_phenotypes.pl
>>>
>>> As you can see, this is very data-specific, and makes many assumptions about
>>> your experiment design and metadata.
>>>
>>> Likewise, there is no formal way of storing QTL data. In general, you want
>>> to store your accessions in the stock table, create a new nd_experiment for
>>> each measurement, and link it to the stock via nd_experiment_stock and to
>>> the genotype via nd_experiment_genotype.
>>> The genotype table is mostly a placeholder for a genotype name, and links to
>>> the feature table where you can load a marker, or whatever feature you
>>> need.
>>>
>>> The Natural Diversity paper does not elaborate on storing genotypes since it
>>> describes how to store experimental data (genotyping or phenotyping) in a
>>> generic and re-usable way. Genotypes and phenotypes are stored in different
>>> modules, which link back to the ND schema.
>>> Its power is in the ability to go back and forth from a 'stock' to its
>>> genotyping and phenotyping data, or generate new stocks from
>>> pheno/genotyping experiments.
>>> The examples in the paper talk mostly about phenotypes, since these are much
>>> more complex to handle than genotypes.
>>>
>>> The schema is a bit complicated, but after a year of discussions the working
>>> group came up with this model, which seems to be generic enough to
>>> accommodate all the use cases we brought up.
>>>
>>> You can add your use case to the ND module wiki page, see from there how
>>> it fits in (http://gmod.org/wiki/Chado_Natural_Diversity_Module#Use_Cases)
>>> and get more feedback from the mailing list.
>>>
>>> Hope this makes things a bit clearer!
>>> -Naama
>>>
>>> On Fri, Feb 3, 2012 at 6:18 PM, Dan Bolser <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I'm not clear whether the update to the phenotype module has been done
>>> or not. In general, does the old phenotype stuff do what's needed?
>>>
>>> BTW, where can I read more about 'post-composed' terms? I tried
>>> Google, but couldn't find any good reference material... I'm
>>> interested in the examples of combining different ontologies into the
>>> description of a single value.
>>>
>>> Do you have VCF / GVF loaders written for the schema?
>>>
>>> I'm still not clear how 'QTL data' is stored... are the results of QTL
>>> algorithms just that wiggly line? i.e. trivial to store? (Sorry for the
>>> confusion.)
>>>
>>> I made some updates on the wiki page. I didn't realize it until I
>>> started to try to re-work it, but the abstract of the paper doesn't
>>> mention storing genotype data in the database. It focuses on
>>> describing 'experiments'. Perhaps you could take a look and improve
>>> what I wrote there:
>>>
>>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>>>
>>> In the meantime I'll join the mailing list for the working group.
>>>
>>> Thanks again for the info,
>>> Dan.
>>>
>>> On 3 February 2012 12:19, Isaak Yosief Tecle <[hidden email]> wrote:
>>>
>>> Hi Dan,
>>>
>>> SGN is now using a db schema called Natural Diversity to store genetic
>>> and phenotypic variation data. It is a GMOD/Chado schema developed by
>>> SGN and collaborators. It is also used by multiple other databases, which
>>> you will find listed in the publication below.
>>>
>>> Documentation of the schema:
>>> http://database.oxfordjournals.org/content/2011/bar051.full
>>>
>>> Some info on the working group:
>>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>>>
>>> Mailing lists:
>>> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>> Cheers,
>>> Isaak
>>>
>>> On 2/3/12 12:06 PM, Dan Bolser wrote:
>>>
>>> Hi guys,
>>>
>>> I'm now working at 'Ensembl Genomes' as project leader of the plants
>>> division. We're looking to develop plant bioinformatics infrastructure as
>>> part of a grant called transPLANT. Part of the work involves building a
>>> 'variation archive' for plants, which is obviously related to strain
>>> phenotyping information, population studies, and therefore, derived
>>> QTL data...
>>>
>>> I was wondering how you store your QTL data, and if there are any
>>> standards, emerging or defined, that we should be thinking about?
>>>
>>> I think it would be really good if everything we develop on the
>>> Ensembl side is coordinated with equivalent developments coming from
>>> the Chado/GMOD side, so it would be really great to work with you (as
>>> Chado/GMOD users) to ensure that that happens.
>>>
>>> Cheers,
>>> Dan.
>>>
>>> --
>>> -------------------------------------
>>> Isaak Yosief Tecle, PhD
>>>
>>> Bioinformatics Consultant
>>> to Sol Genomics Network
>>>
>>> Boyce Thompson Institute
>>> Cornell University
>>>
>>> http://sgn.cornell.edu
>>> -------------------------------------
>>>
>>> _______________________________________________
>>> Cxgn-devel mailing list
>>> [hidden email]
>>> http://rubisco.sgn.cornell.edu/cgi-bin/mailman/listinfo/cxgn-devel
>>>
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>>
>>
>
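Naama's outline in the quoted thread can be sketched in Python. In that outline, accessions go in `stock` and each measurement gets a new `nd_experiment`. The experiment links to the stock through `nd_experiment_stock` and to a genotype through `nd_experiment_genotype`, and the genotype points at markers in `feature`. The sketch below is a minimal illustration under stated assumptions, not a loader. It uses an in-memory SQLite database with a heavily trimmed subset of the Chado columns. It assumes the genotype-to-marker link goes through Chado's `feature_genotype` table. The helper functions and the accession and marker names are hypothetical.

```python
import sqlite3

# Heavily simplified subset of Chado tables (stock, feature, genotype,
# feature_genotype and the Natural Diversity nd_experiment* tables).
# Real Chado has more columns, e.g. type_id links to cvterm.
SCHEMA = """
CREATE TABLE stock    (stock_id    INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature_genotype (
    feature_id  INTEGER REFERENCES feature,
    genotype_id INTEGER REFERENCES genotype,
    UNIQUE (feature_id, genotype_id));
CREATE TABLE nd_experiment (nd_experiment_id INTEGER PRIMARY KEY);
CREATE TABLE nd_experiment_stock (
    nd_experiment_id INTEGER REFERENCES nd_experiment,
    stock_id         INTEGER REFERENCES stock);
CREATE TABLE nd_experiment_genotype (
    nd_experiment_id INTEGER REFERENCES nd_experiment,
    genotype_id      INTEGER REFERENCES genotype);
"""


def get_or_create(cur, table, id_col, name):
    """Return the id of the row with this uniquename, inserting it if needed.

    table/id_col come only from constants in this sketch, never user input.
    """
    cur.execute(f"SELECT {id_col} FROM {table} WHERE uniquename = ?", (name,))
    row = cur.fetchone()
    if row:
        return row[0]
    cur.execute(f"INSERT INTO {table} (uniquename) VALUES (?)", (name,))
    return cur.lastrowid


def record_genotyping(conn, accession, genotype_name, marker):
    """Store one measurement as its own nd_experiment, linked to stock and genotype."""
    cur = conn.cursor()
    stock_id = get_or_create(cur, "stock", "stock_id", accession)
    genotype_id = get_or_create(cur, "genotype", "genotype_id", genotype_name)
    feature_id = get_or_create(cur, "feature", "feature_id", marker)
    cur.execute("INSERT OR IGNORE INTO feature_genotype VALUES (?, ?)",
                (feature_id, genotype_id))
    cur.execute("INSERT INTO nd_experiment DEFAULT VALUES")
    experiment_id = cur.lastrowid
    cur.execute("INSERT INTO nd_experiment_stock VALUES (?, ?)",
                (experiment_id, stock_id))
    cur.execute("INSERT INTO nd_experiment_genotype VALUES (?, ?)",
                (experiment_id, genotype_id))
    conn.commit()
    return experiment_id


def markers_for_stock(conn, accession):
    """Walk stock -> nd_experiment -> genotype -> feature for one accession."""
    sql = """
        SELECT DISTINCT f.uniquename
        FROM stock s
        JOIN nd_experiment_stock es    ON es.stock_id = s.stock_id
        JOIN nd_experiment_genotype eg ON eg.nd_experiment_id = es.nd_experiment_id
        JOIN feature_genotype fg       ON fg.genotype_id = eg.genotype_id
        JOIN feature f                 ON f.feature_id = fg.feature_id
        WHERE s.uniquename = ?
        ORDER BY f.uniquename
    """
    return [row[0] for row in conn.execute(sql, (accession,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_genotyping(conn, "accession_1", "accession_1_genotype_a", "marker_A")
record_genotyping(conn, "accession_1", "accession_1_genotype_b", "marker_B")
print(markers_for_stock(conn, "accession_1"))  # ['marker_A', 'marker_B']
```

Each measurement gets its own `nd_experiment` row, so one stock can be reached from any number of genotyping runs. This supports the back-and-forth between a stock and its data that Naama describes. Phenotype measurements would attach to the same `nd_experiment` rows through a further linking table (in Chado, `nd_experiment_phenotype`), which this sketch omits.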






Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

Naama Menda
This may be a good opportunity to look again at the phenotype module,
which we kept putting off due to lack of agreement on how phenotypes should be stored.
The only thing we could agree on is that the phenotype module is outdated and needs revising.



-Naama


On Wed, Feb 8, 2012 at 5:44 AM, seth redmond <[hidden email]> wrote:
I think expanding this to entity-quality style postcomposed terms would solve a lot of problems, perhaps something like this?:
##phenotype-description Entity=GO:0035011;Ontology=http://purl.obolibrary.org/obo/go.oboQuality=PATO:0001650;Ontology=http://purl.obolibrary.org/obo/pato.obo

Probably not a substitute for those who are recording individual-level data (i.e. without having calculated associations / QTLs) or who require details of the assay used, but I'm sure it would prove useful for curated data.



On 8 Feb 2012, at 00:52, Chris Mungall wrote:

>
> I put together some recommendations for including phenotype annotations in GVF files:
>
>       http://www.sequenceontology.org/wiki/index.php/Using_Phenotype_Ontologies_in_GVF
>
> VCF seems similar enough that the same syntax and recommendations should apply.
>
> It's designed to use simple pre-existing phenotype terms from ontologies like the human phenotype ontology. In principle it could be expanded to cover more expressive modes of describing the phenotypes, but this might be better done in a separate OWL file.
>
> On Feb 7, 2012, at 1:24 PM, Dan Bolser wrote:
>
>> Cheers Seth,
>>
>> I'm still trying to get my head round how all the ontologies fit
>> together practically and technologically, however, it won't be my
>> responsibility to work on this directly, so it's only for fun.
>>
>> An interesting idea came up on the VCF list to define a 'large scale
>> phenotype format', so I jumped in with a suggestion, but it seems the
>> thread has gone cold, nobody replied.
>>
>> I suspect I need to read more about how annotation is done in
>> practice, and about the various 'tab' formats.
>>
>>
>> Thanks all for help,
>> Dan.
>>
>> On 7 February 2012 20:54, seth redmond <[hidden email]> wrote:
>>> Dan,
>>>
>>> When arranging the VB system we largely sidestepped the problem of genotypes
>>> since we were already committed to supporting ensembl variation. When I last
>>> looked into it (around five months ago now), ens-var was more than capable
>>> of handling a large number of individuals  but at the cost of some curation
>>> if you want to make them readable - e.g. it's trivial enough to import VCFs,
>>> but there's as yet no facility for rearranging these into different
>>> population groupings without just reloading from different VCFs. Paul
>>> Derwent could tell you more about this (I've now left VB, but I could also
>>> dig around for my notes if this would be useful).
>>>
>>> The genotype module attached to phendiver is somewhat underdeveloped for our
>>> purposes and I think this would be a good target for those interested to aim
>>> at next. Personally the idea of tracking a full snp-chip or illumina run's
>>> worth of genotypes in phendiver unnerves me, but I'd be interested to hear
>>> how Naama got on with it? i.e. how many variant loci you're SOL deals with?
>>>
>>> Finally you may well have encountered these already, but looking through the
>>> list of things you need terms for, there are some general ontologies that a
>>> number of us are using for these: most obviously PATO for the phenotypes, UO
>>> for the measurement/units and GAZ (gazeteer) for the locations.
>>>
>>> Bob can fill you in better than I can on the ISA-tab loaders, but if there's
>>> anything I can help with let me know.
>>>
>>> -s
>>>
>>>
>>>
>>> --
>>> Seth Redmond
>>>  Unité Génetique et Génomique des Insectes Vecteurs
>>>  Institut Pasteur
>>>  28,rue du Dr Roux
>>>  75724 PARIS
>>> [hidden email]
>>>
>>> On 7 Feb 2012, at 14:25, Dan Bolser wrote:
>>>
>>> Hi Bob,
>>>
>>> We're looking to build (like everyone else) a scaleable data archive
>>> for large-scale genotyping / phenotyping projects. Currently there is
>>> no 'phenotype archive' or even phenotype file format (that I know of),
>>> and it seems to make sense to marry phenotype and genotype data in one
>>> place (and to do that in a 'standard' way wherever possible).
>>>
>>> I think it's important to look at all available solutions before
>>> making a decision about what to do and how to do it, Ensembl variation
>>> included.
>>>
>>> I've just been hearing about ISA-tab in the context of the new
>>> BioSamples database at the EBI, which is part of the solution, but I
>>> don't know much about those formats yet TBH.
>>>
>>> Currently, I'm thinking that we need to bring in ontologies, terms, or
>>> URIs to explain:
>>>
>>> 1) experiment
>>> 2) measurement
>>> 3) phenotype
>>> 4) attribute
>>> 5) environment
>>> 6) individual
>>>
>>> and combine one term for every 'value' recorded in the phenotyping
>>> database, and then link those to SNPs via the individual. Although the
>>> SNPs could be stored in Ensembl variation or VCF or GVF, the main
>>> issue is to keep track of the 'individual' via some, probably
>>> external, accession number. How many samples can you cram into Ensembl
>>> variation?
>>>
>>> If this is being done in phendiver / chado, I'm keen to learn a) how,
>>> b) who, and c) performance.
>>>
>>> I'm not part of the VectorBase project, but I guess Paul K is? It
>>> would perhaps be good to get in a room with Paul and you at some point
>>> (I'm free at most times, but have some specific appointments).
>>>
>>> This is work that were hoping to do in the near future, but we're not
>>> actively working on it yet. I'm trying to get as much background
>>> information as I can in the mean time.
>>>
>>>
>>> Cheers,
>>> Dan.
>>>
>>> On 7 February 2012 11:34, Bob MacCallum <[hidden email]> wrote:
>>>
>>> Hi Dan,
>>>
>>> I guess you considered Ensembl variation for the SNPs.  It would be
>>>
>>> interesting to hear why you're looking into other options.  At
>>>
>>> VectorBase we're hoping Ensembl will handle our genomic needs while
>>>
>>> Chado handles the more unpredictable experimental data, phenotypes and
>>>
>>> sample collection meta-data more flexibly.  Obviously we have to
>>>
>>> bridge the two but think (hope) that is doable.
>>>
>>> I'll be up at EBI Weds-Fri this week with the VectorBase all-hands
>>>
>>> meeting, so maybe we could meet?
>>>
>>> (I'll summarise for the mailing list as appropriate.)
>>>
>>> For the record, we have written a prototype ISA-Tab ->
>>>
>>> Bio::Chado::Schema -> Chado loader (and in the other direction a web
>>>
>>> service and AJAX-heavy web interface
>>>
>>> http://funcgen.vectorbase.org/PopulationBETA/), but it's likely to go
>>>
>>> through some changes and is regrettably quite VectorBase specific at
>>>
>>> the moment.
>>>
>>> cheers,
>>>
>>> Bob.
>>>
>>>
>>> On Sat, Feb 4, 2012 at 3:13 PM, Dan Bolser <[hidden email]> wrote:
>>>
>>> Thanks Naama,
>>>
>>>
>>> Looks like I need to read a lot! :-)
>>>
>>>
>>> I think a VCF or GVF loader would be a good project to help
>>>
>>> standardize usage. There was some talk on the VCF mailing list about
>>>
>>> including phenotype information per #SAMPLE in that format, which I
>>>
>>> think would be a big help too.
>>>
>>>
>>>
>>> Thanks again for the info and links,
>>>
>>> Dan.
>>>
>>>
>>> On 4 February 2012 02:40, Naama Menda <[hidden email]> wrote:
>>>
>>> hi Dan (Im CC'ing the gmod schema and phendiver. lists)
>>>
>>>
>>> the phenotype module has not been modified yet, except for adding a nullable
>>>
>>> 'name' field.
>>>
>>> I think it has been working out for most people, but the idea is to make it
>>>
>>> more normalized, like the rest of Chado's modules, eliminating the multiple
>>>
>>> columns linking to 'cvterm' ,
>>>
>>> and replacing with other linking tables that would provide more structured
>>>
>>> way for storing phenotypes, having only the phenotype measurement in the
>>>
>>> phenotype table, and factoring out the semantics.
>>>
>>> Some people probably disagree with this approach, and I think the main
>>>
>>> problem is the broad definition of a phenotype, and the multiple ways for
>>>
>>> storing a phenotype in Chado.
>>>
>>>
>>> We had long long discussions about post-composing  . You can read some notes
>>>
>>> here http://gmod.org/wiki/Chado_Natural_Diversity_Module/natdiv_schema_changes_call
>>>
>>> and here http://sourceforge.net/mailarchive/message.php?msg_id=27597482
>>>
>>>
>>> There are no formal loaders for Natural Diversity schema, simply because
>>>
>>> there are many different ways to load your custom data.
>>>
>>> You can see some examples from the data loaded in SGN here
>>>
>>> https://github.com/solgenomics/Phenome/blob/master/bin/loading_scripts/solcap/load_solcap_TA_phenotypes.pl
>>>
>>>
>>> As you can see this is very data-specific, and makes many assumptions on
>>>
>>> your experiment design and metadata.
>>>
>>>
>>> Likewise, there is no formal way for storing QTL data. In general, you want
>>>
>>> to store your accessions in the stock table, create a new nd_experiment for
>>>
>>> each measurement, and link it with the stock via nd_experiment_stock and to
>>>
>>> the genotype via nd_experiment_genotype.
>>>
>>> The genotype table is mostly a spaceholder for a genotype name, and links to
>>>
>>> the feature table where you can load a marker, or whatever feature you
>>>
>>> need.
>>>
>>>
>>> The Natural Diversity paper does not elaborate on storing genotypes since it
>>>
>>> describes how to store experimental data (genotyping or phenotyping) in a
>>>
>>> generic and re-usable way. Genotypes and phenotypes are stored in different
>>>
>>> modules, which link back to the ND schema.
>>>
>>> It's power is in the ability to go back and forth from a 'stock' to its
>>>
>>> genotyping and phenotyping data, or generate new stocks from
>>>
>>> pheno/genotyping experiments.
>>>
>>> The examples in the paper talk mostly about phenotypes, since these are much
>>>
>>> more complex to handle than genotypes.
>>>
>>>
>>> The schema is a bit complicated, but after a year of discussions the working
>>>
>>> group came up with this model which seems to be generic enough to accomodate
>>>
>>> all the use cases we brought up.
>>>
>>>
>>> You can add your use case to the ND module wiki page, and see from there how
>>>
>>> it fits in (http://gmod.org/wiki/Chado_Natural_Diversity_Module#Use_Cases)
>>>
>>> and get more feedback from the mailing list.
>>>
>>>
>>> Hope this makes things a bit clearer!
>>>
>>> -Naama
>>>
>>>
>>>
>>>
>>> On Fri, Feb 3, 2012 at 6:18 PM, Dan Bolser <[hidden email]> wrote:
>>>
>>>
>>> Hi,
>>>
>>>
>>> I'm not clear weather the update to the phenotype module has been done
>>>
>>> or not? In general, does the old phenotype stuff do what's needed?
>>>
>>>
>>> BTW, where can I read more about 'post-composed' terms? I tried
>>>
>>> Goggle, but couldn't find any good reference material... I'm
>>>
>>> interested in the examples of combining different ontologies into the
>>>
>>> description of a single value.
>>>
>>>
>>> Do you have VCF / GVF loaders written for the schema?
>>>
>>>
>>> I'm still not clear how 'QTL data' is stored... are the results of QTL
>>>
>>> algorithms just that wiggly line? i.e. trivial to store? (Sorry for
>>>
>>> confusion).
>>>
>>>
>>> I made some updates on the wiki page. I didn't realize it until I
>>>
>>> started to try to re-work it, but the abstract of the paper doesn't
>>>
>>> mention storing genotype data in the database. It focuses on
>>>
>>> describing 'experiments'. Perhaps you could take a look and improve
>>>
>>> what I wrote there:
>>>
>>>
>>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>>>
>>>
>>>
>>> In the mean time I'll join the mailing list for the working group.
>>>
>>>
>>>
>>> Thanks again for the infos.,
>>>
>>>
>>> Dan.
>>>
>>>
>>> On 3 February 2012 12:19, Isaak Yosief Tecle <[hidden email]> wrote:
>>>
>>> Hi Dan,
>>>
>>>
>>> SGN is now using a db schema called Natural Diversity to store genetic
>>>
>>> and phenotypic variation data. It is a GMOD/Chado schema developed by
>>>
>>> SGN and collaborators. It is also used by other multiple databases which
>>>
>>> you will find listed in the publication below.
>>>
>>>
>>> A documentation of the schema:
>>>
>>> http://database.oxfordjournals.org/content/2011/bar051.full
>>>
>>>
>>> Some info on the working group:
>>>
>>> http://gmod.org/wiki/Chado_Natural_Diversity_Module_Working_Group
>>>
>>>
>>> Mailing lists:
>>>
>>> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
>>>
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Isaak
>>>
>>>
>>>
>>>
>>> On 2/3/12 12:06 PM, Dan Bolser wrote:
>>>
>>> Hi guys,
>>>
>>>
>>> I'm now working at 'Ensembl Genomes' a project leader of the plants
>>>
>>> division. We're looking to develop plant bifx infrastructure as part
>>>
>>> of a grant called transPLANT. Part of the work involves building a
>>>
>>> 'variation archive' for plants, which is obviously related to strain
>>>
>>> phenotyping information, population studies, and therefore, derived
>>>
>>> QTL data...
>>>
>>>
>>> I was wondering how you store your QTL data, and if there are any
>>>
>>> standards, emerging or defined, that we should be thinking about?
>>>
>>>
>>> I think it will be really good if everything we develop from the
>>>
>>> Ensembl side is coordinated with equivalent developments coming from
>>>
>>> the Chado/GMoD side, so it would be really great to work with you (as
>>>
>>> Chado/GMoD users) to ensure that that happens.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Dan.
>>>
>>> .
>>>
>>>
>>>
>>> --
>>>
>>> -------------------------------------
>>>
>>> Isaak Yosief Tecle, PhD
>>>
>>>
>>> Bioinformatics Consultant
>>>
>>> to Sol Genomics Network
>>>
>>>
>>> Boyce Thompson Institute
>>>
>>> Cornell University
>>>
>>>
>>> http://sgn.cornell.edu
>>>
>>> -------------------------------------
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>> Cxgn-devel mailing list
>>>
>>> [hidden email]
>>>
>>> http://rubisco.sgn.cornell.edu/cgi-bin/mailman/listinfo/cxgn-devel
>>>
>>> _______________________________________________
>>>
>>> Cxgn-devel mailing list
>>>
>>> [hidden email]
>>>
>>> http://rubisco.sgn.cornell.edu/cgi-bin/mailman/listinfo/cxgn-devel
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>> Cxgn-devel mailing list
>>>
>>> [hidden email]
>>>
>>> http://rubisco.sgn.cornell.edu/cgi-bin/mailman/listinfo/cxgn-devel
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Try before you buy = See our experts in action!
>>>
>>> The most comprehensive online learning library for Microsoft developers
>>>
>>> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
>>>
>>> Metro Style Apps, more. Free future releases when you subscribe now!
>>>
>>> http://p.sf.net/sfu/learndevnow-dev2
>>>
>>> _______________________________________________
>>>
>>> Gmod-phendiver mailing list
>>>
>>> [hidden email]
>>>
>>> https://lists.sourceforge.net/lists/listinfo/gmod-phendiver
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> [hidden email]
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: [Gmod-phendiver] [Gmod-schema] [Cxgn-devel] Information about QTL formats?

Bob MacCallum
In reply to this post by Collett, James R-2
On Tue, Feb 7, 2012 at 7:27 PM, Collett, James R <[hidden email]> wrote:
> In my limited experience with Chado so far, I've found its data loaders to be slow.  Will the speed of data I/O in Chado be a bottleneck in the move from individual model organism data management to high-throughput resequencing/expression/phenotyping?

I don't have experience with a really high-throughput setting, but I'd
be very surprised if the machines were running 24/7, leaving no time
for loading the data. Surely aligning RNA-seq reads to produce transcript
read counts (or similar) would be more compute-intensive.
