Re: [Genome Informatics] Application

Re: [Genome Informatics] Application

Scott Cain
Hello Prystupa,

I cc'ed the schema (Chado) mailing list, as there may be people on
that list who'd like to contribute thoughts and ideas.  Also, I can't
really address your second suggested project about modeling
evolution--it's outside of the scope of what GMOD specifically does,
and I suspect it is outside the scope of all of the projects working
together on this GSoC application.

As for the first suggestion, speeding up Chado loading: have you
taken a look at Chado, or do you have any experience with it?  If not
(and it's certainly not required), do you have experience with large
normalized databases?  That is also not strictly required, but would
certainly be helpful.  Chado[1] is a normalized database that can
store a wide variety of biological data types, and loading a GFF3[2]
file (which contains DNA sequence features) touches several tables.
The most widely used GFF3 loader currently parses the GFF3 file,
creates fairly heavyweight BioPerl FeatureIO[3] objects, queries the
database repeatedly for different items to ensure uniqueness, writes
a bulk upload file, and then does the bulk upload.  Several of these
steps are time consuming, and I have some ideas about how each of
them might be addressed, but haven't had the time to investigate
them.
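
To make the steps above concrete, here is a minimal sketch of the same
pipeline shape in Python (the loader described above is BioPerl-based and
is not reproduced here).  It parses GFF3 lines into plain dicts instead
of heavyweight objects, checks uniqueness against an in-memory set
instead of querying the database per feature, and writes tab-delimited
rows for a later bulk upload.  The function names, the row layout, and
the idea of prefetching the existing names into a set are all
assumptions made for illustration; they are not Chado's actual tables or
the loader's actual behavior.

```python
import csv
import io
from urllib.parse import unquote

# The nine tab-separated columns defined by the GFF3 format.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; None for comments/blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attrs = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                # GFF3 percent-encodes reserved characters in attributes.
                attrs[unquote(key)] = unquote(value)
    record["attributes"] = attrs
    return record


def write_bulk_rows(gff3_lines, known_names, out):
    """Write one row per new feature and return how many were written.

    known_names is a set assumed to have been fetched from the database
    once, standing in for repeated per-feature uniqueness queries.
    Features without an ID attribute are skipped in this sketch.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    written = 0
    for line in gff3_lines:
        rec = parse_gff3_line(line)
        if rec is None:
            continue
        name = rec["attributes"].get("ID")
        if name is None or name in known_names:
            continue
        known_names.add(name)
        # Hypothetical row layout, not a real Chado table definition.
        writer.writerow([name, rec["type"], rec["seqid"],
                         rec["start"], rec["end"], rec["strand"]])
        written += 1
    return written


sample = [
    "##gff-version 3\n",
    "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo\n",
    "chr1\texample\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna0001;Parent=gene0001\n",
    "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo\n",
]
buffer = io.StringIO()
count = write_bulk_rows(sample, {"mrna0001"}, buffer)
print(count)
print(buffer.getvalue(), end="")
```

In the sample, only the first gene line produces a row: the mRNA is
already in the prefetched set and the last line repeats an ID seen
earlier in the same file.  Whether a prefetched set is practical depends
on how many names the database already holds, so treat this as one
possible direction rather than a recommendation.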

If you are interested in pursuing this project, I would suggest
installing Chado and loading some GFF (I can help you find something
suitable) so that you have some experience going into the proposal.
Please let me know if you have questions or problems in this regard.

Thanks,
Scott

[1] http://gmod.org/wiki/Chado
[2] http://sequenceontology.org/resources/gff3.html
[3] http://www.bioperl.org/wiki/Module:Bio::FeatureIO


2012/3/17 Павло Приступа <[hidden email]>:

> · Prystupa Pavlo Oleksandrovych [hidden email]
>
> · Bachelor's degree in computer engineering from Chernigov State
> Technological University, Ukraine. I expect to receive my master's degree
> there in a year. I do not have any relevant work experience yet.
>
> · I would currently like to gain experience in Perl, JavaScript, Python
> and PHP.
>
> · I have written some simple utilities and a character device driver for
> academic purposes.
>
> · During my school years I loved biology and got only excellent marks. I
> even wanted to go on to study biology and chemistry, but soon realized that
> I had better stay in Chernigov. So the best choice for me was to enter this
> university and choose computer engineering.
>
> · I am now doing research on “Genetic algorithms in management systems”
> for my master's thesis.
>
> · I have two ideas on how to contribute to your community:
>
> 1) Speed up Chado GFF3 loading.
>
> I am planning to use a genetic-algorithm query optimization technique. I
> can’t describe it in detail yet, but I will be able to as soon as I finish
> my course project (end of May). The starting point of my research will be
> Kristin Bennett, Michael C. Ferris, Yannis E. Ioannidis. A Genetic Algorithm
> for Database Query Optimization. Proceedings of the Fourth International
> Conference on Genetic Algorithms, 1991. It will be much easier to
> investigate than
>
> 2) Modeling evolution using a genetic algorithm
>
> It would be great work to do, but my enthusiasm alone will probably be
> insufficient to finish it in a reasonable timeframe. My idea is to choose
> the most primitive genome (to be chosen at the modeling stage because of
> the C-value paradox) as the starting point of our “evolution” and Homo
> sapiens as the end point (who knows, maybe this is an erroneous assumption,
> which could probably also be discovered during modeling, but then we would
> face additional problems in finding a fitness function). The real problem
> would be choosing the criteria for evaluating the fitness function. I hope
> this can be achieved by combining genetic algorithms with fuzzy logic
> systems, which could make it possible to find the most significant
> parameters and their degree of impact on signs of congenetive. For modeling
> it is vital to choose a reasonable timeframe (i.e. the period between the
> most ancient known occurrence of an organism similar to our starting point
> and now, divided by the average lifetime of a single DNA, to find the
> number of generations). The total number of individuals in any population
> could be taken as the current biomass divided by the average mass of a
> single cell. This is because even after an accidental mass extinction, free
> ecological niches would be refilled very quickly (if we assume solar energy
> and other factors were fairly constant throughout the evolution of life).
> But as we know, such mass extinctions have had a great impact on the speed
> and quality of evolutionary processes. So it seems reasonable to take e.g.
> 65 million years ago as the starting point and create the initial
> population based on our knowledge of the species diversity of that time.
> Finally, based on the most accurate evolutionary path found, we could train
> a neural network to predict what we can expect over the next few million
> years.
>
> Moreover, if we are satisfied with the results of the modeling, it will
> prove that our knowledge of the laws of living matter is correct.
>
> Considering the above, I come to a rather pessimistic conclusion: it is
> almost impossible to reconstruct history that way, because a very large
> number of random factors affect evolution and we can’t reverse cause and
> effect. Nevertheless, I hope it could answer plenty of questions.
>
> · I can bring my time and my cognitive and mental power to the team, and I
> hope the team would in turn give me the guidance, assistance and patience
> I need.
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Genome Informatics GSoC" group.
> To post to this group, send email to [hidden email].
> To unsubscribe from this group, send email to
> [hidden email].
> For more options, visit this group at
> http://groups.google.com/group/genome-informatics?hl=en.



--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

_______________________________________________
Gmod-schema mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Re: [Genome Informatics] Application

Scott Cain
CC'ing to original submitter.

On Wed, Mar 21, 2012 at 11:12 AM, Scott Cain <[hidden email]> wrote:




--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research
