[BioMart Users] how to get from ensembl main database schema to ensembl mart schema

andrea_bio
Hello

I was wondering whether there are any documents showing how the Ensembl
marts are created from the main Ensembl databases. Specifically, I was
hoping for documents describing which tables were selected as main
tables for the marts and how the dimension tables were mapped to the
main tables.

As an example, ensembl_mart_61 contains a main table for human named
translation_main (this is an abbreviation of the name, but it's obvious
which one I mean), and this has a field called
protein_feature_prints_bool, which is essentially a boolean field
indicating whether a protein translation is associated with a row in the
PRINTS dimension table protein_feature_prints_dm. If the translation
does have a row in this dimension table then I am guessing it has a
PRINTS domain in it!

The core database itself, however, has a table called translation which
represents, well, a translation. Translations are linked to rows in a
table called 'protein_feature', which in turn has a foreign key called
analysis_id that links to an 'analysis' table with fields 'database'
and 'program'. So in this schema, a translation is associated with a
PRINTS annotation if it is linked to a 'protein_feature' record which is
in turn linked to an 'analysis' record with the text 'PRINTS' somewhere
in the database and/or program fields.
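
To make the relationship between the two schemas concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not how MartBuilder does it: the toy tables, the column names (`db`, `program`, `hit_name`) and the sample rows are all assumptions made for illustration, and real Ensembl core tables have many more columns. It only shows how a `_dm` table and a `_bool` column could in principle be derived from the three core tables described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for three core tables (assumed, heavily simplified columns).
cur.executescript("""
CREATE TABLE translation (translation_id INTEGER PRIMARY KEY);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    db TEXT,        -- the 'database' field mentioned in the post (assumed name)
    program TEXT
);
CREATE TABLE protein_feature (
    protein_feature_id INTEGER PRIMARY KEY,
    translation_id INTEGER REFERENCES translation(translation_id),
    analysis_id INTEGER REFERENCES analysis(analysis_id),
    hit_name TEXT
);
INSERT INTO translation VALUES (1), (2), (3);
INSERT INTO analysis VALUES (10, 'PRINTS', 'FingerPRINTScan'),
                            (11, 'Pfam', 'hmmpfam');
INSERT INTO protein_feature VALUES (100, 1, 10, 'PR00001'),
                                   (101, 1, 11, 'PF00001'),
                                   (102, 2, 11, 'PF00002');
""")

# Dimension table: one row per PRINTS feature, keyed by translation_id.
# Main table: one row per translation, with a flag that is 1 when at least
# one matching dimension row exists and 0 otherwise.
cur.executescript("""
CREATE TABLE protein_feature_prints_dm AS
SELECT pf.translation_id, pf.hit_name
FROM protein_feature pf
JOIN analysis a ON a.analysis_id = pf.analysis_id
WHERE a.db LIKE '%PRINTS%' OR a.program LIKE '%PRINTS%';

CREATE TABLE translation_main AS
SELECT t.translation_id,
       EXISTS (SELECT 1 FROM protein_feature_prints_dm d
               WHERE d.translation_id = t.translation_id)
           AS protein_feature_prints_bool
FROM translation t;
""")

rows = cur.execute(
    "SELECT translation_id, protein_feature_prints_bool "
    "FROM translation_main ORDER BY translation_id"
).fetchall()
print(rows)  # [(1, 1), (2, 0), (3, 0)]
```

Only translation 1 has a feature linked to the PRINTS analysis, so it is the only row flagged in the toy main table.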

I am interested in how the BioMart software is configured with 'rules'
to create the mart schema from the database schema. Is there a
configuration file containing these rules that I could look at? Is there a
worked example? As an academic exercise I'd like to recreate the Ensembl
marts. I have the BioMart user manual, but even with that document I do
not know how to recreate the Ensembl marts.

I am NOT specifically interested in protein domains. I used the PRINTS
example purely for illustrative purposes as I thought it was a
straightforward example. I am interested in how you specify the 'rules'
to get from a schema to a mart.

thanks a lot

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Arek Kasprzyk-2
Hi Andrea
All the transformation information is stored in the XML file that
MBuilder (0.7) uses to compile its DDL for Ensembl core databases. I
am sure the Ensembl mart team will be happy to provide you with the latest
version.

a




Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Joachim Baran-2
Hi!

> All the transformation information is stored in the XML file that
> MBuilder (0.7) uses to compile its DDL for Ensembl core databases.
  I am not sure if this is related to what you are trying to do, but if you want to create and extend the core Ensembl marts for BioMart 0.7, you can also have a look at the following links (both are non-OICR contributions, so there is no official support from the BioMart team):

  * http://bergmanlab.smith.man.ac.uk/?p=35 for creating and extending Ensembl marts manually
  * https://github.com/joejimbo/MartScript for creating and extending Ensembl marts automatically

  Now, I could swear that I had some examples for MartScript, but I cannot find them anymore. If you are really after creating an Ensembl mart for a BioMart 0.7 installation, I can send you the MartScripts that I used to create www.pubmed2ensembl.org. In any case, you need the XML files from Ensembl, as Arek said earlier.

Joachim

Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Arek Kasprzyk-2
Putting this back on the list to keep everyone else in the loop

a



On 2011-03-12, at 13:56, "Arek Kasprzyk" <[hidden email]>  
wrote:

> If you are starting from scratch it would be much better to start with
> 0.8 rc5. Creating a new mart is as simple as choosing one or more main
> tables in the source schema. You can choose different tables and
> create different datasets. There is some documentation about it in
> rc5. If you want to know how the transformation algorithm works I can
> describe that to you too.
>
>
> a
>
>
>
> On 2011-03-12, at 12:53, "Andrea Edwards" <[hidden email]>  
> wrote:
>
>> ok - thanks
>>
>> I don't know much about BioMart, as you can probably tell, but I was told
>> there are quite significant differences between 0.7 and 0.8.
>> If I am interested in understanding how the schema transformations take
>> place so that I can design my own mart and integrate it with existing
>> marts, would I be better off dropping back to 0.7? I'm keen to get a mart
>> up and running very soon.
>>
>> On 12/03/2011 17:41, Arek Kasprzyk wrote:
>>> 0.8 rc5 still has only rudimentary support for the MBuilder
>>> component. You will not be able to read 0.7 MBuilder XML with it.
>>> (CCing Junjun, who has just taken over the coordination of BioMart
>>> development, to let him know that such discussions are taking place.)
>>>
>>> a
>>>
>>>
>>>
>>> On 2011-03-12, at 12:28, "Andrea Edwards"<[hidden email]>
>>> wrote:
>>>
>>>> Brilliant - thanks for such a prompt reply.
>>>>
>>>> I note that you say MBuilder (0.7), whereas I have checked out the
>>>> code for BioMart 0.8 rc4.
>>>>
>>>>

Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

andrea_bio
Hi

I am not concerned whether I use BioMart 0.7 or 0.8 - whichever is
easiest for what I would like to do. I haven't done anything yet and I'm
starting from scratch.

All I want to do is have a go at re-creating the Ensembl mart from the
Ensembl core databases. I wanted to do this because Ensembl is an
example of a database whose schema I am familiar with and whose mart I
have used. I wanted to do this for three reasons:
a) to get some practice
b) to get an intuition for what type of mart I can create from my own
database schema, what types of query I can run and what the
filters/attributes will be
c) to get an idea of how I could integrate my database with Ensembl, as I
believe they only need to share IDs or an underlying assembly to be integrated

Will I be able to recreate the Ensembl mart in BioMart 0.8? I presume
the Ensembl XML files are available for 0.7 and I won't be able to read
them in 0.8? Without these files, how will I know the exact steps Ensembl
used to specify their mart structure? How will I know what main tables
they chose or how, for example, they created the PRINTS dimension table
mentioned in my original query?

Thanks a lot



Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Arek Kasprzyk-2

Hi Andrea,
Ok, I have a better idea now of what you want. The situation is as follows: 0.8 rc5 has nicely automated and integrated a lot of the workflows needed to create a new mart from a source schema. However, the particular Ensembl core transformation is a very complex one, and rc5 still has only rudimentary support for it. If you just want to get an idea of how the algorithm works, it is better to start with a simpler use case than the Ensembl mart. It will be difficult to recreate the exact Ensembl mart transformation from scratch for two reasons. First, rc5 still has only very rudimentary support for this particular schema, so you will not get far. Second, 0.7 fully supports it, but there is a large number of 'tweaks aka hacks' to the transformation algorithm to get certain things to work, so you will find it difficult to recreate a lot of them.
I would advise you to play with any schema to get a few datasets to work (the Ensembl core schema is fine to play with too). If you want to build new marts and integrate them with Ensembl, definitely go for rc5 and treat the existing Ensembl mart as a black box; the software provides the means to integrate it nicely with your newly created mart through a backwards-compatibility mechanism. This is much easier in 0.8 rc5. If, however, you just want to see exactly how the Ensembl mart transformation is achieved, you will need a 0.7 XML transformation file from the Ensembl team.

FYI: A short description of the basic transformation algorithm below:

Starting from one or more input “candidate” tables, the software finds the largest set of table joins it can perform using only 1:1 and many-to-one (M:1) relations, and merges these tables together to create the main table. Multiple candidate tables can be given as input, in which case the algorithm creates a main table out of each selected candidate table and, if it is unable to do so, will create several separate datasets. Once the main tables are completed, if there is a 1:M relation between them they become main and sub-main tables. If there is no 1:M relation between them, they are split into separate datasets. Any tables that have a 1:M or many-to-many (M:N) relation with the newly created main table or sub-main table are made into independent dimension tables.
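
The first and last steps of the description above can be sketched in a few lines of Python. This is an illustrative toy, not the MartBuilder implementation: the function name `build_main_table`, the way relations are encoded, and the cardinalities in the sample schema are all assumptions. It does not cover sub-main tables or the splitting into separate datasets.

```python
from collections import deque


def build_main_table(candidate, relations):
    """Return (merged_tables, dimension_tables) for one candidate table.

    `relations` maps a table name to a list of (other_table, cardinality)
    pairs, where the cardinality is read from the key table to the other
    table and is one of "1:1", "M:1", "1:M" or "M:N".
    """
    merged = [candidate]
    seen = {candidate}
    queue = deque([candidate])

    # Step 1: follow only 1:1 and M:1 relations, transitively, and merge
    # every table reached this way into the main table.
    while queue:
        table = queue.popleft()
        for other, cardinality in relations.get(table, []):
            if cardinality in ("1:1", "M:1") and other not in seen:
                seen.add(other)
                merged.append(other)
                queue.append(other)

    # Step 2: any table hanging off the merged set through a 1:M or M:N
    # relation becomes an independent dimension table.
    dimensions = []
    for table in merged:
        for other, cardinality in relations.get(table, []):
            if (cardinality in ("1:M", "M:N")
                    and other not in seen
                    and other not in dimensions):
                dimensions.append(other)

    return merged, dimensions


# Toy schema with assumed cardinalities, loosely named after core tables.
toy_relations = {
    "translation": [("transcript", "M:1"), ("protein_feature", "1:M")],
    "transcript": [("gene", "M:1"), ("translation", "1:M")],
    "gene": [("transcript", "1:M")],
    "protein_feature": [("translation", "M:1"), ("analysis", "M:1")],
}

main, dims = build_main_table("translation", toy_relations)
print(main)  # ['translation', 'transcript', 'gene']
print(dims)  # ['protein_feature']
```

In this toy run, translation absorbs transcript and gene through M:1 relations, while protein_feature is left as a dimension table because it sits on the "many" side of a 1:M relation.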


Please let us know if you have any more questions,
a

Arek Kasprzyk
Director, Bioinformatics Operations and Principal Investigator

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
     
Tel:       416-673-8559
Toll-free:           1-866-678-6427
www.oicr.on.ca





Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Rhoda Kinsella
Hi Andrea
Please find attached the XML files used to create 6 of the 7 Ensembl marts for release 61 (the sequence mart is created using a script, so there is no XML to send). The XML files are for the four visible marts on the interface (Ensembl Gene, Ensembl Variation, Ensembl Regulation and Vega mart) as well as two marts that the user does not see on the mart interface but which are accessed via the visible marts (ontology mart, genomic features mart). We are currently using BioMart version 0.7. As Arek mentioned, the Ensembl marts are quite complex and we have a few code hacks in place, as well as pre- and post-build patches to run, in order to create the databases. Therefore it is probably best if you use the MartBuilder tool to see how the mart schemas are created from the core databases, and then play with a simple schema and see how you get on.
Kind regards
Rhoda






On 12 Mar 2011, at 19:45, Arek Kasprzyk wrote:


Hi Andrea,
Ok, I have a better idea now what you want. The situation is as follows: 0.8 rc5 has automated and integrated nicely a lot of workflows needed to create a new mart from a source schema. However the particular Ensembl core transformation is a very complex one and rc5 still has only a rudimentary support for that. If you just want to have an idea how the algorithm works it is better to start with a simpler use case not Ensembl mart . It will be difficult to  recreate the exact Ensembl mart transformation from scratch for two reasons: rc5 has still a very rudimentary support for this pariticular schema so you will not get far. The 0.7 fully supports it but thereis  a large number of 'tweaks aka hacks' to the transformation algorithm to get certain things to work so you will find difficult to recreate a lot of them.
I would advise you to play with any schema to get a few datasets to work (ensembl core schema is fine to play with too). If you want to build new marts and integrate them with ensembl definitely go for rc5 and treat the existing ensembl  mart as a black box, the software will provide the means to integrate it nicely through a backwards compatibility mechanism with your newly created mart.This is much easier in 0.8 rc5. If you however just want to see how the ensembl mart transformation is achieved exactly you will need a 0.7 XML transformation file from the Ensembl team.

FYI: A short description of the basic transformation algorithm below:

starting from one or more input “candidate” table, the software finds the largest set of table joins it can perform using only 1:1 and many-to-one (M:1) relations, and merges these tables together to create  the main table. Multiple candidate tables can be given as input, in which case the algorithm creates main tables out of each selected candidate table and if unable to do so will create several separate datasets. Once the main tables are completed, if there is a 1:M relation between them they become main and sub-main tables. If there is now 1:M relation between them, they are split into separate datasets. Any tables that have a 1:M or many-to-many (M:N) relation with the newly-created main table or sub-main table are made into independent dimension tables. 


Please let us know if you have any more questions,
a

Arek Kasprzyk
Director, Bioinformatics Operations and Principal Investigator

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
     
Tel:       416-673-8559
Toll-free:           1-866-678-6427
www.oicr.on.ca


Administrative Assistant: [hidden email]
 
This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.

From: Andrea Edwards <[hidden email]>
Date: Sat, 12 Mar 2011 14:13:24 -0500
To: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Hi

I am not concerned whether I use biomart 0.7 or 0.8 - whichever is
easiest for what I would like to do. I havent done anything yet and I'm
starting from scratch.

All i want to do is have a go at re-creating the ensembl mart from the
ensembl core databases. I wanted to do this because ensembl is an
example of a database whose schema I am familiar with and whose mart I
have used. I wanted to do this for 2 reasons:
a) to get some practice
b) to get an intuitition of what type of mart I can create from my own
database schema and what types of query I can run and what the
filters/attributes will be
c) get an idea of how i could integrate my database with ensembl as I
believe they only need to share ids or underlying assembly to b integrated

Will i be able to recreate the ensembl mart in biomart 0.8? I presume
the ensembl xml files are available for 0.7 and I won;t be able to read
them in 0.8? Without these files how will i know the exact steps ensembl
used to specify their mart structure? How will i know what main tables
they chose or how for example they created the PRINTS dimension table
mentioned in my original query?

Thanks a lot


On 12/03/2011 18:59, Arek Kasprzyk wrote:
Putting this back on the list to keep everyone else in the loop

a



On 2011-03-12, at 13:56, "Arek Kasprzyk"<[hidden email]>
wrote:

If you are starting from scratch it would be much better to start with
0.8 rc5. Creating a new mart is as simple as choosing one or more main
tables in the source schema. You can choose different tables and
create different datasets. There is some documentation about it in
rc5. If you want to know how the transformation algorithm works, I can
describe that to you too.


a



On 2011-03-12, at 12:53, "Andrea Edwards"<[hidden email]>
wrote:

ok - thanks

I don't know much about biomart, as you can probably tell, but I was told
there are quite significant differences between 0.7 and 0.8.
If I am interested in understanding how the schema transformations take
place so that I can design my own mart and integrate it with existing
marts, would I be better dropping back to 0.7? I'm keen to get a mart up
and running very soon.

On 12/03/2011 17:41, Arek Kasprzyk wrote:
0.8 rc 5 has still only rudimentary support for the MBuilder
component. You will not be able to read 0.7 mbuilder XML with it.
(ccing junjun who has just taken over the coordination of the BioMart
development to let him know that such discussions are taking place)

a



On 2011-03-12, at 12:28, "Andrea Edwards"<[hidden email]>
wrote:

Brilliant - thanks for such a prompt reply.

I note that you say MBuilder (0.7) whereas I have checked out the code
for biomart 0.8 rc4


On 12/03/2011 16:39, Arek Kasprzyk wrote:
Hi Andrea
All the transformation information is stored in the XML file that
MBuilder (0.7) uses to compile its DDL for Ensembl core databases. I
am sure the ensembl mart team will be happy to provide you the latest
version

a



On 2011-03-12, at 11:15, "Andrea Edwards"<[hidden email]>
wrote:

Hello

I was wondering if there were any documents showing how the ensembl
marts were created from the main ensembl databases. Specifically I was
hoping there were documents describing what tables were selected as main
tables for the marts and how the dimension tables were mapped to the
main tables.

As an example the ensembl_mart_61 contains a main table for human named
translation_main (this is an abbreviation of the name but it's obvious
which one I mean) and this has a field called
protein_feature_prints_bool which is essentially a boolean field
indicating whether a protein translation is associated with a row in the
PRINTS dimension table protein_feature_prints_dm. If the translation
does have a row in this dimension table then I am guessing it has a
PRINTS domain in it!

The core database itself however has a table called translation which
represents, well, a translation. Translations are linked to rows in a
table called 'protein_feature' which in turn has a foreign key called
analysis_id which links to an 'analysis' table with fields 'database'
and 'program'. So in this schema, a translation is associated with a
PRINTS annotation if it is linked to a 'protein_feature' record which is
in turn linked to an 'analysis' record with the text 'PRINTS' somewhere
in both/either the database/program fields.
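
The join path described in the paragraph above can be sketched as a single
query. This is only an illustration in Python against an in-memory SQLite
database, not the way MartBuilder derives the column: the tables are cut
down to the keys mentioned in the message, the rows are invented, and the
column name `db` is an assumption standing in for the 'database' field.

```python
import sqlite3

# Toy copies of the three core tables, reduced to the linking columns.
# Column names and rows are assumptions made for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE translation (translation_id INTEGER PRIMARY KEY);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    db TEXT,        -- stands in for the 'database' field
    program TEXT
);
CREATE TABLE protein_feature (
    protein_feature_id INTEGER PRIMARY KEY,
    translation_id INTEGER REFERENCES translation(translation_id),
    analysis_id INTEGER REFERENCES analysis(analysis_id)
);
INSERT INTO translation VALUES (1), (2), (3);
INSERT INTO analysis VALUES (10, 'PRINTS', 'FingerPRINTScan'),
                            (11, 'Pfam', 'hmmpfam');
INSERT INTO protein_feature VALUES (100, 1, 10), (101, 1, 11), (102, 2, 11);
""")

# One row per translation, with a 1/0 flag saying whether any linked
# protein_feature row points at an analysis mentioning PRINTS.
PRINTS_BOOL_SQL = """
SELECT t.translation_id,
       EXISTS (
           SELECT 1
           FROM protein_feature pf
           JOIN analysis a ON a.analysis_id = pf.analysis_id
           WHERE pf.translation_id = t.translation_id
             AND (a.db LIKE '%PRINTS%' OR a.program LIKE '%PRINTS%')
       ) AS protein_feature_prints_bool
FROM translation t
ORDER BY t.translation_id
"""

prints_flags = conn.execute(PRINTS_BOOL_SQL).fetchall()
for translation_id, has_prints in prints_flags:
    print(translation_id, bool(has_prints))
```

Only translation 1 has a protein_feature row whose analysis mentions PRINTS,
so it is the only one flagged. How the real mart encodes the flag (for
example 1/0 versus 1/NULL) is not something this sketch tries to reproduce.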

I am interested in how the biomart software is configured with 'rules'
to create the mart schema from the database schema. Is there a
configuration file with these rules in that I could look at? Is there a
worked example? As an academic exercise I'd like to recreate the ensembl
marts. I have the biomart user manual but even with that document I do
not know how to recreate the ensembl marts

I am NOT specifically interested in protein domains. I used the PRINTS
example purely for illustrative purposes as I thought it was a
straightforward example. I am interested in how you specify the 'rules'
to get from a schema to a mart.

thanks a lot

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus, 
Hinxton
Cambridge CB10 1SD,
UK.


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users

Andrea_xmls.zip (172K) Download Attachment

Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Rhoda Kinsella
In reply to this post by Arek Kasprzyk-2
Hi Andrea,
Please find attached a link to the xml files used to create 6 of the 7 Ensembl marts for release 61 (the sequence mart is created using a script, so there is no xml to send). The xmls are for the four marts visible on the interface (Ensembl Gene, Ensembl Variation, Ensembl Regulation and Vega mart) as well as two marts that the user does not see on the mart interface but which are accessed via the visible marts (ontology mart, genomic features mart). We are currently using Biomart version 0.7. As Arek mentioned, the Ensembl marts are quite complex, and we have a few code hacks in place as well as pre- and post-build patches that we run in order to create the databases. It is therefore probably best to use the martbuilder tool to see how the mart schemas are created from the core databases, and then play with a simple schema and see how you get on.
Kind regards
Rhoda

The xml files can be found here:



On 12 Mar 2011, at 19:45, Arek Kasprzyk wrote:


Hi Andrea,
Ok, I have a better idea now of what you want. The situation is as follows: 0.8 rc5 has automated and nicely integrated a lot of the workflows needed to create a new mart from a source schema. However, the Ensembl core transformation is a very complex one and rc5 still has only rudimentary support for it. If you just want an idea of how the algorithm works, it is better to start with a simpler use case than the Ensembl mart. It will be difficult to recreate the exact Ensembl mart transformation from scratch for two reasons. First, rc5 still has only very rudimentary support for this particular schema, so you will not get far. Second, 0.7 fully supports it, but there are a large number of 'tweaks aka hacks' to the transformation algorithm to get certain things to work, so you will find it difficult to recreate a lot of them.
I would advise you to play with any schema to get a few datasets to work (the ensembl core schema is fine to play with too). If you want to build new marts and integrate them with ensembl, definitely go for rc5 and treat the existing ensembl mart as a black box; the software will provide the means to integrate it nicely with your newly created mart through a backwards compatibility mechanism. This is much easier in 0.8 rc5. If, however, you just want to see exactly how the ensembl mart transformation is achieved, you will need a 0.7 XML transformation file from the Ensembl team.

FYI: A short description of the basic transformation algorithm below:

Starting from one or more input “candidate” tables, the software finds the largest set of table joins it can perform using only 1:1 and many-to-one (M:1) relations, and merges these tables together to create the main table. Multiple candidate tables can be given as input, in which case the algorithm creates a main table out of each selected candidate table and, where the resulting main tables cannot be linked, creates several separate datasets. Once the main tables are completed, if there is a 1:M relation between them they become main and sub-main tables. If there is no 1:M relation between them, they are split into separate datasets. Any tables that have a 1:M or many-to-many (M:N) relation with the newly created main table or sub-main table are made into independent dimension tables.
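
The merge and dimension steps described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the BioMart implementation: the function name `build_dataset`, the relation format, and the cardinalities given for the toy schema are all invented for the example, and the main/sub-main step for multiple candidate tables is left out.

```python
from collections import deque

# Cardinality of a relation as seen from the other side.
INVERSE = {"1:1": "1:1", "M:1": "1:M", "1:M": "M:1", "M:N": "M:N"}


def build_dataset(candidate, relations):
    """Return (merged_tables, dimension_tables) for one candidate table.

    relations is a list of (left, right, cardinality) tuples, where the
    cardinality is read from left to right, e.g. ("a", "b", "M:1") means
    many rows of a point at one row of b.
    """
    # Index every relation in both directions.
    outgoing = {}
    for left, right, card in relations:
        outgoing.setdefault(left, []).append((right, card))
        outgoing.setdefault(right, []).append((left, INVERSE[card]))

    # Follow only 1:1 and M:1 relations outwards from the candidate;
    # everything reached this way is merged into the main table.
    merged = [candidate]
    queue = deque([candidate])
    while queue:
        table = queue.popleft()
        for other, card in outgoing.get(table, []):
            if card in ("1:1", "M:1") and other not in merged:
                merged.append(other)
                queue.append(other)

    # Tables hanging off the merged set by 1:M or M:N become dimensions.
    dimensions = []
    for table in merged:
        for other, card in outgoing.get(table, []):
            if (card in ("1:M", "M:N")
                    and other not in merged
                    and other not in dimensions):
                dimensions.append(other)
    return merged, dimensions


# Toy schema; the cardinalities are assumptions for illustration only.
RELATIONS = [
    ("translation", "transcript", "M:1"),
    ("transcript", "gene", "M:1"),
    ("protein_feature", "translation", "M:1"),
    ("protein_feature", "analysis", "M:1"),
]

merged, dimensions = build_dataset("translation", RELATIONS)
print("main table built from:", merged)
print("dimension tables:", dimensions)
```

With "translation" as the candidate, the sketch merges translation, transcript and gene into one main table and turns protein_feature into a dimension table. Anything the real software does beyond the description above (for example the Ensembl-specific tweaks mentioned earlier) is outside the scope of this sketch.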


Please let us know if you have any more questions,
a

Arek Kasprzyk
Director, Bioinformatics Operations and Principal Investigator

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3
     
Tel:       416-673-8559
Toll-free:           1-866-678-6427
www.oicr.on.ca





Re: [BioMart Users] how to get from ensembl main database schema to ensembl mart schema

Junjun Zhang
In reply to this post by Arek Kasprzyk-2
Hi Andrea,


On 11-03-14 3:19 PM, "Andrea Edwards" <[hidden email]> wrote:

>    Thanks a lot
>
>  I am having such a lot of problems that perhaps the issue is at my end?
>
>  1) As I mentioned last week, a browser doesn't open for me when I click
> deploy server.
>  2) I tried importing an existing relational mart such as ensembl_mart_61 and
> I get a display bar saying updating configuration, then nothing happens for
> a while, and I get kicked off my machine again with 'write error connection
> reset by peer'
>
>  I am connecting to a remote linux machine from windows xp via cygwin/x. I
> normally connect via putty. My machine only has 2meg of ram

I'm afraid that most of the problems you are experiencing are related to the
environment in which you are running MartConfiguration. We have not yet tested
BioMart under Windows, Cygwin etc. The environments we have tested are
described in the User Manual (page 92). Is it possible for you to try 0.8 in
one of those environments?

Best regards,

Junjun




>
>  On 14/03/2011 19:24, Arek Kasprzyk wrote:
>>
>>
>>
>> ok, sounds like a bug report and a few useful suggestions for the
>> documentation as well. Yong: would you be able to follow Andrea's workflow
>> and figure out what is going wrong there please? Junjun: cc'ing you in case
>> there is something requiring your attention here
>>
>>
>>
>>
>> Cheers,
>>
>> a
>>
>>
>> From:  Andrea Edwards <[hidden email]>
>>  Date:  Mon, 14 Mar 2011 14:54:37 -0400
>>  To:  Arek Kasprzyk <[hidden email]>
>>  Subject:  Re: [BioMart Users] how to get from ensembl main database schema
>> to ensembl mart schema
>>
>>
>>
>>
>>
>>
>>  Hi again
>>
>>  I'm having quite a bit of trouble with this I'm afraid. I'm trying to just
>> set up a basic mart. when i added a new data source i selected 'url mart' and
>> I added data sources for human and mouse SNPs from the biomart server. I
>> didn't understand what the point of the config file was so I added 2, one for
>> each data source.
>>  Is this correct? What are the config files as there isn't an explanation in
>> the user manual? I also searched my hard disk to find the files in case their
>> contents gave me the answer but they don't exist. If i remove one of the
>> config files I can only see one data source when i view the page in localhost
>> so it seems like you need a config file to view your dataset.
>>
>>   If i right click a config file a window appears and this window won't go
>> away no matter what i do. The same applies for the windows which appear when
>> i right click a data source in the left hand pane
>>
>>  Anyhow i have 2 config files one for human and one for mouse and I deploy
>> the server. Then on the home page under default i have 2 datasets listed
>> called hsapiens_snp_config and btaurus_snp_config. Then i click on one
>> (human) and it loads a new page with database hsapiens_snp_config and dataset
>> homo sapiens variation dbSNP132. There is a graphic saying the website is
>> loading the filters and then nothing happens. The page just stays like this.
>>
>>  I thought this might be due to slow connection so I went back to the mart
>> configurator and opened the mart and tried to remove some fields from the
>> datasets. I right clicked on the mouse snp dataset and selected schema editor
>> and show columns and nothing happened. Then i get kicked off my machine
>> (which i am connecting to remotely) saying 'write failed connection reset by
>> peer' I've tried this several times now with the same result
>>
>>  thanks a lot
