[BioMart Users] OUTER JOIN between tables

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[BioMart Users] OUTER JOIN between tables

nqueralt

Dear BioMart users,

 

We are setting up a local database powered by BioMart in order to annotate our local data with the data of other marts. The problem is that the type of join that BioMart is performing between tables seems to be an inner join since BioMart solely outputs all the records that match in the two marts linked. My question is, if this is true, are you planning to add outer joins in order to get in addition all the unmatched results? In our case, if we want to annotate our database with another mart data, we would perform a left join to get all the matched records + all the unmatched records from our database. And in case you are planning to implement this, when do you predict that this implementation will be accessible for us, the BioMart users?

 

This type of outputting the query results, i.e. to output both the matched and unmatched records from our database, is of highly interest in order to annotate BUT without losing information from the query because of the annotation process. Many thanks in advance.

 

Please, do not hesitate to ask me for more information.

 

Yours faithfully,

 

Núria

 

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Arek Kasprzyk
Hi Nuria,
this has been a well know problem (or a feature depends how you look at it :)) of BioMart from it's beginnings. BioMart 0.7 performs an inner join only between tables within single dataset. The join between marts depends on data overlap between them and can behave either as an outer join when there is 100% overlap or as inner join if there is no full overlap (more typical scenario). This is of course not an ideal situation because it looks somewhat non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the former is a simple SQL inner join, the latter is an 'inlist' that is an equivalent to multiple 'or' statements. The effect of this implementation was that all dimension tables in a dataset had to be a precomputed 'left join' tables ei tables with multiple NULLs which is not ideal for large tables with sparse records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to be more flexible and perform an inner or outer join within the dataset depending on the administrator's settings that should available through MartConfigurator and also automatically set based on the initial MBuilder components settings. I do remember personally instructing one of the developers to include this feature in the QueryCompiler and it was there as a simple switch at some point but i doubth it that it ever made it to MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7 (ei there are no provisions for the outer join). However, your email alerted me to this fact and this is definitely something that we need to think about. There is one more situation involving inlist that has not been solved either ei annotating external identifiers that effectively is an inner join as well and has been brought to our attention before. Technically the solution for both should be the same. This is definitely doable and something worth scheduling in the near future.


Junjun: please check where we are with the DATASET outer join implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, <[hidden email]> wrote:

Dear BioMart users,

 

We are setting up a local database powered by BioMart in order to annotate our local data with the data of other marts. The problem is that the type of join that BioMart is performing between tables seems to be an inner join since BioMart solely outputs all the records that match in the two marts linked. My question is, if this is true, are you planning to add outer joins in order to get in addition all the unmatched results? In our case, if we want to annotate our database with another mart data, we would perform a left join to get all the matched records + all the unmatched records from our database. And in case you are planning to implement this, when do you predict that this implementation will be accessible for us, the BioMart users?

 

This type of outputting the query results, i.e. to output both the matched and unmatched records from our database, is of highly interest in order to annotate BUT without losing information from the query because of the annotation process. Many thanks in advance.

 

Please, do not hesitate to ask me for more information.

 

Yours faithfully,

 

Núria

 

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users



_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Junjun Zhang
Hi Nuria,

Your use case is well understood. As left-join is a more typical scenario for inter-mart queries, we are considering to make left-join the default behaviour in the next release. Technically this is not hard to implement, however, we'd like to be very careful about making this type of decision as it has global impact to the whole querying system. In this case, we will need to provide means to enable deployers (maybe end users as well) to alter the default join behaviour from the web GUI and the query XML.

For join with a dataset between main table and dimension table, we can continue to support precomputed 'left join' as Arek mentioned below. We can also make it an optional for performing 'left join' on the fly. The only concern is that under certain situations the latter may have some performance disadvantages.

We will keep you informed on this.

Hope this helps!

Best regards,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Mon, 5 Sep 2011 09:55:42 -0400
To: "[hidden email]" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Hi Nuria,
this has been a well know problem (or a feature depends how you look at it :)) of BioMart from it's beginnings. BioMart 0.7 performs an inner join only between tables within single dataset. The join between marts depends on data overlap between them and can behave either as an outer join when there is 100% overlap or as inner join if there is no full overlap (more typical scenario). This is of course not an ideal situation because it looks somewhat non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the former is a simple SQL inner join, the latter is an 'inlist' that is an equivalent to multiple 'or' statements. The effect of this implementation was that all dimension tables in a dataset had to be a precomputed 'left join' tables ei tables with multiple NULLs which is not ideal for large tables with sparse records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to be more flexible and perform an inner or outer join within the dataset depending on the administrator's settings that should available through MartConfigurator and also automatically set based on the initial MBuilder components settings. I do remember personally instructing one of the developers to include this feature in the QueryCompiler and it was there as a simple switch at some point but i doubth it that it ever made it to MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7 (ei there are no provisions for the outer join). However, your email alerted me to this fact and this is definitely something that we need to think about. There is one more situation involving inlist that has not been solved either ei annotating external identifiers that effectively is an inner join as well and has been brought to our attention before. Technically the solution for both should be the same. This is definitely doable and something worth scheduling in the near future.


Junjun: please check where we are with the DATASET outer join implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, <[hidden email]> wrote:

Dear BioMart users,

 

We are setting up a local database powered by BioMart in order to annotate our local data with the data of other marts. The problem is that the type of join that BioMart is performing between tables seems to be an inner join since BioMart solely outputs all the records that match in the two marts linked. My question is, if this is true, are you planning to add outer joins in order to get in addition all the unmatched results? In our case, if we want to annotate our database with another mart data, we would perform a left join to get all the matched records + all the unmatched records from our database. And in case you are planning to implement this, when do you predict that this implementation will be accessible for us, the BioMart users?

 

This type of outputting the query results, i.e. to output both the matched and unmatched records from our database, is of highly interest in order to annotate BUT without losing information from the query because of the annotation process. Many thanks in advance.

 

Please, do not hesitate to ask me for more information.

 

Yours faithfully,

 

Núria

 

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users



_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Arek Kasprzyk
Junjun:
my three cents ;)

1. I would definitely NOT expose end users to understand the querying logic for inter mart queries.
2. left join with inter mart querying needs to be solved with in list annotation at the same time
3. The 'on the fly' left join was tested two years ago when i asked for the query compiler to be extended so we know well how it performs :)

a.






On Tue, Sep 6, 2011 at 12:53 PM, Junjun Zhang <[hidden email]> wrote:
Hi Nuria,

Your use case is well understood. As left-join is a more typical scenario for inter-mart queries, we are considering to make left-join the default behaviour in the next release. Technically this is not hard to implement, however, we'd like to be very careful about making this type of decision as it has global impact to the whole querying system. In this case, we will need to provide means to enable deployers (maybe end users as well) to alter the default join behaviour from the web GUI and the query XML.

For join with a dataset between main table and dimension table, we can continue to support precomputed 'left join' as Arek mentioned below. We can also make it an optional for performing 'left join' on the fly. The only concern is that under certain situations the latter may have some performance disadvantages.

We will keep you informed on this.

Hope this helps!

Best regards,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Mon, 5 Sep 2011 09:55:42 -0400
To: "[hidden email]" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Hi Nuria,
this has been a well know problem (or a feature depends how you look at it :)) of BioMart from it's beginnings. BioMart 0.7 performs an inner join only between tables within single dataset. The join between marts depends on data overlap between them and can behave either as an outer join when there is 100% overlap or as inner join if there is no full overlap (more typical scenario). This is of course not an ideal situation because it looks somewhat non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the former is a simple SQL inner join, the latter is an 'inlist' that is an equivalent to multiple 'or' statements. The effect of this implementation was that all dimension tables in a dataset had to be a precomputed 'left join' tables ei tables with multiple NULLs which is not ideal for large tables with sparse records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to be more flexible and perform an inner or outer join within the dataset depending on the administrator's settings that should available through MartConfigurator and also automatically set based on the initial MBuilder components settings. I do remember personally instructing one of the developers to include this feature in the QueryCompiler and it was there as a simple switch at some point but i doubth it that it ever made it to MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7 (ei there are no provisions for the outer join). However, your email alerted me to this fact and this is definitely something that we need to think about. There is one more situation involving inlist that has not been solved either ei annotating external identifiers that effectively is an inner join as well and has been brought to our attention before. Technically the solution for both should be the same. This is definitely doable and something worth scheduling in the near future.


Junjun: please check where we are with the DATASET outer join implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, <[hidden email]> wrote:

Dear BioMart users,

 

We are setting up a local database powered by BioMart in order to annotate our local data with the data of other marts. The problem is that the type of join that BioMart is performing between tables seems to be an inner join since BioMart solely outputs all the records that match in the two marts linked. My question is, if this is true, are you planning to add outer joins in order to get in addition all the unmatched results? In our case, if we want to annotate our database with another mart data, we would perform a left join to get all the matched records + all the unmatched records from our database. And in case you are planning to implement this, when do you predict that this implementation will be accessible for us, the BioMart users?

 

This type of outputting the query results, i.e. to output both the matched and unmatched records from our database, is of highly interest in order to annotate BUT without losing information from the query because of the annotation process. Many thanks in advance.

 

Please, do not hesitate to ask me for more information.

 

Yours faithfully,

 

Núria

 

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users




_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Junjun Zhang
Hi Arek,



From:  Arek Kasprzyk <[hidden email]>
Date:  Tue, 6 Sep 2011 13:29:01 -0400
To:  jzhang <[hidden email]>
Cc:  "[hidden email]" <[hidden email]>, "[hidden email]"
<[hidden email]>
Subject:  Re: [BioMart Users] OUTER JOIN between tables


>Junjun:
>my three cents ;)
>
>1. I would definitely NOT expose end users to understand the querying
>logic for inter mart queries.

I understand your concerns. OK, let's not do it now. But I was just trying
to think of this in the context of long-term planing so that it would be
possible to support this in case there is a compelling reason to do so in
the future.

>
>2. left join with inter mart querying needs to be solved with in list
>annotation at the same time

Just wanted to confirm with you that I understand the 'in list annotation'
use case correctly. Do you think the query should behave as inner join for
'in list annotation' and BioMart should support both left join and inner
join between different marts? If so, then that's what I see it as well.
For that we can make 'left join' the default behaviour (as it's more
common) and provide a mechanism to override this in case deployer wants
alter that to 'inner join'.

>
>3. The 'on the fly' left join was tested two years ago when i asked for
>the query compiler to be extended so we know well how it performs :)

Yes, I remember this. I just wanted to double check whether the previous
testing was comprehensive enough. After reading this:
http://dev.mysql.com/doc/refman/5.0/en/left-join-optimization.html, I felt
that it's necessary to add some more tests with multiple left joins in a
single SQL.

Cheers,

Junjun


>
>a.
>
>
>
>
>
>
>On Tue, Sep 6, 2011 at 12:53 PM, Junjun Zhang <[hidden email]>
>wrote:
>
>Hi Nuria,
>
>
>
>Your use case is well understood. As left-join is a more typical scenario
>for inter-mart queries, we are considering to make left-join the default
>behaviour in the next release. Technically this is not hard to implement,
>however, we'd like to be very careful about making this type of decision
>as it has global impact to the whole querying system. In this case, we
>will need to provide means to enable deployers (maybe end users as well)
>to alter the default join behaviour from the web GUI and the query XML.
>
>For join with a dataset between main table and dimension table, we can
>continue to support precomputed 'left join' as Arek mentioned below. We
>can also make it an optional for performing 'left join' on the fly. The
>only concern is that under certain situations the latter may have some
>performance disadvantages.
>
>We will keep you informed on this.
>
>Hope this helps!
>
>Best regards,
>Junjun
>
>
>From:  Arek Kasprzyk <[hidden email]>
>Date:  Mon, 5 Sep 2011 09:55:42 -0400
>To:  "[hidden email]" <[hidden email]>
>Cc:  "[hidden email]" <[hidden email]>
>Subject:  Re: [BioMart Users] OUTER JOIN between tables
>
>
>
>Hi Nuria,
>this has been a well know problem (or a feature depends how you look at
>it :)) of BioMart from it's beginnings. BioMart 0.7 performs an inner
>join only between tables within single dataset. The join between marts
>depends on data overlap between them and can behave either as an outer
>join when there is 100% overlap or as inner join if there is no full
>overlap (more typical scenario). This is of course not an ideal situation
>because it looks somewhat non-deterministic for the administrators.
>
>The mechanism of both joins (dataset and mart) is different. While the
>former is a simple SQL inner join, the latter is an 'inlist' that is an
>equivalent to multiple 'or' statements. The effect of this implementation
>was that all dimension tables in a dataset had to be a precomputed 'left
>join' tables ei tables with multiple NULLs which is not ideal for large
>tables with sparse records wasting a lot of space.
>
>BioMart 0.8 has inherited most solutions from 0.7. However it is supposed
>to be more flexible and perform an inner or outer join within the dataset
>depending on the administrator's settings that should available through
>MartConfigurator and also automatically set based on the initial MBuilder
>components settings. I do remember personally instructing one of the
>developers to include this feature in the QueryCompiler and it was there
>as a simple switch at some point but i doubth it that it ever made it to
>MConfigurator or MBuilder.
>
>The situation with the join between marts in 0.8 is identical to that of
>0.7 (ei there are no provisions for the outer join). However, your email
>alerted me to this fact and this is definitely something that we need to
>think about. There is one more situation involving inlist that has not
>been solved either ei annotating external identifiers that effectively is
>an inner join as well and has been brought to our attention before.
>Technically the solution for both should be the same. This is definitely
>doable and something worth scheduling in the near future.
>
>
>Junjun: please check where we are with the DATASET outer join
>implementation.
>
>
>cheers,
>a
>
>
>
>On Mon, Sep 5, 2011 at 7:29 AM,  <[hidden email]> wrote:
>
>Dear BioMart users,
>
>We are setting up a local database powered by BioMart in
>order to annotate our local data with the data of other marts. The
>problem is
>that the type of join that BioMart is performing between tables seems to
>be an
>inner join since BioMart solely outputs all the records that match in the
>two
>marts linked. My question is, if this is true, are you planning to add
>outer
>joins in order to get in addition all the unmatched results? In our case,
>if we
>want to annotate our database with another mart data, we would perform a
>left
>join to get all the matched records + all the unmatched records from our
>database. And in case you are planning to implement this, when do you
>predict
>that this implementation will be accessible for us, the BioMart users?
>
>This type of outputting the query results, i.e. to output both
>the matched and unmatched records from our database, is of highly
>interest in
>order to annotate BUT without losing information from the query because
>of the
>annotation process. Many thanks in advance.
>
>Please, do not hesitate to ask me for more information.
>
>Yours faithfully,
>
>Núria
>
>
>
>
>
>_______________________________________________
>Users mailing list
>[hidden email]
>https://lists.biomart.org/mailman/listinfo/users
>
>
>
>
>
>
>
>
>
>
>
>
>

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Junjun Zhang
In reply to this post by Arek Kasprzyk
Hi Arek,

Can you please elaborate more on the follow point mentioned in your previous response?

"There is one more situation involving inlist that has not
been solved either ei annotating external identifiers that effectively is
an inner join as well and has been brought to our attention before."

Thanks,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Tue, 6 Sep 2011 13:29:01 -0400
To: jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Junjun:
my three cents ;)

1. I would definitely NOT expose end users to understand the querying logic for inter mart queries.
2. left join with inter mart querying needs to be solved with in list annotation at the same time
3. The 'on the fly' left join was tested two years ago when i asked for the query compiler to be extended so we know well how it performs :)

a.






On Tue, Sep 6, 2011 at 12:53 PM, Junjun Zhang <[hidden email]> wrote:
Hi Nuria,

Your use case is well understood. As left-join is a more typical scenario for inter-mart queries, we are considering to make left-join the default behaviour in the next release. Technically this is not hard to implement, however, we'd like to be very careful about making this type of decision as it has global impact to the whole querying system. In this case, we will need to provide means to enable deployers (maybe end users as well) to alter the default join behaviour from the web GUI and the query XML.

For join with a dataset between main table and dimension table, we can continue to support precomputed 'left join' as Arek mentioned below. We can also make it an optional for performing 'left join' on the fly. The only concern is that under certain situations the latter may have some performance disadvantages.

We will keep you informed on this.

Hope this helps!

Best regards,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Mon, 5 Sep 2011 09:55:42 -0400
To: "[hidden email]" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Hi Nuria,
this has been a well know problem (or a feature depends how you look at it :)) of BioMart from it's beginnings. BioMart 0.7 performs an inner join only between tables within single dataset. The join between marts depends on data overlap between them and can behave either as an outer join when there is 100% overlap or as inner join if there is no full overlap (more typical scenario). This is of course not an ideal situation because it looks somewhat non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the former is a simple SQL inner join, the latter is an 'inlist' that is an equivalent to multiple 'or' statements. The effect of this implementation was that all dimension tables in a dataset had to be a precomputed 'left join' tables ei tables with multiple NULLs which is not ideal for large tables with sparse records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to be more flexible and perform an inner or outer join within the dataset depending on the administrator's settings that should available through MartConfigurator and also automatically set based on the initial MBuilder components settings. I do remember personally instructing one of the developers to include this feature in the QueryCompiler and it was there as a simple switch at some point but i doubth it that it ever made it to MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7 (ei there are no provisions for the outer join). However, your email alerted me to this fact and this is definitely something that we need to think about. There is one more situation involving inlist that has not been solved either ei annotating external identifiers that effectively is an inner join as well and has been brought to our attention before. Technically the solution for both should be the same. This is definitely doable and something worth scheduling in the near future.


Junjun: please check where we are with the DATASET outer join implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, <[hidden email]> wrote:

Dear BioMart users,

 

We are setting up a local database powered by BioMart inorder to annotate our local data with the data of other marts. The problem is that the type of join that BioMart is performing between tables seems to be an inner join since BioMart solely outputs all the records that match in the two marts linked. My question is, if this is true, are you planning to add outer joins in order to get in addition all the unmatched results? In our case, if we want to annotate our database with another mart data, we would perform a left join to get all the matched records + all the unmatched records from our database. And in case you are planning to implement this, when do you predict that this implementation will be accessible for us, the BioMart users?

 

This type of outputting the query results, i.e. to output both the matched and unmatched records from our database, is of highly interest in order to annotate BUT without losing information from the query because of the annotation process. Many thanks in advance.

 

Please, do not hesitate to ask me for more information.

 

Yours faithfully,

 

Núria

 

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users




_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Arek Kasprzyk
In reply to this post by Junjun Zhang
<snip>

>2. left join with inter mart querying needs to be solved with in list
>annotation at the same time

Just wanted to confirm with you that I understand the 'in list annotation'
use case correctly. Do you think the query should behave as inner join for
'in list annotation' and BioMart should support both left join and inner
join between different marts? If so, then that's what I see it as well.
For that we can make 'left join' the default behaviour (as it's more
common) and provide a mechanism to override this in case deployer wants
alter that to 'inner join'.


</snip>

I would just do everything as  default left join. In this way people would not be loosing their identifiers
when trying to annotate stuff that does not exist in a given mart. For inter-mart querying just the same. We can
discuss more technical details offline at some point

a

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Arek Kasprzyk
In reply to this post by Junjun Zhang
just replied to it :)

On Tue, Sep 6, 2011 at 5:50 PM, Junjun Zhang <[hidden email]> wrote:
Hi Arek,

Can you please elaborate more on the follow point mentioned in your previous response?

"There is one more situation involving inlist that has not
been solved either ei annotating external identifiers that effectively is
an inner join as well and has been brought to our attention before."

Thanks,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Tue, 6 Sep 2011 13:29:01 -0400
To: jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>

Subject: Re: [BioMart Users] OUTER JOIN between tables

Junjun:
my three cents ;)

1. I would definitely NOT expose end users to understand the querying logic for inter mart queries.
2. left join with inter mart querying needs to be solved with in list annotation at the same time
3. The 'on the fly' left join was tested two years ago when i asked for the query compiler to be extended so we know well how it performs :)

a.






On Tue, Sep 6, 2011 at 12:53 PM, Junjun Zhang <[hidden email]> wrote:
Hi Nuria,

Your use case is well understood. As left-join is a more typical scenario for inter-mart queries, we are considering to make left-join the default behaviour in the next release. Technically this is not hard to implement, however, we'd like to be very careful about making this type of decision as it has global impact to the whole querying system. In this case, we will need to provide means to enable deployers (maybe end users as well) to alter the default join behaviour from the web GUI and the query XML.

For join with a dataset between main table and dimension table, we can continue to support precomputed 'left join' as Arek mentioned below. We can also make it an optional for performing 'left join' on the fly. The only concern is that under certain situations the latter may have some performance disadvantages.

We will keep you informed on this.

Hope this helps!

Best regards,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Mon, 5 Sep 2011 09:55:42 -0400
To: "[hidden email]" <[hidden email]>
Cc: "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

Hi Nuria,
this has been a well know problem (or a feature depends how you look at it :)) of BioMart from it's beginnings. BioMart 0.7 performs an inner join only between tables within single dataset. The join between marts depends on data overlap between them and can behave either as an outer join when there is 100% overlap or as inner join if there is no full overlap (more typical scenario). This is of course not an ideal situation because it looks somewhat non-deterministic for the administrators.

The mechanism of both joins (dataset and mart) is different. While the former is a simple SQL inner join, the latter is an 'inlist' that is an equivalent to multiple 'or' statements. The effect of this implementation was that all dimension tables in a dataset had to be a precomputed 'left join' tables ei tables with multiple NULLs which is not ideal for large tables with sparse records wasting a lot of space.

BioMart 0.8 has inherited most solutions from 0.7. However it is supposed to be more flexible and perform an inner or outer join within the dataset depending on the administrator's settings that should available through MartConfigurator and also automatically set based on the initial MBuilder components settings. I do remember personally instructing one of the developers to include this feature in the QueryCompiler and it was there as a simple switch at some point but i doubth it that it ever made it to MConfigurator or MBuilder.

The situation with the join between marts in 0.8 is identical to that of 0.7 (ei there are no provisions for the outer join). However, your email alerted me to this fact and this is definitely something that we need to think about. There is one more situation involving inlist that has not been solved either ei annotating external identifiers that effectively is an inner join as well and has been brought to our attention before. Technically the solution for both should be the same. This is definitely doable and something worth scheduling in the near future.


Junjun: please check where we are with the DATASET outer join implementation.


cheers,
a



On Mon, Sep 5, 2011 at 7:29 AM, <[hidden email]> wrote:

Dear BioMart users,

 

We are setting up a local database powered by BioMart inorder to annotate our local data with the data of other marts. The problem is that the type of join that BioMart is performing between tables seems to be an inner join since BioMart solely outputs all the records that match in the two marts linked. My question is, if this is true, are you planning to add outer joins in order to get in addition all the unmatched results? In our case, if we want to annotate our database with another mart data, we would perform a left join to get all the matched records + all the unmatched records from our database. And in case you are planning to implement this, when do you predict that this implementation will be accessible for us, the BioMart users?

 

This type of outputting the query results, i.e. to output both the matched and unmatched records from our database, is of highly interest in order to annotate BUT without losing information from the query because of the annotation process. Many thanks in advance.

 

Please, do not hesitate to ask me for more information.

 

Yours faithfully,

 

Núria

 

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users





_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Junjun Zhang
In reply to this post by Arek Kasprzyk
OK, let's make 'left join' the default behaviour. However, we will have to bare it in mind that there are use cases that require 'inner join' between data marts. For example, find the common genes that are over expressed in breast cancer and pancreatic cancer datasets. I understand that we do not support this type of query in the ICGC data portal at this time, but since this is a frequently requested use case we should properly design and implement support for that in the near future. Let's discuss details later.
Cheers,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Tue, 6 Sep 2011 17:55:24 -0400
To: jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

<snip>

>2. left join with inter mart querying needs to be solved with in list
>annotation at the same time

Just wanted to confirm with you that I understand the 'in list annotation'
use case correctly. Do you think the query should behave as inner join for
'in list annotation' and BioMart should support both left join and inner
join between different marts? If so, then that's what I see it as well.
For that we can make 'left join' the default behaviour (as it's more
common) and provide a mechanism to override this in case deployer wants
alter that to 'inner join'.


</snip>

I would just do everything as  default left join. In this way people would not be loosing their identifiers
when trying to annotate stuff that does not exist in a given mart. For inter-mart querying just the same. We can
discuss more technical details offline at some point

a

_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

Arek Kasprzyk
Let's do it properly. Why don't you make a proposal taking into account all use cases and I'll review it.
We can take it offlline

a


On Tue, Sep 6, 2011 at 6:28 PM, Junjun Zhang <[hidden email]> wrote:
OK, let's make 'left join' the default behaviour. However, we will have to bare it in mind that there are use cases that require 'inner join' between data marts. For example, find the common genes that are over expressed in breast cancer and pancreatic cancer datasets. I understand that we do not support this type of query in the ICGC data portal at this time, but since this is a frequently requested use case we should properly design and implement support for that in the near future. Let's discuss details later.
Cheers,
Junjun


From: Arek Kasprzyk <[hidden email]>
Date: Tue, 6 Sep 2011 17:55:24 -0400

To: jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

<snip>

>2. left join with inter mart querying needs to be solved with in list
>annotation at the same time

Just wanted to confirm with you that I understand the 'in list annotation'
use case correctly. Do you think the query should behave as inner join for
'in list annotation' and BioMart should support both left join and inner
join between different marts? If so, then that's what I see it as well.
For that we can make 'left join' the default behaviour (as it's more
common) and provide a mechanism to override this in case deployer wants
alter that to 'inner join'.


</snip>

I would just do everything as  default left join. In this way people would not be loosing their identifiers
when trying to annotate stuff that does not exist in a given mart. For inter-mart querying just the same. We can
discuss more technical details offline at some point

a


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users
Reply | Threaded
Open this post in threaded view
|

Re: [BioMart Users] OUTER JOIN between tables

nqueralt

Thank you Arek and JunJun for your fast reply, concern and excellent explanations.

 

 

Cheers,

 

Núria


De: Arek Kasprzyk [mailto:[hidden email]]
Enviado el: miércoles, 07 de septiembre de 2011 0:38
Para: Junjun Zhang
CC: QUERALT, NURIA (IDIBAPS); [hidden email]
Asunto: Re: [BioMart Users] OUTER JOIN between tables

 

Let's do it properly. Why don't you make a proposal taking into account all use cases and I'll review it.
We can take it offlline

a

On Tue, Sep 6, 2011 at 6:28 PM, Junjun Zhang <[hidden email]> wrote:

OK, let's make 'left join' the default behaviour. However, we will have to bare it in mind that there are use cases that require 'inner join' between data marts. For example, find the common genes that are over expressed in breast cancer and pancreatic cancer datasets. I understand that we do not support this type of query in the ICGC data portal at this time, but since this is a frequently requested use case we should properly design and implement support for that in the near future. Let's discuss details later.

Cheers,

Junjun

 

 

From: Arek Kasprzyk <[hidden email]>

Date: Tue, 6 Sep 2011 17:55:24 -0400


To: jzhang <[hidden email]>
Cc: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Re: [BioMart Users] OUTER JOIN between tables

 

<snip>

>2. left join with inter mart querying needs to be solved with in list
>annotation at the same time

Just wanted to confirm with you that I understand the 'in list annotation'
use case correctly. Do you think the query should behave as inner join for
'in list annotation' and BioMart should support both left join and inner
join between different marts? If so, then that's what I see it as well.
For that we can make 'left join' the default behaviour (as it's more
common) and provide a mechanism to override this in case deployer wants
alter that to 'inner join'.

 


</snip>

I would just do everything as  default left join. In this way people would not be loosing their identifiers
when trying to annotate stuff that does not exist in a given mart. For inter-mart querying just the same. We can
discuss more technical details offline at some point

a

 


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users