Re: [BioMart Users] Queries being cut off early with no warning...


Arek Kasprzyk
Hi Rhoda,
(cc'ing users, because this may be of interest to others).
There is no active development on 0.7 anymore. However, there are still some 'generic' tricks you could use to improve your situation:

1. Ask people to go through the 'download via email' route for heavier queries.
2. Limit attribute combinations that result in many heavy table joins, by:
a. using 'max select' when configuring the mart;
b. simply removing some attributes.
3. Use 'default' filters to limit the queries.

However, I would start by checking two things:

1. The load on the server. Query performance is hugely affected by it, and this can be very misleading: if the load is high, even very 'innocent' queries take ages. If this is the case, perhaps you need more hardware?
2. The type of heavy queries that people run most often (see the sketch below for one way to capture them). If you could tell me what they are, perhaps we could come up with a solution that targets just those queries.
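
One way to capture those heavy queries on a MySQL-backed mart is the slow query log. A minimal sketch, not from the original thread, assuming MySQL 5.1+ and sufficient privileges:

-- Log any statement that runs longer than a chosen threshold.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 30;   -- seconds; tune to taste
SET GLOBAL slow_query_log_file = '/var/log/mysql/mart-slow.log';

-- While the server feels loaded, a live snapshot also helps:
SHOW FULL PROCESSLIST;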



a

On Tue, Sep 20, 2011 at 5:47 AM, Rhoda Kinsella <[hidden email]> wrote:
Hi Arek and Junjun
I have a query about BioMart, and perhaps you can give me some advice on how to solve it, or on whether something can be added to the code to rectify it. We are getting an increasing number of users reporting that they get only partial result files, or no result files at all, when they use BioMart, and they complain that there was no warning or error message. I have asked our web team whether the cut-off time they set for queries has been changed. It was put in place some time ago because some queries were taking too long and killing the servers, or people kept resubmitting the same query over and over, which froze the servers for everyone else.

I was wondering whether you have implemented, or are planning to implement, some sort of queuing system for queries in the new code, and whether it would be possible to warn users when they have received an incomplete file download. I fear that some users are ploughing ahead with their work without realizing they are missing a chunk of the data. Is there a way that we can automatically warn users that they are asking for too much data all at once and ask them to apply more filters? Is there anything I can do with our current 0.7 version to deal with this issue? I'm worried people are going to start using alternatives to BioMart if this continues. Any help or advice would be greatly appreciated.
Regards
Rhoda


Rhoda Kinsella Ph.D.
Ensembl Bioinformatician,
European Bioinformatics Institute (EMBL-EBI),
Wellcome Trust Genome Campus,
Hinxton
Cambridge CB10 1SD,
UK.



Rhoda Kinsella
Hi Arek
The helpdesk team and I have worked together to help users by making the same suggestions you mentioned in your email (i.e. encouraging the use of filters, limiting the number of attributes selected, using the "download results via email" option, etc.), and I have also implemented max select in several places in the configuration. I think we are going to have to look at streamlining the data we provide in some way in the future. The issue is that the volume of data is growing, especially for variation, and as the tables get bigger the queries take longer. I know that the load on the server can sometimes be very high and that this affects user response times. Have you tried partitioning the data to improve build time and/or response time, and had any success with it?
Regards,
Rhoda

Arek Kasprzyk
Hi Rhoda,
Yes, we are using partitioning a lot for the ICGC portal, but this relies on the fact that the datasets there lend themselves naturally to a partitioning solution, i.e. different tumour types. The new 'parallel' and streaming query engine, thanks to Syed's work, helps greatly with that. For variation, you could use a similar solution in the future and partition your datasets by chromosome; that seems quite natural as well.
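
BioMart's own partitioning operates at the dataset level (one dataset per tumour type in ICGC), but the per-chromosome idea can also be sketched at the database layer. A rough illustration, not from the thread, assuming MySQL 5.5+ and a hypothetical variation table:

CREATE TABLE variation_feature__main (
    variation_id INT UNSIGNED NOT NULL,
    chromosome   VARCHAR(2)   NOT NULL,
    seq_start    INT UNSIGNED NOT NULL
)
PARTITION BY LIST COLUMNS (chromosome) (
    PARTITION p_chr1  VALUES IN ('1'),
    PARTITION p_chr2  VALUES IN ('2'),
    -- ... one partition per chromosome ...
    PARTITION p_chrX  VALUES IN ('X'),
    PARTITION p_chrMT VALUES IN ('MT')
);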

For 0.7, I would strongly encourage you to figure out which are the 'killer' queries, so we could look into them in more detail and come up with some sort of more targeted solution. As far as filters are concerned, I was talking mostly about 'default filters', i.e. filters that could be switched on at all times (e.g. chromosome), without a user being able to switch them off; see the sketch below. I think MartEditor provides support for that. I know it is a crude solution, but maybe it would help you a bit.
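
A hedged sketch of what such a filter might look like in the DatasetConfig XML that MartEditor edits. The internal/field/table names here are hypothetical, and the defaultOn/defaultValue attributes should be verified against your own 0.7 configuration:

<!-- Hypothetical default filter: chromosome pre-set to 1 and enabled. -->
<FilterDescription internalName="chromosome_name"
                   displayName="Chromosome"
                   type="list"
                   qualifier="="
                   field="chromosome"
                   tableConstraint="main"
                   defaultOn="1"
                   defaultValue="1"/>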

a


Rhoda Kinsella
Hi Arek
Thank you for all your suggestions. I will look again at our filters and attributes and see what I can do to improve things for the next few releases before we move to 0.8. Is there any documentation on the new 'parallel' and streaming query engine and on the ICGC partitioning solution, so I can see what was involved? Or is there someone in particular I can contact if I need advice?
Regards
Rhoda

Arek Kasprzyk
Hi Rhoda,
Yes, there is:

http://database.oxfordjournals.org/content/2011/bar038.full?keytype=ref&ijkey=5Qv7xNnHDCNJP91

Syed designed and implemented the parallel query engine, and we have not really changed anything since then. I am sure he will be happy to talk to you about this.


a


Rhoda Kinsella
Thanks Arek
Rhoda

Syed Haider
Hi Rhoda,

As Arek pointed out, it's described in the reference; feel free to contact me if you need further help on this.

With reference to v0.7: if users want assurance that they have received the full result set, there is an additional flag, available *only* for webservice API requests, completionStamp="1". For example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query completionStamp="1" virtualSchemaName="default" formatter="TSV"
       header="0" uniqueRows="0" count="" datasetConfigVersion="0.6">
        <Dataset name="hsapiens_gene_ensembl" interface="default">
                <Filter name="chromosome_name" value="MT"/>
                <Attribute name="ensembl_gene_id"/>
        </Dataset>
</Query>

Upon successful completion, the results will have one additional row at the end, i.e.:
[success]
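
A minimal client-side sketch, not from the thread, of how a script could use this stamp to reject truncated downloads. The martservice URL is an assumption; substitute your own server's endpoint:

import urllib.parse
import urllib.request

MARTSERVICE = "http://www.biomart.org/biomart/martservice"  # assumed endpoint

query_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query completionStamp="1" virtualSchemaName="default" formatter="TSV"
       header="0" uniqueRows="0" count="" datasetConfigVersion="0.6">
        <Dataset name="hsapiens_gene_ensembl" interface="default">
                <Filter name="chromosome_name" value="MT"/>
                <Attribute name="ensembl_gene_id"/>
        </Dataset>
</Query>"""

# POST the query; the 0.7 webservice takes the XML in a 'query' field.
data = urllib.parse.urlencode({"query": query_xml}).encode()
with urllib.request.urlopen(MARTSERVICE, data) as response:
    lines = response.read().decode().rstrip().splitlines()

# Only trust the download if the server appended its completion stamp.
if lines and lines[-1] == "[success]":
    rows = lines[:-1]  # the genuine result rows
else:
    raise RuntimeError("incomplete BioMart result: completion stamp missing")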

Hope this helps,
Syed


_______________________________________________
Users mailing list
[hidden email]
https://lists.biomart.org/mailman/listinfo/users