Re: strangeness in query on simple object table.

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: strangeness in query on simple object table.

joe carlson
Hi Richard,

I was taking another look at the issue of generating errors when
creating an index on a temporary table. I think the issue lies in
buildOrderBy in SQLGenerator.

For the SimpleObject tables, there is no id column, so at line 2486 (in
my copy) queryClassToString gets called with NO_ALIASES_ALL_FIELDS. The
string buffer that will be used for the ORDER BY fields gets filled with
all columns of the table: "alias.col1, alias.col2,..." But then the
HashSet "seen" is examined to see if we've added this column before, you
look for the entire string buffer contents and not the individual elements.

I'll try a push request. But the gist is that I do a split on the buffer
and ask if the individual columns have already been added to the order
by list:

>                    for(String bufferBit : buffer.toString().split(",")) {
>                       if (!seen.contains(bufferBit)) {
>                         retval.append(needComma ? ", " : " ORDER BY ");
>                         needComma = true;
>                         retval.append(bufferBit);
>                         seen.add(bufferBit);
>                         if ((!q.getSelect().contains(node))
>                             && (!q.getSelect().contains(new
> QueryField((QueryClass) node,
>                                 "id")))) {
>                           if (q.isDistinct()) {
>                             throw new ObjectStoreException("Class " +
> q.getAliases().get(node)
>                                 + " in the ORDER BY list must be in
> the SELECT list, or its"
>                                 + " id, or the query made non-distinct");
>                           } else if ((kind == QUERY_FOR_PRECOMP)
>                               || (kind == QUERY_FOR_GOFASTER)) {
>                             state.addToOrderBy(bufferBit);
>                           }
>                         }
>                       }
>                     }

(I'm not too sure if this handles the condition of kind ==
QUERY_FOR_PRECOMP... correctly. Let me know if this is not correct.

This makes sense since I was only seeing the error in cases where I was
using the SimpleObject in the query.

But: now that I have the list of columns cleaned up, I am running into
the next problem. Namely, one of the columns in the temporary table has
large text fields. Since this column is listed in the ORDER BY fields,
I'm attempting to include this column in the index. And postgres is
objecting to the fact that the column is too big to index. But that is
for another day.

Joe


On 10/08/2014 05:23 PM, Joe Carlson wrote:

> On Oct 8, 2014, at 2:35 AM, Richard Smith <[hidden email]> wrote:
>
>> Hi Joe,
>> Could you send me the code change you made to correct the duplicated
>> fields in the precomputed tables please?  This may have already been fixed
>> on the beta branch but I'd like to check.
> I am not proud of what I did here.
>
> I wanted to try to figure out why I was getting duplicated column names in the index. But in my haste of getting something to work I simple masked the problem and took out repeated column names.
>
> I hacked into PrecomutedTableManager:
>
>     protected void addIndex(String table, String field, Connection con) {
>          String simpleField = field;
>          if (simpleField.charAt(0) == '"') {
>              simpleField = simpleField.substring(1, simpleField.length() - 1);
>          }
>          StringBuffer rewrite = new StringBuffer();
>          HashSet<String> seen = new HashSet();
>          for(String f: field.split(",")) {
>            if (!seen.contains(f.trim())) {
>              if (!rewrite.toString().isEmpty()) {
>                rewrite.append(",");
>              }
>              rewrite.append(f);
>              seen.add(f);
>            }
>          }
>          field = rewrite.toString();
> …
>
> I make no statements on whether this is the correct approach.
>
>> The loop over QuerySelectables in
>> ObjectStoreInterMineImpl.goFasterWithConnection isn't a problem, the
>> precomputed table has been created on line 2114, the loop is to find any
>> outer joins or subqueries that need an additional precomputed table.
>>
>> So, I'm not sure where the actual problem lies. How long did it take to
>> create the precomputed table?  I'm not sure why it isn't logging a proper
>> timestamp, are those seconds or milliseconds in the log file?  If it took
>> a very long time to create something else may have timed out.
>>
>
> Thanks for that info.
>
> What I had seen when stepping through the code is that the query is getting executed and results are getting returned. But when I step through code I still can’t figure out something.
>
> In QueryResultService.runResults(PathQuery,int,int), I am setting canGoFaster=true. I get to the line
>   if (it.hasNext()) { // Prime the batch fetching pumps
>
> and I see that the iterator has results
>
> This drops me down to
>
>        if (canGoFaster) {
>            ((ExportResultsIterator) it).goFaster();
>        }
>        processor.write(it, output);
>
> But when I get into JSONResultProcessor.write, the call to getResultsIterator(resultIt) returns an interator with nothing in it. I’m not getting an iterator for the precomuted table. When I look at the web page, it has the proper number of results, but no results.
>
>
>> Is the failed query repeatable? And do you see anything else in any of the
>> logs?
>>
> it is pretty repeatable. And I’ve only seen it on this query.
>
> I noticed 2 things today:
>
> 1) If I do the query as a web service call, it returns results.
>
> 2) I got the URL for the web service call by clicking on the ‘web service URL’ button on the query builder page. To my horror, I saw that this was giving me a URL that started localhost:8081/… Our public mine is good; that has the right host and port. But an internal test mine is wrong.
>
> Is #2 causing #1? Do I have something misconfigured that is causing the queries to give no results because of some misconfiguration? and how did this get misconfigured? If this is the case, then the entire simple object behavior is a red herring. I checked and other queries that use temporary tables are OK.
>
>
>
>> Incidentally, creating a precomputed table just to view a results table in
>> the webapp is unnecessary and will hopefully be fixed soon:
>> https://github.com/intermine/intermine/issues/712.
>>
>> Sorry this isn't more help yet.
>>
>
> I appreciate all comments. There is no such thing as bad knowledge.
>
> Joe
>
>
>> Richard.
>>
>>
>>> Hi Julie and Alex
>>>
>>> Thanks to both of you for the info. I'm glad to hear it wasn't just me.
>>>
>>> I had walked through the code and find this issue myself. But even when
>>> making this change, there were problems. Even when I corrected for the
>>> quotation marks  I was getting an error:
>>>
>>> org.postgresql.util.PSQLException: ERROR: duplicate key value violates
>>> unique constraint "pg_attribute_relid_attnam_index"
>>>
>>> The key list in my query has one column duplicated. So the indexing that
>>> was attempted has a field duplicated and that causes an error. I worked
>>> in some code to remove the duplicated fields. But even this wasn't a
>>> total fix.
>>>
>>> Based on the log files and what showed up on the screen, the initial
>>> select and indexing worked well. I created the temporary table, indexed
>>> it. The web page showed my the proper number of results were retrieved.
>>> But nothing was shown in the table.
>>>
>>> What is weird here is that if I look at the log files I see this:
>>>
>>> 34941 [TP-Processor10] INFO
>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>> precomputed table CREATE TABLE temporary_precomp_548223 AS SELECT
>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format AS
>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId, a2_.id AS
>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format, a1_.genotype,
>>> a1_.snpId, a2_.id
>>> 40272 [TP-Processor10] INFO
>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>> indexes for temporary_precomp_548223
>>> 40273 [TP-Processor10] INFO
>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating index
>>> on temporary_precomp_548223 (orderbyfield1, orderbyfield2,
>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>> 41002 [TP-Processor10] INFO
>>> org.intermine.sql.precompute.PrecomputedTableManager  - ANALYSEing
>>> precomputed table temporary_precomp_548223
>>> 41530 [TP-Processor10] INFO
>>> org.intermine.sql.precompute.PrecomputedTableManager  - Finished
>>> creating precomputed table temporary_precomp_548223 for category goFaster
>>> 42088 [TP-Processor10] INFO
>>> org.intermine.sql.precompute.PrecomputedTableManager  - Dropped
>>> precomputed table temporary_precomp_548223
>>>
>>> what is strange to me is that I am used to seeing some SQL in the log
>>> files that did a select between the 'Finished creating' and 'Dropped
>>> precomputed'. But I don't see anything being selected from
>>> temporary_precomp_548223.
>>>
>>> Is this another issue? I was stepping through the code and see that in
>>> ObjectStoreInterMineImpl.goFasterWithConnection the loop over
>>> QuerySelectable qs : q.getSelect does not do anything; in my query there
>>> are 2 values for QuerySelectable and neither of them is a
>>> QueryCollectionPathExpression nor a QueryObjectPathExpression. Should
>>> that bother me?
>>>
>>> Again, I appreciate the assistance,
>>>
>>> Joe Carlson
>>>
>>> On 10/07/2014 07:01 AM, Julie Sullivan wrote:
>>>> Apologies Joe! I found this bug as well, but the fix is only on the
>>>> beta branch right now:
>>>>
>>>> https://github.com/intermine/intermine/pull/729
>>>>
>>>> It's only two lines though.
>>>>
>>>> On 07/10/14 04:37, Joe Carlson wrote:
>>>>> Hello,
>>>>>
>>>>> I’m seeing something weird. The answer to my problem may very well be
>>>>> “stop making shortcuts”. If so, I’ll go away and wait for my processing
>>>>> to continue and see how it goes.
>>>>>
>>>>> I am a bit anxious to see the query performance on my huge simple
>>>>> object
>>>>> table, so I’m building a web app before completing all my post
>>>>> processing steps. In particular, I’m not rebuilding an objectstore
>>>>> summary. I have an older one, but the thing takes a long time to build
>>>>> and I was curious if it was possible to deploy without that in place.
>>>>>
>>>>> When I tried to run some queries, I was not seeing the results in the
>>>>> table view. The header “Showing 1 to 25 of 7,XXX results” was there -
>>>>> and the number was correct - but not the table.
>>>>>
>>>>> When I looked through the web server’s log files, I saw some strange
>>>>> things in the generated SQL:
>>>>>
>>>>> 141855 [TP-Processor2] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>>>> precomputed table CREATE TABLE temporary_precomp_548216 AS SELECT
>>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format AS
>>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId, a2_.id
>>>>> AS
>>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>>> a1_.genotype,
>>>>> a1_.snpId, a2_.id
>>>>> 147332 [TP-Processor2] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>>> indexes for temporary_precomp_548216
>>>>> 147332 [TP-Processor2] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating index
>>>>> on temporary_precomp_548216 (orderbyfield1, orderbyfield2,
>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>>> 490353 [TP-Processor2] WARN
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Error while
>>>>> executing CREATE INDEX temporary_precomp_548216_xiffkbkfjvhootbw ON
>>>>> temporary_precomp_548216 ("orderbyfield1, orderbyfield2,
>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id”)
>>>>> org.postgresql.util.PSQLException: ERROR: column "orderbyfield1,
>>>>> orderbyfield2, a1_diversitySampleId, a1_format, " does not exist
>>>>>          at
>>>>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The SQL for creating the temp table is good and this has the proper
>>>>> number of entries. But there are 2 strange things in the CREATE INDEX
>>>>> statement. First, the entire list of keys in enclosed in quotes - and
>>>>> so
>>>>> postgres interprets this as 1 column. This apparently arises because
>>>>> there are capitalized field values in the keys. And also, the field
>>>>> orderfield1 is specified twice. Even when I correct for the quotation
>>>>> marks, I can’t create an index using 1 field twice.
>>>>>
>>>>> Is the weirdness of this CREATE INDEX due to the fact that it is a
>>>>> simple object table involved? Or because the objectstoresummary is not
>>>>> correct. If it’s the latter, I’ll just hold off until that finishes.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Joe Carlson
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> [hidden email]
>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> [hidden email]
>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>
>>>
>


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: strangeness in query on simple object table.

Richard Smith
Hi Joe,
That does indeed look like a bug and the correct fix, I hadn't seen that
before (the SqlGenerator is way too complicated).  It looks like you're
also doing the right thing with QUERY_FOR_PRECOMP.

If you can create a pull request that would be great, otherwise I can
patch the fix in.

Regarding the index failure on the precomputed table - I think the table
should still be created just without that index. When I precompute a query
selecting a field with very large text values I see an error message in
the logs about a failed index but the precomputed table is successfully
created and used.

If the field in question has very large text values it's unlikely to be
constrained, so the missing index won't cause a performance problem.  We
should log a warning instead of writing the Postgres error.

Cheers,
Richard.


> Hi Richard,
>
> I was taking another look at the issue of generating errors when
> creating an index on a temporary table. I think the issue lies in
> buildOrderBy in SQLGenerator.
>
> For the SimpleObject tables, there is no id column, so at line 2486 (in
> my copy) queryClassToString gets called with NO_ALIASES_ALL_FIELDS. The
> string buffer that will be used for the ORDER BY fields gets filled with
> all columns of the table: "alias.col1, alias.col2,..." But then the
> HashSet "seen" is examined to see if we've added this column before, you
> look for the entire string buffer contents and not the individual
> elements.
>
> I'll try a push request. But the gist is that I do a split on the buffer
> and ask if the individual columns have already been added to the order
> by list:
>>                    for(String bufferBit : buffer.toString().split(","))
>> {
>>                       if (!seen.contains(bufferBit)) {
>>                         retval.append(needComma ? ", " : " ORDER BY ");
>>                         needComma = true;
>>                         retval.append(bufferBit);
>>                         seen.add(bufferBit);
>>                         if ((!q.getSelect().contains(node))
>>                             && (!q.getSelect().contains(new
>> QueryField((QueryClass) node,
>>                                 "id")))) {
>>                           if (q.isDistinct()) {
>>                             throw new ObjectStoreException("Class " +
>> q.getAliases().get(node)
>>                                 + " in the ORDER BY list must be in
>> the SELECT list, or its"
>>                                 + " id, or the query made
>> non-distinct");
>>                           } else if ((kind == QUERY_FOR_PRECOMP)
>>                               || (kind == QUERY_FOR_GOFASTER)) {
>>                             state.addToOrderBy(bufferBit);
>>                           }
>>                         }
>>                       }
>>                     }
>
> (I'm not too sure if this handles the condition of kind ==
> QUERY_FOR_PRECOMP... correctly. Let me know if this is not correct.
>
> This makes sense since I was only seeing the error in cases where I was
> using the SimpleObject in the query.
>
> But: now that I have the list of columns cleaned up, I am running into
> the next problem. Namely, one of the columns in the temporary table has
> large text fields. Since this column is listed in the ORDER BY fields,
> I'm attempting to include this column in the index. And postgres is
> objecting to the fact that the column is too big to index. But that is
> for another day.
>
> Joe
>
>
> On 10/08/2014 05:23 PM, Joe Carlson wrote:
>> On Oct 8, 2014, at 2:35 AM, Richard Smith <[hidden email]> wrote:
>>
>>> Hi Joe,
>>> Could you send me the code change you made to correct the duplicated
>>> fields in the precomputed tables please?  This may have already been
>>> fixed
>>> on the beta branch but I'd like to check.
>> I am not proud of what I did here.
>>
>> I wanted to try to figure out why I was getting duplicated column names
>> in the index. But in my haste of getting something to work I simple
>> masked the problem and took out repeated column names.
>>
>> I hacked into PrecomutedTableManager:
>>
>>     protected void addIndex(String table, String field, Connection con)
>> {
>>          String simpleField = field;
>>          if (simpleField.charAt(0) == '"') {
>>              simpleField = simpleField.substring(1, simpleField.length()
>> - 1);
>>          }
>>          StringBuffer rewrite = new StringBuffer();
>>          HashSet<String> seen = new HashSet();
>>          for(String f: field.split(",")) {
>>            if (!seen.contains(f.trim())) {
>>              if (!rewrite.toString().isEmpty()) {
>>                rewrite.append(",");
>>              }
>>              rewrite.append(f);
>>              seen.add(f);
>>            }
>>          }
>>          field = rewrite.toString();
>> …
>>
>> I make no statements on whether this is the correct approach.
>>
>>> The loop over QuerySelectables in
>>> ObjectStoreInterMineImpl.goFasterWithConnection isn't a problem, the
>>> precomputed table has been created on line 2114, the loop is to find
>>> any
>>> outer joins or subqueries that need an additional precomputed table.
>>>
>>> So, I'm not sure where the actual problem lies. How long did it take to
>>> create the precomputed table?  I'm not sure why it isn't logging a
>>> proper
>>> timestamp, are those seconds or milliseconds in the log file?  If it
>>> took
>>> a very long time to create something else may have timed out.
>>>
>>
>> Thanks for that info.
>>
>> What I had seen when stepping through the code is that the query is
>> getting executed and results are getting returned. But when I step
>> through code I still can’t figure out something.
>>
>> In QueryResultService.runResults(PathQuery,int,int), I am setting
>> canGoFaster=true. I get to the line
>>   if (it.hasNext()) { // Prime the batch fetching pumps
>>
>> and I see that the iterator has results
>>
>> This drops me down to
>>
>>        if (canGoFaster) {
>>            ((ExportResultsIterator) it).goFaster();
>>        }
>>        processor.write(it, output);
>>
>> But when I get into JSONResultProcessor.write, the call to
>> getResultsIterator(resultIt) returns an interator with nothing in it.
>> I’m not getting an iterator for the precomuted table. When I look at the
>> web page, it has the proper number of results, but no results.
>>
>>
>>> Is the failed query repeatable? And do you see anything else in any of
>>> the
>>> logs?
>>>
>> it is pretty repeatable. And I’ve only seen it on this query.
>>
>> I noticed 2 things today:
>>
>> 1) If I do the query as a web service call, it returns results.
>>
>> 2) I got the URL for the web service call by clicking on the ‘web
>> service URL’ button on the query builder page. To my horror, I saw that
>> this was giving me a URL that started localhost:8081/… Our public mine
>> is good; that has the right host and port. But an internal test mine is
>> wrong.
>>
>> Is #2 causing #1? Do I have something misconfigured that is causing the
>> queries to give no results because of some misconfiguration? and how did
>> this get misconfigured? If this is the case, then the entire simple
>> object behavior is a red herring. I checked and other queries that use
>> temporary tables are OK.
>>
>>
>>
>>> Incidentally, creating a precomputed table just to view a results table
>>> in
>>> the webapp is unnecessary and will hopefully be fixed soon:
>>> https://github.com/intermine/intermine/issues/712.
>>>
>>> Sorry this isn't more help yet.
>>>
>>
>> I appreciate all comments. There is no such thing as bad knowledge.
>>
>> Joe
>>
>>
>>> Richard.
>>>
>>>
>>>> Hi Julie and Alex
>>>>
>>>> Thanks to both of you for the info. I'm glad to hear it wasn't just
>>>> me.
>>>>
>>>> I had walked through the code and find this issue myself. But even
>>>> when
>>>> making this change, there were problems. Even when I corrected for the
>>>> quotation marks  I was getting an error:
>>>>
>>>> org.postgresql.util.PSQLException: ERROR: duplicate key value violates
>>>> unique constraint "pg_attribute_relid_attnam_index"
>>>>
>>>> The key list in my query has one column duplicated. So the indexing
>>>> that
>>>> was attempted has a field duplicated and that causes an error. I
>>>> worked
>>>> in some code to remove the duplicated fields. But even this wasn't a
>>>> total fix.
>>>>
>>>> Based on the log files and what showed up on the screen, the initial
>>>> select and indexing worked well. I created the temporary table,
>>>> indexed
>>>> it. The web page showed my the proper number of results were
>>>> retrieved.
>>>> But nothing was shown in the table.
>>>>
>>>> What is weird here is that if I look at the log files I see this:
>>>>
>>>> 34941 [TP-Processor10] INFO
>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>>> precomputed table CREATE TABLE temporary_precomp_548223 AS SELECT
>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format AS
>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId, a2_.id
>>>> AS
>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>> a1_.genotype,
>>>> a1_.snpId, a2_.id
>>>> 40272 [TP-Processor10] INFO
>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>> indexes for temporary_precomp_548223
>>>> 40273 [TP-Processor10] INFO
>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating index
>>>> on temporary_precomp_548223 (orderbyfield1, orderbyfield2,
>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>> 41002 [TP-Processor10] INFO
>>>> org.intermine.sql.precompute.PrecomputedTableManager  - ANALYSEing
>>>> precomputed table temporary_precomp_548223
>>>> 41530 [TP-Processor10] INFO
>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Finished
>>>> creating precomputed table temporary_precomp_548223 for category
>>>> goFaster
>>>> 42088 [TP-Processor10] INFO
>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Dropped
>>>> precomputed table temporary_precomp_548223
>>>>
>>>> what is strange to me is that I am used to seeing some SQL in the log
>>>> files that did a select between the 'Finished creating' and 'Dropped
>>>> precomputed'. But I don't see anything being selected from
>>>> temporary_precomp_548223.
>>>>
>>>> Is this another issue? I was stepping through the code and see that in
>>>> ObjectStoreInterMineImpl.goFasterWithConnection the loop over
>>>> QuerySelectable qs : q.getSelect does not do anything; in my query
>>>> there
>>>> are 2 values for QuerySelectable and neither of them is a
>>>> QueryCollectionPathExpression nor a QueryObjectPathExpression. Should
>>>> that bother me?
>>>>
>>>> Again, I appreciate the assistance,
>>>>
>>>> Joe Carlson
>>>>
>>>> On 10/07/2014 07:01 AM, Julie Sullivan wrote:
>>>>> Apologies Joe! I found this bug as well, but the fix is only on the
>>>>> beta branch right now:
>>>>>
>>>>> https://github.com/intermine/intermine/pull/729
>>>>>
>>>>> It's only two lines though.
>>>>>
>>>>> On 07/10/14 04:37, Joe Carlson wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I’m seeing something weird. The answer to my problem may very well
>>>>>> be
>>>>>> “stop making shortcuts”. If so, I’ll go away and wait for my
>>>>>> processing
>>>>>> to continue and see how it goes.
>>>>>>
>>>>>> I am a bit anxious to see the query performance on my huge simple
>>>>>> object
>>>>>> table, so I’m building a web app before completing all my post
>>>>>> processing steps. In particular, I’m not rebuilding an objectstore
>>>>>> summary. I have an older one, but the thing takes a long time to
>>>>>> build
>>>>>> and I was curious if it was possible to deploy without that in
>>>>>> place.
>>>>>>
>>>>>> When I tried to run some queries, I was not seeing the results in
>>>>>> the
>>>>>> table view. The header “Showing 1 to 25 of 7,XXX results” was there
>>>>>> -
>>>>>> and the number was correct - but not the table.
>>>>>>
>>>>>> When I looked through the web server’s log files, I saw some strange
>>>>>> things in the generated SQL:
>>>>>>
>>>>>> 141855 [TP-Processor2] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>>>>> precomputed table CREATE TABLE temporary_precomp_548216 AS SELECT
>>>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format
>>>>>> AS
>>>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId,
>>>>>> a2_.id
>>>>>> AS
>>>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>>>> a1_.genotype,
>>>>>> a1_.snpId, a2_.id
>>>>>> 147332 [TP-Processor2] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>>>> indexes for temporary_precomp_548216
>>>>>> 147332 [TP-Processor2] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating
>>>>>> index
>>>>>> on temporary_precomp_548216 (orderbyfield1, orderbyfield2,
>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>>>> 490353 [TP-Processor2] WARN
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Error while
>>>>>> executing CREATE INDEX temporary_precomp_548216_xiffkbkfjvhootbw ON
>>>>>> temporary_precomp_548216 ("orderbyfield1, orderbyfield2,
>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id”)
>>>>>> org.postgresql.util.PSQLException: ERROR: column "orderbyfield1,
>>>>>> orderbyfield2, a1_diversitySampleId, a1_format, " does not exist
>>>>>>          at
>>>>>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The SQL for creating the temp table is good and this has the proper
>>>>>> number of entries. But there are 2 strange things in the CREATE
>>>>>> INDEX
>>>>>> statement. First, the entire list of keys in enclosed in quotes -
>>>>>> and
>>>>>> so
>>>>>> postgres interprets this as 1 column. This apparently arises because
>>>>>> there are capitalized field values in the keys. And also, the field
>>>>>> orderfield1 is specified twice. Even when I correct for the
>>>>>> quotation
>>>>>> marks, I can’t create an index using 1 field twice.
>>>>>>
>>>>>> Is the weirdness of this CREATE INDEX due to the fact that it is a
>>>>>> simple object table involved? Or because the objectstoresummary is
>>>>>> not
>>>>>> correct. If it’s the latter, I’ll just hold off until that finishes.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Joe Carlson
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> [hidden email]
>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> [hidden email]
>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>
>>>>
>>
>
>
>


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: strangeness in query on simple object table.

joe carlson
Hi Richard,

thanks for the response. Sorry I didn’t get around to making that pull request; I got distracted by other bright, shiny objects yesterday.

I saw that the query was finishing even without the index getting made. But I saw this message pop up in the log files at a time when we were trying to figure out the quickest way to query our SNP tables for the genotypes of samples that have a snp in a specific gene. We were seeing that doing it as a two-step (find the snps for a gene, then a separate query for the genotypes with that snp) was MUCH faster than a single query. One of the distinctions between the 2 methods is that the single query uses a temporary table and the other does not. Indexing may speed up the retrieval from the temp table, but I realized it would not by itself make up the time difference.

This is not unexpected behavior; tuning queries is part of the job description.

Joe

On Oct 29, 2014, at 5:42 AM, Richard Smith <[hidden email]> wrote:

> Hi Joe,
> That does indeed look like a bug and the correct fix, I hadn't seen that
> before (the SqlGenerator is way too complicated).  It looks like you're
> also doing the right thing with QUERY_FOR_PRECOMP.
>
> If you can create a pull request that would be great, otherwise I can
> patch the fix in.
>
> Regarding the index failure on the precomputed table - I think the table
> should still be created just without that index. When I precompute a query
> selecting a field with very large text values I see an error message in
> the logs about a failed index but the precomputed table is successfully
> created and used.
>
> If the field in question has very large text values it's unlikely to be
> constrained, so the missing index won't cause a performance problem.  We
> should log a warning instead of writing the Postgres error.
>
> Cheers,
> Richard.
>
>
>> Hi Richard,
>>
>> I was taking another look at the issue of generating errors when
>> creating an index on a temporary table. I think the issue lies in
>> buildOrderBy in SQLGenerator.
>>
>> For the SimpleObject tables, there is no id column, so at line 2486 (in
>> my copy) queryClassToString gets called with NO_ALIASES_ALL_FIELDS. The
>> string buffer that will be used for the ORDER BY fields gets filled with
>> all columns of the table: "alias.col1, alias.col2,..." But then the
>> HashSet "seen" is examined to see if we've added this column before, you
>> look for the entire string buffer contents and not the individual
>> elements.
>>
>> I'll try a push request. But the gist is that I do a split on the buffer
>> and ask if the individual columns have already been added to the order
>> by list:
>>>                   for(String bufferBit : buffer.toString().split(","))
>>> {
>>>                      if (!seen.contains(bufferBit)) {
>>>                        retval.append(needComma ? ", " : " ORDER BY ");
>>>                        needComma = true;
>>>                        retval.append(bufferBit);
>>>                        seen.add(bufferBit);
>>>                        if ((!q.getSelect().contains(node))
>>>                            && (!q.getSelect().contains(new
>>> QueryField((QueryClass) node,
>>>                                "id")))) {
>>>                          if (q.isDistinct()) {
>>>                            throw new ObjectStoreException("Class " +
>>> q.getAliases().get(node)
>>>                                + " in the ORDER BY list must be in
>>> the SELECT list, or its"
>>>                                + " id, or the query made
>>> non-distinct");
>>>                          } else if ((kind == QUERY_FOR_PRECOMP)
>>>                              || (kind == QUERY_FOR_GOFASTER)) {
>>>                            state.addToOrderBy(bufferBit);
>>>                          }
>>>                        }
>>>                      }
>>>                    }
>>
>> (I'm not too sure if this handles the condition of kind ==
>> QUERY_FOR_PRECOMP... correctly. Let me know if this is not correct.
>>
>> This makes sense since I was only seeing the error in cases where I was
>> using the SimpleObject in the query.
>>
>> But: now that I have the list of columns cleaned up, I am running into
>> the next problem. Namely, one of the columns in the temporary table has
>> large text fields. Since this column is listed in the ORDER BY fields,
>> I'm attempting to include this column in the index. And postgres is
>> objecting to the fact that the column is too big to index. But that is
>> for another day.
>>
>> Joe
>>
>>
>> On 10/08/2014 05:23 PM, Joe Carlson wrote:
>>> On Oct 8, 2014, at 2:35 AM, Richard Smith <[hidden email]> wrote:
>>>
>>>> Hi Joe,
>>>> Could you send me the code change you made to correct the duplicated
>>>> fields in the precomputed tables please?  This may have already been
>>>> fixed
>>>> on the beta branch but I'd like to check.
>>> I am not proud of what I did here.
>>>
>>> I wanted to try to figure out why I was getting duplicated column names
>>> in the index. But in my haste of getting something to work I simple
>>> masked the problem and took out repeated column names.
>>>
>>> I hacked into PrecomutedTableManager:
>>>
>>>    protected void addIndex(String table, String field, Connection con)
>>> {
>>>         String simpleField = field;
>>>         if (simpleField.charAt(0) == '"') {
>>>             simpleField = simpleField.substring(1, simpleField.length()
>>> - 1);
>>>         }
>>>         StringBuffer rewrite = new StringBuffer();
>>>         HashSet<String> seen = new HashSet();
>>>         for(String f: field.split(",")) {
>>>           if (!seen.contains(f.trim())) {
>>>             if (!rewrite.toString().isEmpty()) {
>>>               rewrite.append(",");
>>>             }
>>>             rewrite.append(f);
>>>             seen.add(f);
>>>           }
>>>         }
>>>         field = rewrite.toString();
>>> …
>>>
>>> I make no statements on whether this is the correct approach.
>>>
>>>> The loop over QuerySelectables in
>>>> ObjectStoreInterMineImpl.goFasterWithConnection isn't a problem, the
>>>> precomputed table has been created on line 2114, the loop is to find
>>>> any
>>>> outer joins or subqueries that need an additional precomputed table.
>>>>
>>>> So, I'm not sure where the actual problem lies. How long did it take to
>>>> create the precomputed table?  I'm not sure why it isn't logging a
>>>> proper
>>>> timestamp, are those seconds or milliseconds in the log file?  If it
>>>> took
>>>> a very long time to create something else may have timed out.
>>>>
>>>
>>> Thanks for that info.
>>>
>>> What I had seen when stepping through the code is that the query is
>>> getting executed and results are getting returned. But when I step
>>> through code I still can’t figure out something.
>>>
>>> In QueryResultService.runResults(PathQuery,int,int), I am setting
>>> canGoFaster=true. I get to the line
>>>  if (it.hasNext()) { // Prime the batch fetching pumps
>>>
>>> and I see that the iterator has results
>>>
>>> This drops me down to
>>>
>>>       if (canGoFaster) {
>>>           ((ExportResultsIterator) it).goFaster();
>>>       }
>>>       processor.write(it, output);
>>>
>>> But when I get into JSONResultProcessor.write, the call to
>>> getResultsIterator(resultIt) returns an interator with nothing in it.
>>> I’m not getting an iterator for the precomuted table. When I look at the
>>> web page, it has the proper number of results, but no results.
>>>
>>>
>>>> Is the failed query repeatable? And do you see anything else in any of
>>>> the
>>>> logs?
>>>>
>>> it is pretty repeatable. And I’ve only seen it on this query.
>>>
>>> I noticed 2 things today:
>>>
>>> 1) If I do the query as a web service call, it returns results.
>>>
>>> 2) I got the URL for the web service call by clicking on the ‘web
>>> service URL’ button on the query builder page. To my horror, I saw that
>>> this was giving me a URL that started localhost:8081/… Our public mine
>>> is good; that has the right host and port. But an internal test mine is
>>> wrong.
>>>
>>> Is #2 causing #1? Do I have something misconfigured that is causing the
>>> queries to give no results because of some misconfiguration? and how did
>>> this get misconfigured? If this is the case, then the entire simple
>>> object behavior is a red herring. I checked and other queries that use
>>> temporary tables are OK.
>>>
>>>
>>>
>>>> Incidentally, creating a precomputed table just to view a results table
>>>> in
>>>> the webapp is unnecessary and will hopefully be fixed soon:
>>>> https://github.com/intermine/intermine/issues/712.
>>>>
>>>> Sorry this isn't more help yet.
>>>>
>>>
>>> I appreciate all comments. There is no such thing as bad knowledge.
>>>
>>> Joe
>>>
>>>
>>>> Richard.
>>>>
>>>>
>>>>> Hi Julie and Alex
>>>>>
>>>>> Thanks to both of you for the info. I'm glad to hear it wasn't just
>>>>> me.
>>>>>
>>>>> I had walked through the code and find this issue myself. But even
>>>>> when
>>>>> making this change, there were problems. Even when I corrected for the
>>>>> quotation marks  I was getting an error:
>>>>>
>>>>> org.postgresql.util.PSQLException: ERROR: duplicate key value violates
>>>>> unique constraint "pg_attribute_relid_attnam_index"
>>>>>
>>>>> The key list in my query has one column duplicated. So the indexing
>>>>> that
>>>>> was attempted has a field duplicated and that causes an error. I
>>>>> worked
>>>>> in some code to remove the duplicated fields. But even this wasn't a
>>>>> total fix.
>>>>>
>>>>> Based on the log files and what showed up on the screen, the initial
>>>>> select and indexing worked well. I created the temporary table,
>>>>> indexed
>>>>> it. The web page showed my the proper number of results were
>>>>> retrieved.
>>>>> But nothing was shown in the table.
>>>>>
>>>>> What is weird here is that if I look at the log files I see this:
>>>>>
>>>>> 34941 [TP-Processor10] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>>>> precomputed table CREATE TABLE temporary_precomp_548223 AS SELECT
>>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format AS
>>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId, a2_.id
>>>>> AS
>>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>>> a1_.genotype,
>>>>> a1_.snpId, a2_.id
>>>>> 40272 [TP-Processor10] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>>> indexes for temporary_precomp_548223
>>>>> 40273 [TP-Processor10] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating index
>>>>> on temporary_precomp_548223 (orderbyfield1, orderbyfield2,
>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>>> 41002 [TP-Processor10] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - ANALYSEing
>>>>> precomputed table temporary_precomp_548223
>>>>> 41530 [TP-Processor10] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Finished
>>>>> creating precomputed table temporary_precomp_548223 for category
>>>>> goFaster
>>>>> 42088 [TP-Processor10] INFO
>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Dropped
>>>>> precomputed table temporary_precomp_548223
>>>>>
>>>>> what is strange to me is that I am used to seeing some SQL in the log
>>>>> files that did a select between the 'Finished creating' and 'Dropped
>>>>> precomputed'. But I don't see anything being selected from
>>>>> temporary_precomp_548223.
>>>>>
>>>>> Is this another issue? I was stepping through the code and see that in
>>>>> ObjectStoreInterMineImpl.goFasterWithConnection the loop over
>>>>> QuerySelectable qs : q.getSelect does not do anything; in my query
>>>>> there
>>>>> are 2 values for QuerySelectable and neither of them is a
>>>>> QueryCollectionPathExpression nor a QueryObjectPathExpression. Should
>>>>> that bother me?
>>>>>
>>>>> Again, I appreciate the assistance,
>>>>>
>>>>> Joe Carlson
>>>>>
>>>>> On 10/07/2014 07:01 AM, Julie Sullivan wrote:
>>>>>> Apologies Joe! I found this bug as well, but the fix is only on the
>>>>>> beta branch right now:
>>>>>>
>>>>>> https://github.com/intermine/intermine/pull/729
>>>>>>
>>>>>> It's only two lines though.
>>>>>>
>>>>>> On 07/10/14 04:37, Joe Carlson wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I’m seeing something weird. The answer to my problem may very well
>>>>>>> be
>>>>>>> “stop making shortcuts”. If so, I’ll go away and wait for my
>>>>>>> processing
>>>>>>> to continue and see how it goes.
>>>>>>>
>>>>>>> I am a bit anxious to see the query performance on my huge simple
>>>>>>> object
>>>>>>> table, so I’m building a web app before completing all my post
>>>>>>> processing steps. In particular, I’m not rebuilding an objectstore
>>>>>>> summary. I have an older one, but the thing takes a long time to
>>>>>>> build
>>>>>>> and I was curious if it was possible to deploy without that in
>>>>>>> place.
>>>>>>>
>>>>>>> When I tried to run some queries, I was not seeing the results in
>>>>>>> the
>>>>>>> table view. The header “Showing 1 to 25 of 7,XXX results” was there
>>>>>>> -
>>>>>>> and the number was correct - but not the table.
>>>>>>>
>>>>>>> When I looked through the web server’s log files, I saw some strange
>>>>>>> things in the generated SQL:
>>>>>>>
>>>>>>> 141855 [TP-Processor2] INFO
>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>>>>>> precomputed table CREATE TABLE temporary_precomp_548216 AS SELECT
>>>>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format
>>>>>>> AS
>>>>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId,
>>>>>>> a2_.id
>>>>>>> AS
>>>>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>>>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>>>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>>>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>>>>> a1_.genotype,
>>>>>>> a1_.snpId, a2_.id
>>>>>>> 147332 [TP-Processor2] INFO
>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>>>>> indexes for temporary_precomp_548216
>>>>>>> 147332 [TP-Processor2] INFO
>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating
>>>>>>> index
>>>>>>> on temporary_precomp_548216 (orderbyfield1, orderbyfield2,
>>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>>>>> 490353 [TP-Processor2] WARN
>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Error while
>>>>>>> executing CREATE INDEX temporary_precomp_548216_xiffkbkfjvhootbw ON
>>>>>>> temporary_precomp_548216 ("orderbyfield1, orderbyfield2,
>>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id”)
>>>>>>> org.postgresql.util.PSQLException: ERROR: column "orderbyfield1,
>>>>>>> orderbyfield2, a1_diversitySampleId, a1_format, " does not exist
>>>>>>>         at
>>>>>>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The SQL for creating the temp table is good and this has the proper
>>>>>>> number of entries. But there are 2 strange things in the CREATE
>>>>>>> INDEX
>>>>>>> statement. First, the entire list of keys in enclosed in quotes -
>>>>>>> and
>>>>>>> so
>>>>>>> postgres interprets this as 1 column. This apparently arises because
>>>>>>> there are capitalized field values in the keys. And also, the field
>>>>>>> orderfield1 is specified twice. Even when I correct for the
>>>>>>> quotation
>>>>>>> marks, I can’t create an index using 1 field twice.
>>>>>>>
>>>>>>> Is the weirdness of this CREATE INDEX due to the fact that it is a
>>>>>>> simple object table involved? Or because the objectstoresummary is
>>>>>>> not
>>>>>>> correct. If it’s the latter, I’ll just hold off until that finishes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Joe Carlson
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> dev mailing list
>>>>>>> [hidden email]
>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> [hidden email]
>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>
>>>>>
>>>
>>
>>
>>
>


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: strangeness in query on simple object table.

Richard Smith
Hi Joe,
We'll try to fix the unnecessary precomputing and hopefully speed up the
precomputes soon:

https://github.com/intermine/intermine/issues/712
https://github.com/intermine/intermine/issues/713

It sounds like when you separate the queries each one is either querying a
single table or is less than the 5000 row batch size which would prevent
precomputing.

If you're constraining to a specific gene in the combined query I'm
surprised it creates a temporary table as I would expect < 5000 rows. Do
you know the number of results?

Cheers,
Richard.


> Hi Richard,
>
> thanks for the response. Sorry I didn’t get around to making that pull
> request; I got distracted by other bright, shiny objects yesterday.
>
> I saw that the query was finishing even without the index getting made.
> But I saw this message pop up in the log files at a time when we were
> trying to figure out the quickest way to query our SNP tables for the
> genotypes of samples that have a snp in a specific gene. We were seeing
> that doing it as a two-step (find the snps for a gene, then a separate
> query for the genotypes with that snp) was MUCH faster than a single
> query. One of the distinctions between the 2 methods is that the single
> query uses a temporary table and the other does not. Indexing may speed up
> the retrieval from the temp table, but I realized it would not by itself
> make up the time difference.
>
> This is not unexpected behavior; tuning queries is part of the job
> description.
>
> Joe
>
> On Oct 29, 2014, at 5:42 AM, Richard Smith <[hidden email]> wrote:
>
>> Hi Joe,
>> That does indeed look like a bug and the correct fix, I hadn't seen that
>> before (the SqlGenerator is way too complicated).  It looks like you're
>> also doing the right thing with QUERY_FOR_PRECOMP.
>>
>> If you can create a pull request that would be great, otherwise I can
>> patch the fix in.
>>
>> Regarding the index failure on the precomputed table - I think the table
>> should still be created just without that index. When I precompute a
>> query
>> selecting a field with very large text values I see an error message in
>> the logs about a failed index but the precomputed table is successfully
>> created and used.
>>
>> If the field in question has very large text values it's unlikely to be
>> constrained, so the missing index won't cause a performance problem.  We
>> should log a warning instead of writing the Postgres error.
>>
>> Cheers,
>> Richard.
>>
>>
>>> Hi Richard,
>>>
>>> I was taking another look at the issue of generating errors when
>>> creating an index on a temporary table. I think the issue lies in
>>> buildOrderBy in SQLGenerator.
>>>
>>> For the SimpleObject tables, there is no id column, so at line 2486 (in
>>> my copy) queryClassToString gets called with NO_ALIASES_ALL_FIELDS. The
>>> string buffer that will be used for the ORDER BY fields gets filled
>>> with
>>> all columns of the table: "alias.col1, alias.col2,..." But then the
>>> HashSet "seen" is examined to see if we've added this column before,
>>> you
>>> look for the entire string buffer contents and not the individual
>>> elements.
>>>
>>> I'll try a push request. But the gist is that I do a split on the
>>> buffer
>>> and ask if the individual columns have already been added to the order
>>> by list:
>>>>                   for(String bufferBit : buffer.toString().split(","))
>>>> {
>>>>                      if (!seen.contains(bufferBit)) {
>>>>                        retval.append(needComma ? ", " : " ORDER BY ");
>>>>                        needComma = true;
>>>>                        retval.append(bufferBit);
>>>>                        seen.add(bufferBit);
>>>>                        if ((!q.getSelect().contains(node))
>>>>                            && (!q.getSelect().contains(new
>>>> QueryField((QueryClass) node,
>>>>                                "id")))) {
>>>>                          if (q.isDistinct()) {
>>>>                            throw new ObjectStoreException("Class " +
>>>> q.getAliases().get(node)
>>>>                                + " in the ORDER BY list must be in
>>>> the SELECT list, or its"
>>>>                                + " id, or the query made
>>>> non-distinct");
>>>>                          } else if ((kind == QUERY_FOR_PRECOMP)
>>>>                              || (kind == QUERY_FOR_GOFASTER)) {
>>>>                            state.addToOrderBy(bufferBit);
>>>>                          }
>>>>                        }
>>>>                      }
>>>>                    }
>>>
>>> (I'm not too sure if this handles the condition of kind ==
>>> QUERY_FOR_PRECOMP... correctly. Let me know if this is not correct.
>>>
>>> This makes sense since I was only seeing the error in cases where I was
>>> using the SimpleObject in the query.
>>>
>>> But: now that I have the list of columns cleaned up, I am running into
>>> the next problem. Namely, one of the columns in the temporary table has
>>> large text fields. Since this column is listed in the ORDER BY fields,
>>> I'm attempting to include this column in the index. And postgres is
>>> objecting to the fact that the column is too big to index. But that is
>>> for another day.
>>>
>>> Joe
>>>
>>>
>>> On 10/08/2014 05:23 PM, Joe Carlson wrote:
>>>> On Oct 8, 2014, at 2:35 AM, Richard Smith <[hidden email]> wrote:
>>>>
>>>>> Hi Joe,
>>>>> Could you send me the code change you made to correct the duplicated
>>>>> fields in the precomputed tables please?  This may have already been
>>>>> fixed
>>>>> on the beta branch but I'd like to check.
>>>> I am not proud of what I did here.
>>>>
>>>> I wanted to try to figure out why I was getting duplicated column
>>>> names
>>>> in the index. But in my haste of getting something to work I simple
>>>> masked the problem and took out repeated column names.
>>>>
>>>> I hacked into PrecomutedTableManager:
>>>>
>>>>    protected void addIndex(String table, String field, Connection con)
>>>> {
>>>>         String simpleField = field;
>>>>         if (simpleField.charAt(0) == '"') {
>>>>             simpleField = simpleField.substring(1,
>>>> simpleField.length()
>>>> - 1);
>>>>         }
>>>>         StringBuffer rewrite = new StringBuffer();
>>>>         HashSet<String> seen = new HashSet();
>>>>         for(String f: field.split(",")) {
>>>>           if (!seen.contains(f.trim())) {
>>>>             if (!rewrite.toString().isEmpty()) {
>>>>               rewrite.append(",");
>>>>             }
>>>>             rewrite.append(f);
>>>>             seen.add(f);
>>>>           }
>>>>         }
>>>>         field = rewrite.toString();
>>>> …
>>>>
>>>> I make no statements on whether this is the correct approach.
>>>>
>>>>> The loop over QuerySelectables in
>>>>> ObjectStoreInterMineImpl.goFasterWithConnection isn't a problem, the
>>>>> precomputed table has been created on line 2114, the loop is to find
>>>>> any
>>>>> outer joins or subqueries that need an additional precomputed table.
>>>>>
>>>>> So, I'm not sure where the actual problem lies. How long did it take
>>>>> to
>>>>> create the precomputed table?  I'm not sure why it isn't logging a
>>>>> proper
>>>>> timestamp, are those seconds or milliseconds in the log file?  If it
>>>>> took
>>>>> a very long time to create something else may have timed out.
>>>>>
>>>>
>>>> Thanks for that info.
>>>>
>>>> What I had seen when stepping through the code is that the query is
>>>> getting executed and results are getting returned. But when I step
>>>> through code I still can’t figure out something.
>>>>
>>>> In QueryResultService.runResults(PathQuery,int,int), I am setting
>>>> canGoFaster=true. I get to the line
>>>>  if (it.hasNext()) { // Prime the batch fetching pumps
>>>>
>>>> and I see that the iterator has results
>>>>
>>>> This drops me down to
>>>>
>>>>       if (canGoFaster) {
>>>>           ((ExportResultsIterator) it).goFaster();
>>>>       }
>>>>       processor.write(it, output);
>>>>
>>>> But when I get into JSONResultProcessor.write, the call to
>>>> getResultsIterator(resultIt) returns an interator with nothing in it.
>>>> I’m not getting an iterator for the precomuted table. When I look at
>>>> the
>>>> web page, it has the proper number of results, but no results.
>>>>
>>>>
>>>>> Is the failed query repeatable? And do you see anything else in any
>>>>> of
>>>>> the
>>>>> logs?
>>>>>
>>>> it is pretty repeatable. And I’ve only seen it on this query.
>>>>
>>>> I noticed 2 things today:
>>>>
>>>> 1) If I do the query as a web service call, it returns results.
>>>>
>>>> 2) I got the URL for the web service call by clicking on the ‘web
>>>> service URL’ button on the query builder page. To my horror, I saw
>>>> that
>>>> this was giving me a URL that started localhost:8081/… Our public mine
>>>> is good; that has the right host and port. But an internal test mine
>>>> is
>>>> wrong.
>>>>
>>>> Is #2 causing #1? Do I have something misconfigured that is causing
>>>> the
>>>> queries to give no results because of some misconfiguration? and how
>>>> did
>>>> this get misconfigured? If this is the case, then the entire simple
>>>> object behavior is a red herring. I checked and other queries that use
>>>> temporary tables are OK.
>>>>
>>>>
>>>>
>>>>> Incidentally, creating a precomputed table just to view a results
>>>>> table
>>>>> in
>>>>> the webapp is unnecessary and will hopefully be fixed soon:
>>>>> https://github.com/intermine/intermine/issues/712.
>>>>>
>>>>> Sorry this isn't more help yet.
>>>>>
>>>>
>>>> I appreciate all comments. There is no such thing as bad knowledge.
>>>>
>>>> Joe
>>>>
>>>>
>>>>> Richard.
>>>>>
>>>>>
>>>>>> Hi Julie and Alex
>>>>>>
>>>>>> Thanks to both of you for the info. I'm glad to hear it wasn't just
>>>>>> me.
>>>>>>
>>>>>> I had walked through the code and find this issue myself. But even
>>>>>> when
>>>>>> making this change, there were problems. Even when I corrected for
>>>>>> the
>>>>>> quotation marks  I was getting an error:
>>>>>>
>>>>>> org.postgresql.util.PSQLException: ERROR: duplicate key value
>>>>>> violates
>>>>>> unique constraint "pg_attribute_relid_attnam_index"
>>>>>>
>>>>>> The key list in my query has one column duplicated. So the indexing
>>>>>> that
>>>>>> was attempted has a field duplicated and that causes an error. I
>>>>>> worked
>>>>>> in some code to remove the duplicated fields. But even this wasn't a
>>>>>> total fix.
>>>>>>
>>>>>> Based on the log files and what showed up on the screen, the initial
>>>>>> select and indexing worked well. I created the temporary table,
>>>>>> indexed
>>>>>> it. The web page showed my the proper number of results were
>>>>>> retrieved.
>>>>>> But nothing was shown in the table.
>>>>>>
>>>>>> What is weird here is that if I look at the log files I see this:
>>>>>>
>>>>>> 34941 [TP-Processor10] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating new
>>>>>> precomputed table CREATE TABLE temporary_precomp_548223 AS SELECT
>>>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format
>>>>>> AS
>>>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId,
>>>>>> a2_.id
>>>>>> AS
>>>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2 FROM
>>>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0 WHERE
>>>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId =
>>>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>>>> a1_.genotype,
>>>>>> a1_.snpId, a2_.id
>>>>>> 40272 [TP-Processor10] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>>>> indexes for temporary_precomp_548223
>>>>>> 40273 [TP-Processor10] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating
>>>>>> index
>>>>>> on temporary_precomp_548223 (orderbyfield1, orderbyfield2,
>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>>>> 41002 [TP-Processor10] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - ANALYSEing
>>>>>> precomputed table temporary_precomp_548223
>>>>>> 41530 [TP-Processor10] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Finished
>>>>>> creating precomputed table temporary_precomp_548223 for category
>>>>>> goFaster
>>>>>> 42088 [TP-Processor10] INFO
>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Dropped
>>>>>> precomputed table temporary_precomp_548223
>>>>>>
>>>>>> what is strange to me is that I am used to seeing some SQL in the
>>>>>> log
>>>>>> files that did a select between the 'Finished creating' and 'Dropped
>>>>>> precomputed'. But I don't see anything being selected from
>>>>>> temporary_precomp_548223.
>>>>>>
>>>>>> Is this another issue? I was stepping through the code and see that
>>>>>> in
>>>>>> ObjectStoreInterMineImpl.goFasterWithConnection the loop over
>>>>>> QuerySelectable qs : q.getSelect does not do anything; in my query
>>>>>> there
>>>>>> are 2 values for QuerySelectable and neither of them is a
>>>>>> QueryCollectionPathExpression nor a QueryObjectPathExpression.
>>>>>> Should
>>>>>> that bother me?
>>>>>>
>>>>>> Again, I appreciate the assistance,
>>>>>>
>>>>>> Joe Carlson
>>>>>>
>>>>>> On 10/07/2014 07:01 AM, Julie Sullivan wrote:
>>>>>>> Apologies Joe! I found this bug as well, but the fix is only on the
>>>>>>> beta branch right now:
>>>>>>>
>>>>>>> https://github.com/intermine/intermine/pull/729
>>>>>>>
>>>>>>> It's only two lines though.
>>>>>>>
>>>>>>> On 07/10/14 04:37, Joe Carlson wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I’m seeing something weird. The answer to my problem may very well
>>>>>>>> be
>>>>>>>> “stop making shortcuts”. If so, I’ll go away and wait for my
>>>>>>>> processing
>>>>>>>> to continue and see how it goes.
>>>>>>>>
>>>>>>>> I am a bit anxious to see the query performance on my huge simple
>>>>>>>> object
>>>>>>>> table, so I’m building a web app before completing all my post
>>>>>>>> processing steps. In particular, I’m not rebuilding an objectstore
>>>>>>>> summary. I have an older one, but the thing takes a long time to
>>>>>>>> build
>>>>>>>> and I was curious if it was possible to deploy without that in
>>>>>>>> place.
>>>>>>>>
>>>>>>>> When I tried to run some queries, I was not seeing the results in
>>>>>>>> the
>>>>>>>> table view. The header “Showing 1 to 25 of 7,XXX results” was
>>>>>>>> there
>>>>>>>> -
>>>>>>>> and the number was correct - but not the table.
>>>>>>>>
>>>>>>>> When I looked through the web server’s log files, I saw some
>>>>>>>> strange
>>>>>>>> things in the generated SQL:
>>>>>>>>
>>>>>>>> 141855 [TP-Processor2] INFO
>>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating
>>>>>>>> new
>>>>>>>> precomputed table CREATE TABLE temporary_precomp_548216 AS SELECT
>>>>>>>> DISTINCT a1_.diversitySampleId AS a1_diversitySampleId, a1_.format
>>>>>>>> AS
>>>>>>>> a1_format, a1_.genotype AS a1_genotype, a1_.snpId AS a1_snpId,
>>>>>>>> a2_.id
>>>>>>>> AS
>>>>>>>> a2_id, a1_.genotype AS orderbyfield1, a2_.name AS orderbyfield2
>>>>>>>> FROM
>>>>>>>> SNPDiversitySample AS a1_, DiversitySample AS a2_, SNP AS a3_,
>>>>>>>> Consequence AS a4_, Gene AS a5_, ConsequencesSnps AS indirect0
>>>>>>>> WHERE
>>>>>>>> a1_.diversitySampleId = a2_.id AND a1_.snpId = a3_.id AND a3_.id =
>>>>>>>> indirect0.Snps AND indirect0.Consequences = a4_.id AND a4_.geneId
>>>>>>>> =
>>>>>>>> a5_.id AND a5_.primaryIdentifier = 'Potri.001G027800' ORDER BY
>>>>>>>> a1_.genotype, a2_.name, a1_.diversitySampleId, a1_.format,
>>>>>>>> a1_.genotype,
>>>>>>>> a1_.snpId, a2_.id
>>>>>>>> 147332 [TP-Processor2] INFO
>>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating 1
>>>>>>>> indexes for temporary_precomp_548216
>>>>>>>> 147332 [TP-Processor2] INFO
>>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Creating
>>>>>>>> index
>>>>>>>> on temporary_precomp_548216 (orderbyfield1, orderbyfield2,
>>>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id)
>>>>>>>> 490353 [TP-Processor2] WARN
>>>>>>>> org.intermine.sql.precompute.PrecomputedTableManager  - Error
>>>>>>>> while
>>>>>>>> executing CREATE INDEX temporary_precomp_548216_xiffkbkfjvhootbw
>>>>>>>> ON
>>>>>>>> temporary_precomp_548216 ("orderbyfield1, orderbyfield2,
>>>>>>>> a1_diversitySampleId, a1_format, orderbyfield1, a1_snpId, a2_id”)
>>>>>>>> org.postgresql.util.PSQLException: ERROR: column "orderbyfield1,
>>>>>>>> orderbyfield2, a1_diversitySampleId, a1_format, " does not exist
>>>>>>>>         at
>>>>>>>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2094)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The SQL for creating the temp table is good and this has the
>>>>>>>> proper
>>>>>>>> number of entries. But there are 2 strange things in the CREATE
>>>>>>>> INDEX
>>>>>>>> statement. First, the entire list of keys in enclosed in quotes -
>>>>>>>> and
>>>>>>>> so
>>>>>>>> postgres interprets this as 1 column. This apparently arises
>>>>>>>> because
>>>>>>>> there are capitalized field values in the keys. And also, the
>>>>>>>> field
>>>>>>>> orderfield1 is specified twice. Even when I correct for the
>>>>>>>> quotation
>>>>>>>> marks, I can’t create an index using 1 field twice.
>>>>>>>>
>>>>>>>> Is the weirdness of this CREATE INDEX due to the fact that it is a
>>>>>>>> simple object table involved? Or because the objectstoresummary is
>>>>>>>> not
>>>>>>>> correct. If it’s the latter, I’ll just hold off until that
>>>>>>>> finishes.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Joe Carlson
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> [hidden email]
>>>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> [hidden email]
>>>>>> http://mail.intermine.org/cgi-bin/mailman/listinfo/dev
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>


_______________________________________________
dev mailing list
[hidden email]
http://mail.intermine.org/cgi-bin/mailman/listinfo/dev