Thread: Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)

I wrote:
> Thinking more about this leads me to the following proposal:

> 1. Explicitly group the indexes according to the subset of
> WHERE-conditions (and partial index conditions, if any) they use.
> Within each such group, discard all but the cheapest-scan-cost one.

> 2. Sort the remaining indexes according to scan cost.

> 3. For each index in order, consider it as a standalone scan, and also
> consider adding it on to the AND-group led by each preceding index,
> using the same logic as now: reject using any WHERE-condition twice
> in a group, and then add on only if the total cost of the AND-group
> scan is reduced.
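
The three steps above can be sketched in Python with a toy cost model.  The
planner's real cost arithmetic (and the actual choose_bitmap_and routine) is
far more involved; the class names, the cost formula, and the constant below
are purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class IndexPath:
    name: str
    conditions: frozenset   # WHERE-conditions (incl. partial-index clauses) it uses
    scan_cost: float        # estimated standalone index scan cost
    selectivity: float      # fraction of heap rows its conditions pass

TABLE_COST = 1000.0  # toy stand-in for the heap-visit cost of the whole table

def and_group_cost(members):
    # Toy model: pay every member's scan cost, but ANDing multiplies
    # selectivities and so shrinks the heap-visit work.
    sel = 1.0
    for p in members:
        sel *= p.selectivity
    return sum(p.scan_cost for p in members) + TABLE_COST * sel

def choose_bitmap_and(paths):
    # Step 1: group indexes by the subset of conditions they use;
    # within each group keep only the cheapest-scan-cost one.
    best = {}
    for p in paths:
        cur = best.get(p.conditions)
        if cur is None or p.scan_cost < cur.scan_cost:
            best[p.conditions] = p
    # Step 2: sort the survivors by scan cost.
    ordered = sorted(best.values(), key=lambda p: p.scan_cost)
    # Step 3: consider each index standalone, and also try adding it to
    # the AND-group led by each preceding index.
    groups = []  # each entry: [members, used-conditions, total cost]
    for p in ordered:
        for g in groups:
            members, used, cost = g
            if p.conditions & used:
                continue  # reject using any WHERE-condition twice
            trial = and_group_cost(members + [p])
            if trial < cost:  # add on only if total cost is reduced
                g[0] = members + [p]
                g[1] = used | p.conditions
                g[2] = trial
        groups.append([[p], set(p.conditions), and_group_cost([p])])
    winner = min(groups, key=lambda g: g[2])
    return winner[0], winner[2]
```

For example, with two indexes on the same condition set only the cheaper one
survives step 1, and step 3 then merges disjoint-condition indexes into one
AND-group whenever the combined selectivity pays for the extra scan.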

Here is a patch along these lines, in fact two patches (HEAD and 8.2
versions).  The 8.2 version picks up some additional partial-index
intelligence that I added to HEAD on Mar 21 but did not at that time
risk back-patching --- since this is a fairly large rewrite of the
routine, keeping the branches in sync seems best.

Steve, can you try this out on your queries and see if it makes better
or worse decisions?  It seems to fix your initial complaint but I do
not have a large stock of test cases to try.

            regards, tom lane


> Steve, can you try this out on your queries and see if it makes better
> or worse decisions?  It seems to fix your initial complaint but I do
> not have a large stock of test cases to try.
>

     Wow, this is a remarkable difference.  Queries that were taking
minutes to complete are coming up in seconds.  Good work -- I think this'll
solve my customer's needs for their demo on the 19th :)

Thank you so much!


Steve


Steve <cheetah@tanabi.org> writes:
>> Steve, can you try this out on your queries and see if it makes better
>> or worse decisions?  It seems to fix your initial complaint but I do
>> not have a large stock of test cases to try.

>      Wow, this is a remarkable difference.  Queries that were taking
> minutes to complete are coming up in seconds.  Good work, I think this'll
> solve my customer's needs for their demo on the 19th :)

Can you find any cases where it makes a worse choice than before?
Another thing to pay attention to is whether the planning time gets
noticeably worse.  If we can't find any cases where it loses badly
on those measures, I'll feel comfortable in applying it...

            regards, tom lane

>
> Can you find any cases where it makes a worse choice than before?
> Another thing to pay attention to is whether the planning time gets
> noticeably worse.  If we can't find any cases where it loses badly
> on those measures, I'll feel comfortable in applying it...
>

     I'll see what I can find -- I'll let you know on Monday if I can
find any queries that perform worse.  My tests so far have shown
equivalent or better performance, but I've only done a sort of survey;
I've got plenty of special cases to test that should put this through
its paces.


Steve

>
> Can you find any cases where it makes a worse choice than before?
> Another thing to pay attention to is whether the planning time gets
> noticeably worse.  If we can't find any cases where it loses badly
> on those measures, I'll feel comfortable in applying it...
>

     Okay, here's the verdict: all the "extremely slow" queries (i.e.
queries that took more than 30 seconds and upwards of several minutes to
complete) are now running in the realm of reason.  In fact, most queries
that took between 1 and 4 minutes are now down to taking about 9 seconds,
which is obviously a tremendous improvement.

     A few of the queries that were taking 9 seconds or less are
"slightly slower" -- meaning a second or two slower.  However, most of them
are running at the same speed they were before, or better.

     So I'd say as far as I can tell with my application and my
dataset, this change is solid and an obvious improvement.


Talk to you later,

Steve

Steve wrote:

> >Can you find any cases where it makes a worse choice than before?
> >Another thing to pay attention to is whether the planning time gets
> >noticeably worse.  If we can't find any cases where it loses badly
> >on those measures, I'll feel comfortable in applying it...
>
>     Okay, here's the verdict: all the "extremely slow" queries (i.e.
> queries that took more than 30 seconds and upwards of several minutes to
> complete) are now running in the realm of reason.  In fact, most queries
> that took between 1 and 4 minutes are now down to taking about 9 seconds
> which is obviously a tremendous improvement.
>
>     A few of the queries that were taking 9 seconds or less are
> "slightly slower" -- meaning a second or two slower.  However most of them
> are running at the same speed they were before, or better.
>
>     So I'd say as far as I can tell with my application and my
> dataset, this change is solid and an obvious improvement.

Maybe it would be interesting to see in detail those cases that got a
bit slower, to further tweak the heuristic if necessary.  Is the extra
time spent in planning or in execution?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Steve wrote:
>> So I'd say as far as I can tell with my application and my
>> dataset, this change is solid and an obvious improvement.

> Maybe it would be interesting to see in detail those cases that got a
> bit slower, to further tweak the heuristic if necessary.  Is the extra
> time spent in planning or in execution?

Since there doesn't seem to be vast interest out there in testing this
further, I'm going to go ahead and apply the patch to get it out of my
working directory.  We can always tweak it more later if new info
surfaces.

            regards, tom lane

>> Maybe it would be interesting to see in detail those cases that got a
>> bit slower, to further tweak the heuristic if necessary.  Is the extra
>> time spent in planning or in execution?
>
> Since there doesn't seem to be vast interest out there in testing this
> further, I'm going to go ahead and apply the patch to get it out of my
> working directory.  We can always tweak it more later if new info
> surfaces.
>

     Doing my routine patching seems to have exploded my mail server,
sorry for not replying sooner!

     I don't actually have planning vs. execution time statistics from
the older version for the queries in question -- they were not 'problem
queries' and therefore were never really analyzed.  My customer's already
dragging me off to another issue, so I've got to shift gears.

Appreciate all your work -- thanks again!!! :)


Steve