Re: [HACKERS] BETWEEN optimizer problems with single-value range - Mailing list pgsql-performance

From Simon Riggs
Subject Re: [HACKERS] BETWEEN optimizer problems with single-value range
Date
Msg-id 1142463908.3859.188.camel@localhost.localdomain
Whole thread Raw
In response to BETWEEN optimizer problems with single-value range  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: [HACKERS] BETWEEN optimizer problems with single-value  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
On Wed, 2006-03-15 at 11:56 -0600, Kevin Grittner wrote:
> Attached is a simplified example of a performance problem we have seen,
> with a workaround and a suggestion for enhancement (hence both the
> performance and hackers lists).
>
> Our software is allowing users to specify the start and end dates for a
> query.  When they enter the same date for both, the optimizer makes a
> very bad choice.  We can work around it in application code by using an
> equality test if both dates match.  I think the planner should be able
> to make a better choice here.

> (One obvious way to fix it would be to
> rewrite "BETWEEN a AND b" as "= a" when a is equal to b, but it seems
> like there is some underlying problem which should be fixed instead (or
> in addition to) this.

That might work, but I'm not sure if that is in itself the problem and
it would be mostly wasted overhead in 99% of cases.

The main issue appears to be that the planner chooses "Cal_CalDate"
index rather than "Cal_CtofcNo" index when the BETWEEN values match.

It seems that the cost of the first and third EXPLAINs is equal, yet for
some reason it chooses different indexes in each case. My understanding
was that it would pick the first index created if plan costs were equal.
Is that behaviour repeatable with each query?

ISTM that if we have equal plan costs then we should be choosing the
index for which we have more leading columns, since that is more likely
to lead to a more selective answer. But the plan selection is a simple
"pick the best, or if they're equal pick the best sort order".

> The first query uses BETWEEN with the same date for both min and max
> values.  The second query uses an equality test for the same date.  The
> third query uses BETWEEN with a two-day range.  In all queries, there
> are less than 4,600 rows for the specified cotfcNo value out of over 18
> million rows in the table.  We tried boosting the statistics samples for
> the columns in the selection, which made the estimates of rows more
> accurate, but didn't change the choice of plans.

The selectivity seems the same in both - clamped to a minimum of 1 row,
so changing that doesn't look like it would help.

Best Regards, Simon Riggs





pgsql-performance by date:

Previous
From: "Merlin Moncure"
Date:
Subject: Re: BETWEEN optimizer problems with single-value range
Next
From: Chris
Date:
Subject: Re: Slow SELECTS after large update cycle