Home > mailing lists

Re: Index oddity - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: Index oddity
Date	June 10, 2004 05:02:37
Msg-id	19759.1086843718@sss.pgh.pa.us Whole thread Raw
In response to	Re: Index oddity (ken <southerland@samsixedd.com>)
Responses	Re: Index oddity (still)
List	pgsql-performance

Tree view

ken <southerland@samsixedd.com> writes:
> ... and here is the plan with statistics set to 1000 ...

>  Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
>    Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))

> ... so yeah, its obviously finding way, way less rows than it thinks it
> will.

Yup.  I think your problem here is that the conditions on the different
columns are highly correlated, but the planner doesn't know anything
about that, because it has no cross-column statistics.

You could check that the individual conditions are accurately estimated
by looking at the estimated and actual row counts in

explain analyze
SELECT * FROM nrgfeature f WHERE upperRightX > 321264.23697721504;

explain analyze
SELECT * FROM nrgfeature f WHERE lowerLeftX < 324046.79981208267;

etc --- but I'll bet lunch that they are right on the money with the
higher statistics targets, and probably not too far off even at the
default.  The trouble is that the planner is simply multiplying these
probabilities together to get its estimate for the combined query,
and when the columns are not independent that leads to a very bad
estimate.

In particular it sounds like you have a *whole lot* of rows that have
diagonalSize just under 50, and so changing the diagonalSize condition
to include those makes for a big change in the predicted number of rows,
even though for the specific values of upperRightX and friends in your
test query there isn't any change in the actual number of matching rows.

I don't have any advice to magically solve this problem.  I would
suggest experimenting with alternative data representations -- for
example, maybe it would help to store "leftX" and "width" in place
of "leftX" and "rightX", etc.  What you want to do is try to decorrelate
the column values.  leftX and rightX are likely to have a strong
correlation, but maybe leftX and width don't.

            regards, tom lane

pgsql-performance by date:

From: "Joshua D. Drake"
Date: 10 June 2004, 03:37:14
Subject: Re: Index oddity

From: Frank van Vugt
Date: 10 June 2004, 14:24:26
Subject: *very* inefficient choice made by the planner (regarding IN(...))

Re: Index oddity - Mailing list pgsql-performance

Previous

Next