Thread: Gist cost estimates

Gist cost estimates

From: Matthias
Hello,

I've noticed the new range data types in 9.2dev. I'm really looking
forward to using them, so I built Postgres 9.2dev on Windows to try
them out.

While testing I noticed one thing. I have a simple test table with 1
million rows. There's a column called valid_range (of type int4range)
which is GiST indexed. Now when I do a query like

 select * from mytable where valid_range && int4range(100,200)

it uses the created GiST index, but the cost estimation fails
completely. For whatever reason the planner always assumes 5104 rows
will be returned, while in reality more than 300k rows are returned.
If I change the query to look like

 select * from mytable where valid_range && int4range(null,null)

it still estimates that 5104 rows will be returned (in reality it's 1M
rows -- the whole table). This leads to grossly inefficient query
plans.
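
In case it helps, here is a self-contained sketch of the kind of setup
I mean. The real table definition and data are different (the fill
below is made up), but the estimate-vs-reality mismatch is the same:

  -- hypothetical reproduction; the real table and data are not shown here
  create table mytable (
      id          serial primary key,
      valid_range int4range
  );

  -- fill with 1 million made-up ranges
  insert into mytable (valid_range)
  select int4range(g % 1000, g % 1000 + 500)
  from generate_series(1, 1000000) as g;

  create index mytable_valid_range_idx on mytable using gist (valid_range);
  analyze mytable;

  -- compare the planner's row estimate with the actual row count
  explain analyze
  select * from mytable where valid_range && int4range(100,200);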

Curiously, I have the same problem with Postgres's cube data type
(tested on 9.1, where the estimate is also exactly 5104 rows), and
PostGIS indexes have a similar (though maybe unrelated) problem.

 Do you have any explanation for these grossly wrong cost estimates?
Is selectivity estimation simply not implemented yet for these
operators? What can I do to debug this further?
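
 For instance, is checking pg_operator.oprrest the right way to see
which selectivity estimator the && operator is wired to? Something
along these lines (just a sketch):

  select oprname,
         oprleft::regtype  as left_type,
         oprright::regtype as right_type,
         oprrest           as restriction_estimator
  from pg_operator
  where oprname = '&&'
    and oprleft = 'int4range'::regtype;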

 Thank you,
 -Matthias

 P.S.: I've already increased the statistics target used by VACUUM
 ANALYZE, to no avail.
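
 By that I mean something along the lines of the following (the target
 of 1000 is just an arbitrary value):

  -- raise the per-column statistics target and re-analyze
  alter table mytable alter column valid_range set statistics 1000;
  analyze mytable;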

Re: Gist cost estimates

From: Tom Lane
Matthias <nitrogenycs@googlemail.com> writes:
> I've noticed the new range data types in 9.2dev. I'm really looking
> forward to using them, so I built Postgres 9.2dev on Windows to try
> them out.
> ...
>  Do you have any explanation for these grossly wrong cost estimates?

The range operators don't have any selectivity estimation worthy of the
name yet.  I'm still hoping to see that fixed before 9.2 final, but
the days grow short ...

            regards, tom lane