Palle Girgensohn <girgen@pingpong.net> writes:
> --On torsdag, januari 13, 2005 18.18.37 -0500 Tom Lane <tgl@sss.pgh.pa.us>
> wrote:
>> So there's something nuts about the statistics in this case.
On looking into it, it's the same old issue of not having column
correlation statistics. pg_stats shows that ANALYZE estimated the
fraction of rows with null group_id as 0.137667 (versus exact value
of 0.147, not too bad considering I used the default statistics target)
and it estimated the fraction with this_group_id = 46 as 0.358
(vs actual 0.369, ditto). The problem is that it then estimates the
total selectivity as 0.137667 * 0.358 or a bit under 5%, much too high
to make an indexscan sensible. In reality there are only 4 rows with
this combination of values, but the planner has no way to know that.
> Anything I can do about it?
I thought of a fairly miserable hack, which relies on the fact that 8.0
does know how to accumulate statistics on functional indexes:
group=# create index fooi on group_data (abs(this_group_id)) WHERE group_id IS NULL;
CREATE INDEX
group=# analyze group_data;
ANALYZE
group=# explain select * from group_data where group_id is null and abs(this_group_id) = 46;
QUERY PLAN
------------------------------------------------------------------------------Index Scan using fooi on group_data
(cost=0.00..5302.60rows=1802 width=42) Index Cond: (abs(this_group_id) = 46) Filter: (group_id IS NULL)
(3 rows)
(The choice of abs() is arbitrary, it just has to be something other
than the unadorned column.) In this situation the planner will look at
the stats for the functional index and discover that in that index there
aren't many 46's, so it comes out with a more reasonable rowcount estimate.
We should probably make it accumulate stats on partial indexes even when
the index columns aren't expressions. This example shows that useful
stats can be derived that way. Too late for 8.0 though...
regards, tom lane