column without pg_stats entry?! - Mailing list pgsql-performance

From Bernd Heller
Subject column without pg_stats entry?!
Date
Msg-id 11D72EC4-6ACC-11D9-A12E-000A957B8C6E@users.sourceforge.net
Whole thread Raw
Responses Re: column without pg_stats entry?!
List pgsql-performance
Hello everyone,

I'm having a problem with some of my tables and I'm not sure if
postgres' behaviour is maybe even a bug. I'm (still) using 8.0rc5 at
present.

I have a table that contains among other columns one of the sort:
    purge_date timestamp

most records will have this field set to NULL, at present all of them
really. the table has about 100k row right now. in regular intervals
I'm doing some cleanup on this table using a query like:
    delete from mytable where purge_date is not null and purge_date <
current_date

And I have created these btree indexes:
    create index on mytable (purge_date);
    create index on mytable (purge_date) where purge_date is not null;

my problem is that the planner always chooses a seq scan over an index
scan. only when I set enable_seqscan to false does it use an index
scan. The costs of both plans are extremely different, with the index
scan being 5-10 times more expensive than the seq scan, which is
obviously not true given that all rows have this column set to NULL.

I wondered why the planner was making such bad assumptions about the
number of rows to find and had a look at pg_stats. and there was the
surprise:
there is no entry in pg_stats for that column at all!! I can only
suspect that this has to do with the column being all null. I tried to
change a few records to a not-null value, but re-ANALYZE didn't catch
them apparently.

Is this desired behaviour for analyze? Can I change it somehow? If not,
is there a better way to accomplish what I'm trying? I'm not to keen on
disabling seqscan for that query explicitly. It's a simple enough query
and the planner should be able to find the right plan without help -
and I'm sure it would if it had stats about it.

Any help appreciated.

Bernd


pgsql-performance by date:

Previous
From: "Matt Casters"
Date:
Subject:
Next
From: "Andrei Bintintan"
Date:
Subject: OFFSET impact on Performance???