[Jim C. Nasby - Thu at 01:04:53PM -0500]
> What does
>
> SET enable_seqscan = false;
> EXPLAIN ANALYZE SELECT * FROM ...
>
> get you? Is it faster?
I was experimenting with this some weeks ago. By now our database server has
quite low load numbers and I haven't gotten any complaints about anything
being too slow, so I have temporarily stopped working on this issue - so I
will not contribute any more gory details at the moment. :-)
I concluded that our "problem" is that we (for performance reasons) store
aggregated statistics in the "wrong" tables, and since updating a row in pg
effectively means creating a new physical row in the database, the rows in
the table are not in chronological order. If "last month's activity" means
that something like 7% of the rows in the table have to be fetched, the
planner will usually think that a seq scan is better. As time passes and the
table grows, it will switch to index scans.
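To illustrate the first point (just a sketch - the table "stats" and the
column "updated" are made-up names): since an update writes a new row
version, the physical position of a row (the system column ctid) no longer
follows the timestamps:

  SELECT ctid, updated FROM stats ORDER BY updated DESC LIMIT 10;
  -- the ctids come back scattered around the table rather than all
  -- sitting at the end of it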
The "old" stuff in the database eventually grow historical, so the
aggregated statistics will not be updated for most of those rows. Hence a
forced index scan will often be a bit faster than a suggested table scan. I
experimented, and doing an index scan for the 3rd time would usually be
faster than doing a full table scan for the 3rd time, but with things not
beeing in cache, the planner was right to suggest that seq scan was faster
due to less disk seeks.
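The comparison was basically done along these lines (again a sketch with
made-up names), running each variant a few times in a row to see the effect
of the cache:

  EXPLAIN ANALYZE SELECT * FROM stats WHERE updated >= '2006-06-01';

  SET enable_seqscan = false;
  EXPLAIN ANALYZE SELECT * FROM stats WHERE updated >= '2006-06-01';
  SET enable_seqscan = true;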
The long-term solution for this problem is to build a separate data
warehouse system. The short-term solution is not to care at all
(or eventually, buy more memory).
As long as the queries are of the form "give me everything since last
Monday", it is at least theoretically possible to serve this through partial
indices, and have a cron job dropping the old indices and creating new ones
every week.
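If it works out, it would be something like this (sketch only; names and
cutoff dates are made up):

  CREATE INDEX stats_since_20060601_idx ON stats (updated)
    WHERE updated >= '2006-06-01';

  -- next week's cron job:
  CREATE INDEX stats_since_20060608_idx ON stats (updated)
    WHERE updated >= '2006-06-08';
  DROP INDEX stats_since_20060601_idx;

  -- note: the planner will only pick the partial index if the query's
  -- WHERE clause uses a literal date it can compare against the index
  -- predicate, not something like now() - interval '1 week'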
Doing table clustering at night would probably also be a solution, but I
haven't gotten around to testing it yet. I'm a bit concerned about
performance/locking issues.
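If it turns out to be workable, the nightly job would boil down to something
like this (made-up names again, and assuming an ordinary index on the
timestamp column already exists; CLUSTER holds an exclusive lock on the
table while it runs, which is exactly the locking concern):

  CLUSTER stats_updated_idx ON stats;
  ANALYZE stats;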
--
Tobias Brox, +47-91700050
Tallinn, Europe