Hi Tom,
I've got the impression that you have worked most actively on VACUUM,
so I am asking you directly instead of addressing the pgsql-hackers
list in general. Feel free to correct me :)
I have a problem that I think can be solved by splitting vacuum up to
run in several transactions, each running for no more than X minutes.
The problem itself is having a database with small (5-50k rows), fast-
changing tables and huge (5-25M rows), slower-changing tables, all
running in a 24/7 setup.
The small table needs to be kept small by constant vacuuming (a loop
that vacuums this table every 15 seconds) to keep up with its traffic.
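For reference, a minimal sketch of such a loop using libpq (the
connection string and table name are just placeholders):

    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn *conn = PQconnectdb("dbname=mydb");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        for (;;)
        {
            /* vacuum the small, fast-changing table */
            PGresult *res = PQexec(conn, "VACUUM small_table");

            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "vacuum failed: %s", PQerrorMessage(conn));
            PQclear(res);

            sleep(15);          /* wait 15 seconds before the next round */
        }
    }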
The problem appears when the big table needs to be vacuumed, as the
long-running vacuum on the big table prevents the vacuum on the small
table from freeing up the space used by dead tuples.
And the situation is made *worse* by running the vacuum with
vacuum_cost_limit to reduce the I/O impact (a kind of priority
inversion), as the vacuum then runs even longer and slows down
operations on the small table even more.
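For concreteness, the cost-based delay I mean is the one configured
roughly like this in postgresql.conf (the values are just examples);
the sleep kicks in each time the accumulated cost exceeds
vacuum_cost_limit:

    vacuum_cost_delay = 10    # ms to sleep when the cost limit is reached
    vacuum_cost_limit = 200   # accumulated cost that triggers the sleep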
The fastest fix seems to be changing the vacuum command to run in
several transactions.
So, what else would need to be done, in addition to the following two
changes? First, changing lazy_vacuum_rel(onerel, vacstmt) to check for
some time/page-count limit after each heap page (near the start of the
main loop in lazy_scan_heap()), and, if the limit is reached, to stop,
clean up the indexes, and return the blkno of the next page that still
needs vacuuming. Second, replacing the call to lazy_vacuum_rel(onerel,
vacstmt) in vacuum.c with the following loop:
    next_page_to_vacuum = 0;
    while (next_page_to_vacuum < RelationGetNumberOfBlocks(onerel))
    {
        StartTransactionCommand();
        ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
        next_page_to_vacuum = lazy_vacuum_rel(onerel, vacstmt);
        CommitTransactionCommand();
    }
Must some locks also be released and reacquired inside this loop, or is
there something else I should keep in mind when trying to do this?
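To make the lock handling concrete, here is roughly how I picture the
caller side in vacuum.c; relid and vacstmt come from the surrounding
code, and the extra start-page argument to lazy_vacuum_rel() is of
course hypothetical at this point:

    BlockNumber next_page_to_vacuum = 0;
    BlockNumber nblocks;
    Relation    onerel;

    do
    {
        StartTransactionCommand();
        ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

        /* reopen the relation so this transaction holds its own lock */
        onerel = relation_open(relid, ShareUpdateExclusiveLock);
        nblocks = RelationGetNumberOfBlocks(onerel);

        /* hypothetical signature: resume at next_page_to_vacuum and
         * return the block number where the scan stopped */
        next_page_to_vacuum = lazy_vacuum_rel(onerel, vacstmt,
                                              next_page_to_vacuum);

        relation_close(onerel, NoLock);   /* lock is released at commit */
        CommitTransactionCommand();
    } while (next_page_to_vacuum < nblocks);

My assumption here is that the ShareUpdateExclusiveLock does indeed have
to be retaken in every transaction, since it goes away at commit.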
The operations in this loop need not be cheap - I'm happy if I can keep
individual transactions below 5 to 10 minutes, though 1-2 minutes would
be best.
P.S. One other typical case where long transactions are disastrous is
Slony1's use of LISTEN/NOTIFY (the non-indexed table pg_listener, after
having grown to 100 MB, is not very responsive), though the real
solution there would be for Slony1 to switch to polling instead of
interrupt (notify) mode for high-volume databases.
--
Hannu Krosing <hannu@tm.ee>