Re: Vacuum and Transactions - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Vacuum and Transactions
Date
Msg-id 1128419482.8603.262.camel@localhost.localdomain
Whole thread Raw
In response to Vacuum and Transactions  (Rod Taylor <pg@rbt.ca>)
List pgsql-hackers
On Tue, 2005-10-04 at 00:26 -0400, Rod Taylor wrote:
> As I understand it vacuum operates outside of the regular transaction
> and if you stop it (SIGTERM, or pg_cancel_backend()) some of the work it
> accomplished will be kept when it rolls back.
> 
> For large structures with a ton of dead entries (which I seem to have a
> case), running vacuum takes long enough that high-churn structures begin
> to experience difficulties.
> 
> Is it reasonable to cancel and restart the vacuum process periodically
> (say every 12 hours) until it manages to complete the work? It takes
> about 2 hours to do the table scan, and should get in about 10 hours of
> index work each round.

That is what I've had to recommend in extreme cases, with some success. 

For a non-FULL VACUUM, all of the database changes it does will be kept,
though that is not the only cost, as you indicate.  However, you're
right to question it since it does have some downsides like not
correctly updating statistics at the end of the run.

I wouldn't try this with VACUUM FULL. In that case, I'd VACUUM first,
then when all dead-rows are gone go for the VACUUM FULL; but I would
find another way round that, like a CTAS.

The problem is that VACUUM doesn't emit enough messages for you to know
when it gets to the end of each phase, so you've not much clue about how
much of that 12 hours would be wasted. Though as you say, it seems
likely that much of it is worthwhile in the situation you describe.

The tipping point is when VACUUM finds more dead rows than fits within
maintenance_work_mem/(size of row pointer). Thats when we start to do
multiple passes of each of the indexes.

Maybe it would be good to have a VACUUM max-one-pass only command, to
allow you to break big VACUUMs down into smaller chunks. Or perhaps we
should have a trace_vacuum command as well to allow you to see where to
cancel it? (Put notices in lazy_vacuum_index and lazy_vacuum_heap).

Hope that helps.

Best Regards, Simon Riggs




pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: [PERFORM] A Better External Sort?
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Tuning current tuplesort external sort code for 8.2