Re: autovacuum truncate exclusive lock round two - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: autovacuum truncate exclusive lock round two
Date
Msg-id 508938E7.8060906@Yahoo.com
In response to autovacuum truncate exclusive lock round two  (Jan Wieck <JanWieck@Yahoo.com>)
Responses Re: autovacuum truncate exclusive lock round two
Re: autovacuum truncate exclusive lock round two
List pgsql-hackers
Stephen,

On 10/24/2012 10:46 PM, Stephen Frost wrote:
> Jan,
>
> * Jan Wieck (JanWieck@Yahoo.com) wrote:
>> This problem has been discussed before. Those familiar with the
>> subject please skip the next paragraph.
>
> Apologies if this was already thought-of and ruled out for some reason,
> but...
>
>> Because all the scanning had been done in parallel to normal DB
>> activity, it needs to verify that all those blocks are still empty.
>
> Would it be possible to use the FSM to figure out if things have changed
> since the last scan..?  Does that scan update the FSM, which would then
> be updated by another backend in the event that it decided to write
> something there?  Or do we consider the FSM to be completely
> untrustworthy wrt this (and if so, I don't suppose there's any hope to
> using the visibility map...)?

I honestly don't know if we can trust the FSM enough when it comes to 
throwing away heap pages. Can we?

>
> The notion of having to double-scan and the AccessExclusiveLock on the
> relation are telling me this work-around, while completely possible,
> isn't exactly ideal...

Under normal circumstances, with just a few pages to trim off the end, 
this is no problem. Those pages were the last ones just scanned by this 
very autovacuum, so they are found in the shared buffers anyway. All the 
second scan does in that case is fetch those pages once more from shared 
buffers to be 100% sure we are not truncating away new tuples. We 
definitely need the AccessExclusiveLock to prevent someone from 
extending the relation between our check of the relation size and the 
truncate. Fetching 50 empty blocks from the buffer cache while at it 
isn't that big of a deal, and that is what it normally looks like.
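The recheck-then-truncate step can be sketched roughly like this (plain 
Python for illustration, not PostgreSQL source; Relation, 
count_nondeletable_pages and lazy_truncate here are simplified stand-ins 
I made up for the example):

```python
class Relation:
    def __init__(self, pages):
        self.pages = pages              # True = page still holds tuples

    def size(self):
        return len(self.pages)

def count_nondeletable_pages(rel):
    """Second scan: walk backwards from the end and return the new
    relation size, i.e. one past the last page that still has tuples."""
    for blkno in range(rel.size() - 1, -1, -1):
        if rel.pages[blkno]:
            return blkno + 1
    return 0

def lazy_truncate(rel):
    # In the real server an AccessExclusiveLock is held around this,
    # so nobody can extend the relation between the size check and the
    # truncate; this single-threaded sketch has no such concurrency,
    # so the lock exists only as this comment.
    new_size = count_nondeletable_pages(rel)
    if new_size < rel.size():
        rel.pages = rel.pages[:new_size]
    return rel.size()

rel = Relation([True, True, False, False, False])
lazy_truncate(rel)      # the trailing three empty pages are trimmed
```

The point of the backward scan is that a page found empty during the 
parallel vacuum scan may have been refilled since, in which case it (and 
everything before it) must be kept.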

The problem case this patch is dealing with is rolling-window tables 
that have experienced some bloat. The typical example is a log table 
that constantly has new data added and the oldest data purged out. This 
data normally rotates through some blocks like a rolling window. If for 
some reason (purging turned off, for example) such a table bloats by 
several GB and later shrinks back to its normal content, soon all the 
used blocks are at the beginning of the heap and we find tens of 
thousands of empty pages at the end. Only then does the second scan take 
more than 1000ms, and autovacuum is at risk of getting killed while at it.
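A toy model of that scenario (invented numbers, just to make the shape 
of the problem concrete):

```python
def empty_tail_after_bloat(normal_pages, bloat_pages):
    # Normal operation: rows rotate through a small set of pages.
    table = [True] * normal_pages
    # Purging stops: the table grows by bloat_pages full pages.
    table += [True] * bloat_pages
    # Purging resumes; new rows reuse freed space near the front, so
    # eventually only the leading pages hold tuples again.
    table = [i < normal_pages for i in range(len(table))]
    # Every trailing empty page must be re-verified by the second scan.
    return len(table) - sum(table)

empty_tail_after_bloat(50, 20000)   # 20000 pages to re-check under lock
```

With a tail that long, the time spent verifying pages while holding the 
AccessExclusiveLock is what pushes the scan past the deadlock-detection 
window.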

Since we have experienced this problem several times now on our 
production systems, something clearly needs to be done. But IMHO it 
doesn't happen often enough to justify taking any risk here.


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


