Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables - Mailing list pgsql-hackers

From: Jim Nasby
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date:
Msg-id: 5445AB34.6030301@BlueTreble.com
In response to: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 10/20/14, 3:11 PM, Andres Freund wrote:
> On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
>> On 10/19/14, 11:41 AM, Andres Freund wrote:
>>> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
>>>> The "weird" part is that if it's not doing a freeze it will just punt
>>>> on a page if it can't get the cleanup lock.
>>>
>>> I don't think that's particularly weird. Otherwise vacuum can get stuck
>>> behind a single very hot page - leading to much, much more bloat.
>>>
>>>> I have to believe that could seriously screw up autovacuum scheduling.
>>>
>>> Why?
>>
>> I'm worried there could be some pathological cases where we'd skip a
>> large number of pages, perhaps if a vacuum scan and a seqscan ended up
>> running alongside each other.
>
> I've seen little evidence of that. The reverse, a stuck autovacuum, is
> imo much more likely. For this to be an actual problem you'd need to
> encounter many pages that are not locked, but are pinned. That state
> doesn't exist for very long.

How would you actually get evidence of this... we don't log it. :) (See my proposal at
http://www.postgresql.org/message-id/54446C10.2080203@BlueTreble.com)
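
To make it concrete, the "punt" in question is this bit of lazy_scan_heap() (simplified from vacuumlazy.c; error handling and the freeze-check path under scan_all are elided):

    buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                             RBM_NORMAL, vac_strategy);

    if (!ConditionalLockBufferForCleanup(buf))
    {
        /*
         * Couldn't get the cleanup lock.  A normal vacuum just skips
         * the page rather than stalling behind a pinned (hot) page;
         * only a scan_all (anti-wraparound) pass insists on waiting.
         */
        if (!scan_all)
        {
            ReleaseBuffer(buf);
            continue;           /* the page stays unvacuumed */
        }
        LockBufferForCleanup(buf);
    }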

>> Perhaps this is just paranoia, but we have no idea how bad things
>> might be, because we don't have any logging for how many pages we
>> skipped because we couldn't lock them.
>
> But so what? If we skip individual pages it won't be too bad - and very
> likely waiting very long is going to be more painful. The page won't be
> marked 'all visible' so the next vacuum will come around to it
> again. And it'll also get cleaned up by opportunistic hot pruning.

Probably true. Hopefully we can start logging it and then we'll know for sure.
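
If we do add it, a minimal sketch would be a counter bumped in the skip branch above plus one more line in the existing log output (pinskipped_pages is an invented field name, purely for illustration):

    /* in the !scan_all skip branch shown earlier */
    vacrelstats->pinskipped_pages++;
    ReleaseBuffer(buf);
    continue;

    /* ...and later, in the end-of-scan ereport() */
    ereport(elevel,
            (errmsg("\"%s\": skipped %u pages due to buffer pins, out of %u pages",
                    RelationGetRelationName(onerel),
                    vacrelstats->pinskipped_pages,
                    vacrelstats->rel_pages)));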


>> That ultimately, our current method for determining when and what to
>> vacuum is rather crude, and likely results in wasted effort during
>> scans as well as not firing autovac often enough. Keep in mind that
>> autovac started as a user-space utility and the best it could possibly
>> do was to keep a table of stats counters.
>
> I agree that we should trigger autovacuum more often. It's
> *intentionally* not triggered *at all* for insert only workloads (if you
> discount anti wraparound vacuums). I think it's time to change that. For
> that we'd need to make vacuums that don't delete any tuples cheaper. We
> already rescan only the changed parts of the heaps - but we always scan
> indexes fully...

Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff,
but maybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have*
to do anything other than set hint bits and possibly freeze xmin.
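
For context, the only non-wraparound trigger autovacuum has today is dead tuples (paraphrasing relation_needs_vacanalyze() in autovacuum.c); an insert-driven trigger would sit beside it. Something like this, where ins_base_thresh, ins_scale_factor and n_ins_since_vacuum are invented names, just to illustrate:

    /* existing: fire only when dead tuples pass the threshold */
    vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
    dovacuum  = force_vacuum || (tabentry->n_dead_tuples > vacthresh);

    /* hypothetical: also fire a cheap hint-bit/visibility pass once
     * enough inserts have accumulated since the last vacuum */
    insthresh = (float4) ins_base_thresh + ins_scale_factor * reltuples;
    dovacuum  = dovacuum || (tabentry->n_ins_since_vacuum > insthresh);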

>> Instead of relying on the crude methods, if we reliably tracked
>> certain txids on a per-block basis in a fork, we could cheaply scan
>> the fork and make an extremely informed decision on how much a vacuum
>> would gain us, and exactly what blocks it should hit.
>
>> Let me use freezing as an example. If we had a reliable list of the
>> lowest txid for each block of a relation that would allow us to do a
>> freeze scan by hitting only blocks with minimum txid within our freeze
>> range. The same could be done for multixacts.
>
> It'd also become a prime contention point because you'd need to
> constantly update it. In contrast to a simple 'is frozen' bit (akin to
> is_visible) which only changes infrequently, and only in one direction.

Actually, the contention on freeze would very possibly be minimal, because it probably doesn't change very often. Even
if it did, it's OK if the value isn't 100% accurate, so long as the recorded XID is guaranteed older than what's
actually on the page.
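
To spell out that "guaranteed older" invariant: the recorded value only ever needs to move in the safe direction, so writers would touch it rarely. A hypothetical sketch (no such fork exists today; all names invented):

    /* per-heap-block entry in a hypothetical freeze-tracking fork;
     * invariant: min_xid precedes or equals the oldest unfrozen xmin
     * actually on the block, so stale is fine, optimistic is not */
    typedef struct BlockXidEntry
    {
        TransactionId min_xid;  /* conservative lower bound */
    } BlockXidEntry;

    /* on insert: lower the bound if needed, never raise it.  Most
     * inserts carry a newer xmin than the recorded one, so this is
     * usually a no-op and shouldn't be a contention point */
    static void
    block_xid_note_insert(BlockXidEntry *entry, TransactionId xmin)
    {
        if (TransactionIdPrecedes(xmin, entry->min_xid))
            entry->min_xid = xmin;
    }

    /* only vacuum, after freezing the block, raises the bound */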

>> If we stored 3 txids for each block in a fork, we could fit
>> information for ~680 heap blocks in each fork block. So in a database
>> with 680G of heap data, we could fully determine every *block* (not
>> table) we needed to vacuum by scanning just 1GB of data. That would
>> allow for far better autovacuum scheduling than what we do today.
>
> It's not that simple. Wraparounds and locking complicate it
> significantly.

I realize what I'm talking about isn't trivial (though, I'm confused by your comment about wraparound, since presumably
TransactionIdPrecedes() and its ilk solve that problem...)
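
For the record, the arithmetic behind the ~680 figure: 3 txids x 4 bytes = 12 bytes per heap block, and 8192 / 12 = 682, call it 680 once you subtract the page header. 680GB of heap is ~89M 8kB blocks, which at 680 entries per fork page is 131072 fork pages = 1GB. And by wraparound handling I just mean the usual circular comparison (simplified from TransactionIdPrecedes() in transam.c; the special XIDs below FirstNormalTransactionId are elided):

    static bool
    xid_precedes(TransactionId id1, TransactionId id2)
    {
        /* XIDs live on a circle of 2^32; "older" means the signed
         * 32-bit difference is negative (valid as long as the two
         * are within 2^31 of each other, which freezing guarantees) */
        int32 diff = (int32) (id1 - id2);

        return diff < 0;
    }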

My ultimate point here is that we're using what are (today) very crude methods to control what gets vacuumed when, and
I think that now that we have relation forks we could do *much* better without a tremendous amount of work. But to
make a big advancement here we'll need to take a step back and rethink some things (like vacuum being the only way to
handle these problems).

Let me put some thought into this.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


