Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Berserk Autovacuum (let's save next Mandrill)
Date
Msg-id 20200313213851.ejrk5gptnmp65uoo@alap3.anarazel.de
Whole thread Raw
In response to Re: Berserk Autovacuum (let's save next Mandrill)  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Berserk Autovacuum (let's save next Mandrill)  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Berserk Autovacuum (let's save next Mandrill)  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Berserk Autovacuum (let's save next Mandrill)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
Hi,

On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> As I understand, the initial motivation of this patch was to avoid disruptive
> anti-wraparound vacuums on insert-only table.  But if vacuum were triggered at
> all, it would freeze the oldest tuples, which is all that's needed; especially
> since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
> to be vacuumed again.  Recently written tuples wouldn't be frozen, which is ok,
> they're handled next time.
> 
> Another motivation of the patch is to allow indexonly scan, for which the
> planner looks at pages' "relallvisible" fraction (and at execution if a page
> isn't allvisible, visits the heap).  Again, that happens if vacuum were run at
> all.  Again, some pages won't be marked allvisible, which is fine, they're
> handled next time.
> 
> I think freeze_min_age=0 could negatively affect people who have insert-mostly
> tables (I'm not concerned, but that includes us).  If they consistently hit the
> autovacuum insert threshold before the cleanup threshold for updated/deleted
> tuples, any updated/deleted tuples would be frozen, which would be
> wasteful:  

I think that's a valid concern.


> |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause
> |VACUUM to do useless work: freezing a row version is a waste of time if the row
> |is modified soon thereafter (causing it to acquire a new XID). So the setting
> |should be large enough that rows are not frozen until they are unlikely to
> |change any more.

I think the overhead here might be a bit overstated. Once a page is
dirtied (or already dirty) during vacuum, and we freeze a single row
(necessating WAL logging), there's not really a good reason to not also
freeze the rest of the row on that page. The added cost for freezing
another row is miniscule compared to the "constant" cost of freezing
anything on the page.  It's of course different if there are otherwise
no tuples worth freezing on the page (not uncommon). But there's really
no reason for that to be the case:

Afaict the only problem with more aggressively freezing when we touch
(beyond hint bits) the page anyway is that we commonly end up with
multiple WAL records for the same page:

1) lazy_scan_heap()->heap_page_prune() will log a XLOG_HEAP2_CLEAN record, but leave
   itemids in place most of the time
2) lazy_scan_heap()->log_heap_freeze() will log a XLOG_HEAP2_FREEZE_PAGE record
3a) if no indexes exist/index cleanup is disabled:
  lazy_vacuum_page()->lazy_vacuum_page() will log a XLOG_HEAP2_CLEAN
  record, removing dead tuples (including itemids)
3b) if indexes need to be cleaned up,
  lazy_vacuum_heap()->lazy_vacuum_page() will log a XLOG_HEAP2_CLEAN

which is not nice. It likely is worth merging xl_heap_freeze_page into
xl_heap_clean, and having heap pruning always freeze once it decides to
dirty a page.

We could probably always prune dead tuples as part of heap_prune_chain()
if there's no indexes - but I'm doubtful it's worth it, since there'll
be few tables with lots of dead tuples that don't have indexes.

Merging 3b's WAL record would be harder, I think.


There's also a significant source of additional WAL records here, one
that I think should really not have been introduced:

4) HeapTupleSatisfiesVacuum() called both by heap_prune_chain(), and
  lazy_scan_heap() will often trigger a WAL record when the checksums or
  wal_log_hint_bits are enabled. If the page hasn't been modified in the
  current checkpoint window (extremely common for VACUUM, reasonably
  common for opportunistic pruning), we will log a full page write.

  Imo this really should have been avoided when checksums were added,
  that's a pretty substantial and unnecessary increase in overhead.


It's probably overkill to tie fixing the 'insert only' case to improving
the WAL logging for vacuuming / pruning. But it'd certainly would
largely remove the tradeoff discussed here, by removing additional
overhead of freezing in tables that are also updated.


> Also, there was a discussion about index cleanup with the conclusion that it
> was safer not to skip it, since otherwise indexes might bloat.  I think that's
> right, since vacuum for cleanup is triggered by the number of dead heap tuples.
> To skip index cleanup, I think you'd want a metric for
> n_dead_since_index_cleanup.  (Or maybe analyze could track dead index tuples
> and trigger vacuum of each index separately).
> 
> Having now played with the patch, I'll suggest that 10000000 is too high a
> threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't be
> much lower (100000?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.

ISTM that the danger of regressing workloads due to suddenly repeatedly
scanning huge indexes that previously were never / rarely scanned is
significant (if there's a few dead tuples, otherwise most indexes will
be able to skip the scan since the vacuum_cleanup_index_scale_factor
introduction)).

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: backend type in log_line_prefix?
Next
From: Alvaro Herrera
Date:
Subject: Re: range_agg