Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Berserk Autovacuum (let's save next Mandrill)
Msg-id 20200320022351.wgfrfdmo7jlerbxz@alap3.anarazel.de
In response to Re: Berserk Autovacuum (let's save next Mandrill)  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi,

On 2020-03-20 15:05:03 +1300, David Rowley wrote:
> On Fri, 20 Mar 2020 at 11:17, Andres Freund <andres@anarazel.de> wrote:
> > I think there's too much "reinventing" autovacuum scheduling in a
> > "local" insert-only manner happening in this thread. And as far as I can
> > tell additionally only looking at a somewhat narrow slice of insert only
> > workloads.
> 
> I understand your concern and you might be right. However, I think the
> main reason that the default settings for the new threshold and scale
> factor has deviated this far from the existing settings is regarding
> the example of a large insert-only table that receives inserts of 1
> row per xact.  If we were to copy the existing settings then when that
> table gets to 1 billion rows, it would be eligible for an
> insert-vacuum after 200 million tuples/xacts, which does not help the
> situation since an anti-wraparound vacuum would be triggering then
> anyway.

Sure, that'd happen for inserts that happen after that threshold. I'm
just not convinced that this is as huge a problem as presented in this
thread. And I'm fairly convinced the proposed solution is the wrong
direction to go in.

It's not as if this isn't an issue for updates as well. If you update
one row per transaction, you run into exactly the same issue for a table
of the same size. You could maybe argue that it's more common to insert
1 billion tuples in individual transactions than it is to update 1
billion tuples in individual transactions, but I don't think it's a huge
difference, if it even exists.
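
To make the arithmetic in the quoted example concrete, here is a rough
sketch of the existing trigger calculation. It assumes the stock defaults
(autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.2,
autovacuum_freeze_max_age = 200 million), a table that was fully frozen at
the start, and that nothing else consumes xids. The function names are
made up for illustration and the comparison is simplified; this is not how
autovacuum.c is written.

```python
# Stock defaults of the existing autovacuum settings.
AUTOVACUUM_VACUUM_THRESHOLD = 50
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000


def vacuum_trigger_point(reltuples: int) -> int:
    """Changed tuples needed before autovacuum considers the table.

    Mirrors the documented formula: threshold + scale_factor * reltuples.
    """
    return AUTOVACUUM_VACUUM_THRESHOLD + round(
        AUTOVACUUM_VACUUM_SCALE_FACTOR * reltuples
    )


def wraparound_fires_first(reltuples: int, rows_per_xact: int = 1) -> bool:
    """True if reaching the trigger point consumes at least
    autovacuum_freeze_max_age xids (hypothetical helper).

    Assumes the table started out fully frozen and that these are the only
    transactions running. Applies equally to inserts and updates, since
    only the number of changed rows per transaction enters the calculation.
    """
    trigger = vacuum_trigger_point(reltuples)
    xids_needed = -(-trigger // rows_per_xact)  # ceiling division
    return xids_needed >= AUTOVACUUM_FREEZE_MAX_AGE


if __name__ == "__main__":
    print(vacuum_trigger_point(1_000_000_000))
    print(wraparound_fires_first(1_000_000_000, rows_per_xact=1))
    print(wraparound_fires_first(1_000_000_000, rows_per_xact=1000))
```

Nothing in the calculation distinguishes an insert from an update: what
matters is how many rows each transaction changes relative to the size of
the table.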

In fact the problem is worse for the update case, because that tends to
generate a lot more random-looking IO during vacuum (because only parts
of the table are updated, causing small block reads/writes, and because
it will need [multiple] index scans/vacuums), and because the vacuum is
a lot more expensive in terms of CPU time.

Imo this line of reasoning is an argument for adding autovacuum
scheduling based on xid age, not something specific to insert-only
workloads.

Greetings,

Andres Freund


