Re: Berserk Autovacuum (let's save next Mandrill) - Mailing list pgsql-hackers

From Darafei "Komяpa" Praliaskouski
Subject Re: Berserk Autovacuum (let's save next Mandrill)
Date
Msg-id CAC8Q8tJimQuKPFtWCkShRORODmYZR8ULgVBwExYrtJef=TwSwg@mail.gmail.com
In response to Re: Berserk Autovacuum (let's save next Mandrill)  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers


On Thu, Mar 28, 2019 at 6:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> 1. heap vacuum
>>
>> 2. HOT pruning
>
> Is it worth skipping it if we're writing the page anyway for the sake of hint bits and new xids? It will all be a no-op anyway on append-only tables, and will happen only when we actually need something?
>

> Yeah, these operations are required only when the table has actual
> garbage. IOW, append-only tables never require them.

>>
>> 3. freezing tuples
>> 4. updating visibility map (all-visible and all-frozen)
>
> These two are needed, and the current autovacuum launch logic does not take into account that they are also needed for non-dead tuples.
>
>>
>> 5. index vacuum/cleanup
>
> There is a separate patch for that. But since https://commitfest.postgresql.org/16/952/, for almost a year already, Postgres has skipped index cleanup on tables without new dead tuples, so isn't this case taken care of already?

> I think that's not enough. The feature "GUC for cleanup index
> threshold" only allows us to skip index cleanup, and only when there have
> been fewer insertions than the given fraction of the total number of heap
> tuples since the last index cleanup. Therefore it helps only append-only
> tables (and only supports btree indexes for now). We still have to do
> index vacuuming even if the table has just a few dead tuples. The
> proposed patch[1] helps in this situation: vacuum can run while skipping
> both index vacuuming and index cleanup.

So, the patch I posted can technically be applied after https://commitfest.postgresql.org/22/1817/ gets merged?

The change with my patch: for a table with 49 insertions and one delete,
 - previously, autovacuum would wait for 49 more deletes by default (ignoring the insertions) and only then clean up both the table and its indexes;
 - with the patch, it will freeze tuples and update the VM for the insertions, and scan the index.
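For context, the default behavior in the first bullet can be sketched like this (a simplified model of autovacuum's trigger condition; the GUC names and defaults are real, the function itself is illustrative):

```python
# Simplified model of when autovacuum decides to VACUUM a table.
# Real condition: n_dead_tup > autovacuum_vacuum_threshold
#                              + autovacuum_vacuum_scale_factor * reltuples
AUTOVACUUM_VACUUM_THRESHOLD = 50      # default
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2  # default

def autovacuum_triggers(n_dead_tup: int, reltuples: int) -> bool:
    """True if autovacuum would vacuum the table; inserts are not counted."""
    return n_dead_tup > (AUTOVACUUM_VACUUM_THRESHOLD
                         + AUTOVACUUM_VACUUM_SCALE_FACTOR * reltuples)

# 49 insertions and 1 delete: far below the threshold, no vacuum.
print(autovacuum_triggers(n_dead_tup=1, reltuples=50))     # False
# Only dead tuples move the needle, however many rows were inserted.
print(autovacuum_triggers(n_dead_tup=1, reltuples=10**9))  # False
```

Note how the second call shows the problem for append-only tables: a billion live tuples with one dead one still never qualifies.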

In my experience only the btree index requires a slow full index scan; that's why it was the only one covered by the "GUC for cleanup index
threshold" patch. Is that wrong, and do more index types do a full index scan on vacuum after the deletion of a single tuple?

>> 6. truncation
>
> This shouldn't be a heavy operation?
>

> I don't think so. This can take an AccessExclusiveLock on the table and
> take a long time with a large shared_buffers, as reported on that
> thread[2].

While this can be a useful optimization, I believe it is out of scope for this patch. I want to fix vacuum never coming to append-only tables, without breaking other behaviors. Truncation likely implies enough dead tuples to trigger a vacuum via the currently existing mechanisms.
>>
>>
>> With the proposed patch[1] we can control to do 5 or not. In addition
>> to that, another proposed patch[2] allows us to control 6.
>>
>> For append-only tables (and similar tables), what we periodically want
>> to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
>> need to have both an option of (auto)vacuum to control whether to do 1
>> and something like a new autovacuum threshold (or an option) to invoke
>> the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3
>> and 4 would be much cheaper than today's vacuum and anti-wraparound
>> vacuum would be able to skip almost all pages.
>
>
> Why would we want to get rid of 1? It's a no-op from the write perspective, and doing it saves a separate scan when it's not a no-op.
>

> Because that's for tables that have many inserts but also some
> updates/deletes. I think this strategy would help not only append-only
> tables but also such tables.

How much do we save by skipping a heap vacuum on an almost-append-only table, where the number of updates is below 50, the current threshold?

> Why make it faster in emergency situations, when the situation can instead be made non-emergency from the very beginning?
>

> I don't understand the meaning of "situation can be made non-emergency
> from the very beginning". Could you please elaborate on that?

Let's imagine a simple append-only workload on a Postgres with default settings. You create a table and start inserting tuples, one per transaction. Say a page fits 50 tuples (my case for taxi movement data), and you're on Amazon gp2 storage, which caps you at, say, 1000 IOPS in non-burst mode.
The anti-wraparound autovacuum (someone should draw the misreading of this term: a crossed-out car bent around a pole in space) will trigger after autovacuum_freeze_max_age inserts, 200000000 by default. That converts to 4000000 pages, or around 32 GB. It will be the first vacuum ever on that table, since no other mechanism triggers one, and if it steals all the available IOPS, it will finish in 200000000 / 50 / 1000 = 4000 seconds, killing prod for over an hour.
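The back-of-envelope numbers above can be checked mechanically (same assumptions as in the text: 50 tuples per page, 8 KB default Postgres pages, a 1000 IOPS budget):

```python
# Estimate of the first anti-wraparound vacuum's cost on an
# append-only table with one insert per transaction.
FREEZE_MAX_AGE = 200_000_000  # autovacuum_freeze_max_age default
TUPLES_PER_PAGE = 50          # assumption from the text
PAGE_SIZE_BYTES = 8192        # default Postgres block size
IOPS = 1000                   # gp2 non-burst rate assumed in the text

pages = FREEZE_MAX_AGE // TUPLES_PER_PAGE
table_gb = pages * PAGE_SIZE_BYTES / 1e9
vacuum_seconds = pages / IOPS

print(f"pages={pages} size_gb={table_gb:.1f} vacuum_s={vacuum_seconds:.0f}")
# prints: pages=4000000 size_gb=32.8 vacuum_s=4000
```

4000 seconds of saturated I/O is the "over an hour" of downtime in the scenario above.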

Telemetry workloads can easily generate 32 GB of data a day (I've seen more, but let's stick to that number). Production going down for an hour a day isn't good, and I consider it an emergency.

Now, there are two fixes that reading the documentation leads you to while you're the sleepy one trying to get prod back:
 - raise autovacuum_freeze_max_age so VACUUM keeps sleeping;
 - rewrite your code to batch inserts, putting more tuples into each transaction.

We don't have a better recommendation mechanism for settings, and experience in tuning autovacuum in the right direction comes to people at the cost of a job or a company :)

Neither way fixes the problem; both just delay the inevitable. The ratio of "one hour of vacuum per day of operation" holds, you just postpone it.
Let's say we had the same thing with 1000-record batched inserts and moved autovacuum_freeze_max_age to the highest possible value. How long will the downtime last then?

2**31 (max xid) * 1000 (tuples per xact) / 50 (tuples per page) / 1000 (pages per second) / 86400 (seconds in a day) ≈ 497 days.
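Spelling the arithmetic out (2**31 is roughly the xid space available before wraparound; the other constants are the same assumptions as above):

```python
# Downtime estimate when batching delays the anti-wraparound vacuum
# to the last possible moment.
MAX_XID = 2**31            # transactions until wraparound, roughly
TUPLES_PER_XACT = 1000     # batched inserts
TUPLES_PER_PAGE = 50
PAGES_PER_SECOND = 1000    # IOPS budget
SECONDS_PER_DAY = 86_400

pages = MAX_XID * TUPLES_PER_XACT // TUPLES_PER_PAGE
days = pages / PAGES_PER_SECOND / SECONDS_PER_DAY
print(f"{days:.0f} days")  # prints: 497 days
```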

This goes beyond even the highest estimate in Mandrill's report, so something of this kind might be what happened to them.

This all would not be needed if autovacuum came by after 50 inserted tuples. It would just mark the page all-visible and all-frozen and be gone, while the page is still in memory. That would get rid of the emergency altogether.

Is this elaborate enough disaster scenario? :)


--
Darafei Praliaskouski
