Re: Eagerly scan all-visible pages to amortize aggressive vacuum - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Msg-id CAAKRu_aiDH6=HSaNCGmG9PFS4Vw-fMbVdF6XggM5Eqyz_=tLJQ@mail.gmail.com
In response to Re: Eagerly scan all-visible pages to amortize aggressive vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Fri, Jan 24, 2025 at 11:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Jan 24, 2025 at 9:15 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > So, in this case, there is only one table in question, so 1 autovacuum
> > worker (and up to 2 maintenance parallel workers for index vacuuming).
> > The duration I provided is just the absolute duration from start of
> > vacuum to finish -- not considering the amount of time each parallel
> > worker may have been working (also it includes time spent delaying).
> > The benchmark ran for 2.8 hours. I configured vacuum to run
> > frequently. In this case, master spent 47% of the total time vacuuming
> > and the patch spent 56%.
>
> Definitely not insignificant, but I think it's OK for a worst case.
> Autovacuum is a background process, so it's not like a 20% regression
> on query performance.
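
To put the quoted percentages in absolute terms, here is a small sketch of the arithmetic. It assumes the 47% and 56% figures both apply to the full 2.8 hour run; the function names are only illustrative.

```python
def vacuum_minutes(total_hours, fraction):
    """Minutes spent vacuuming, given run length in hours and a fraction of it."""
    return total_hours * 60 * fraction


def relative_increase(base_fraction, new_fraction):
    """Relative growth of the patched share over the baseline share."""
    return (new_fraction - base_fraction) / base_fraction


# Figures from the quoted message: 2.8 hour run, 47% on master, 56% with the patch.
master_minutes = vacuum_minutes(2.8, 0.47)
patch_minutes = vacuum_minutes(2.8, 0.56)
increase = relative_increase(0.47, 0.56)

print(f"master: {master_minutes:.0f} min, patch: {patch_minutes:.0f} min, "
      f"relative increase: {increase:.0%}")
# prints: master: 79 min, patch: 94 min, relative increase: 19%
```

So the 9 percentage point difference is roughly a 19% relative increase in vacuum time, which matches the "20% regression" framing above.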

I've done a few runs with full-page images (FPIs) turned off to reduce
the run-to-run variance caused by vacuum and checkpoint timing.
Of course, this means that the amount of IO done by vacuum is very
different from a benchmark run with realistic settings.

I reran two of my simulations:

1)
- hot tail
    32 clients inserting 20 rows then updating 1 row
    duration: 3 hours

With the patch, there is a small increase in total time spent vacuuming
(< 10%), but it is spread out. The first aggressive vacuum of the table
takes 20 seconds with the patch and 9 minutes on master. And this is not
an append-only workload -- the tail of the table (up to 200,000 rows old)
is being updated (and potentially unfrozen). So, this feels like a
win.
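
For readers who want a concrete picture of the hot tail access pattern, here is a rough sketch of what one client iteration might look like. This is not the actual benchmark script: the table name, column names, and the single-counter id scheme are all hypothetical, and only the "20 inserts then 1 update" shape and the 200,000 row tail come from this message.

```python
import random

TAIL_ROWS = 200_000      # from the message: updates hit rows up to 200,000 rows old
INSERTS_PER_ITER = 20    # from the message: 20 inserts, then 1 update


def hot_tail_iteration(next_id, rng, table="hot_tail_demo"):
    """Build the SQL for one client iteration (hypothetical schema).

    Returns (statements, new_next_id). A real run with 32 concurrent
    clients would take ids from a sequence rather than a local counter.
    """
    stmts = [
        f"INSERT INTO {table} (id, payload) VALUES ({next_id + i}, 'x');"
        for i in range(INSERTS_PER_ITER)
    ]
    new_next_id = next_id + INSERTS_PER_ITER
    # Pick one row from the most recent TAIL_ROWS ids to update.
    oldest = max(0, new_next_id - TAIL_ROWS)
    target = rng.randrange(oldest, new_next_id)
    stmts.append(f"UPDATE {table} SET payload = 'y' WHERE id = {target};")
    return stmts, new_next_id


statements, following_id = hot_tail_iteration(1_000_000, random.Random(0))
```

The point of the sketch is that the update always lands near the end of the table, so older pages stay untouched and are candidates for freezing while the tail can still be modified and unfrozen.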

The insert/update P99 latency is lower (better) with the patch around
the time of the aggressive vacuum.

2)
- hot tail with delete (worst-case)
    32 clients inserting 20 rows then updating 1 row, and 1
    rate-limited client deleting all data before it can be aggressively
    vacuumed
    duration: 3 hours

There is a 10-15% increase in total time spent vacuuming with the
patch (30-40% of total benchmark runtime is spent vacuuming).

I ran the benchmark for 4 hours as well, and for that duration I
started to see a larger increase in vacuum IO time with the patch.
However, the 4 hour run had only one aggressive vacuum (around the 2.5
hour mark), so the numbers are hard to compare because the patch is
meant to do some of the work of the next aggressive vacuum in advance.

The insert/update P99 latency is the same or lower (better) with the patch.

Next, I plan to run the hot tail with delete benchmark with default
settings (including FPIs) on master and with the patch for about 24
hours each. I'm hoping the long duration will smooth out some of the run
variance even with FPIs.

- Melanie


