Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Eager page freeze criteria clarification
Date
Msg-id CAAKRu_axOmycL=OmQprUiyMDDQxSoJ8Ht-GwtY2gxHPcRJu5tA@mail.gmail.com
Whole thread Raw
In response to Re: Eager page freeze criteria clarification  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On Sat, Sep 23, 2023 at 3:53 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> Workload F:
>
> +------+--------+---------------------+--------------------+------------------+
> | algo | WAL GB | cptr bgwriter writes| other reads/writes | IO time AV worker|
> +------+--------+---------------------+---------------------+-----------------+
> |    M |    173 |           1,202,231 |         53,957,448 |           12,389 |
> |    4 |    189 |           1,212,521 |         55,589,140 |           13,084 |
> |    5 |    173 |           1,194,242 |         54,260,118 |           13,407 |
> +------+--------+---------------------+--------------------+------------------+
>
> +------+--------------+
> | algo | P99 latency  |
> +------+--------------+
> |    M |       19875  |
> |    4 |       19314  |
> |    5 |       19701  |
> +------+--------------+

Andres mentioned that the P99 latency for the COPY workload (workload F)
might not be meaningful, so I have calculated the duration total, mean,
median, min, max and standard deviation in milliseconds.

Workload F:
+------+------------+-------+--------+--------+--------+---------+
| algo |      Total |   Mean| Median |    Min |    Max |  Stddev |
+------+------------+-------+--------+--------+--------+---------+
|    M |  1,270,903 | 18,155| 17,755 | 17,090 | 19,994 |     869 |
|    4 |  1,167,135 | 16,673| 16,421 | 15,585 | 19,485 |     811 |
|    5 |  1,250,145 | 17,859| 17,704 | 15,763 | 19,871 |   1,009 |
+------+------------+-------+--------+--------+--------+---------+

Interestingly, algorithm 4 had the lowest total duration for all COPYs.
Some investigation of other data collected during the runs led us to
believe this may be due to autovacuum workers doing more IO with
algorithm 4 and thus generating more WAL and ending up initializing more
WAL files themselves. Whereas on master and with algorithm 5, client
backends had to initialize WAL files themselves, leading COPYs to take
longer. This was supported by the presence of more WALInit wait events
for client backends on master and with algorithm 5.

Calculating these made me realize that my conclusions about the work
queue workload (workload I) didn't make much sense. Because this
workload updated a non-indexed column, most pruning was HOT pruning done
on access and basically no page freezing was done by vacuum. This means
we weren't seeing negative performance effects of freezing related to
the work queue table.

The difference in this benchmark came from the relatively poor
performance of the concurrent COPYs when that table was frozen more
aggressively. I plan to run a new version of this workload which updates
an indexed column for comparison and does not use a concurrent COPY.

This is the duration total, mean, median, min, max, and standard
deviation in milliseconds of the COPYs which ran concurrently with the
work queue pgbench.

Workload I COPYs:
+------+--------+-------+--------+--------+--------+---------+
| algo |  Total |  Mean | Median |   Min  |   Max  |  Stddev |
+------+--------+-------+--------+--------+--------+---------+
|    M | 191,032|  4,898|  4,726 |  4,486 |  9,353 |     800 |
|    4 | 193,534|  4,962|  4,793 |  4,533 |  9,381 |     812 |
|    5 | 194,351|  4,983|  4,771 |  4,617 |  9,159 |     783 |
+------+--------+-------+--------+--------+--------+---------+

I think this shows that algorithm 4 COPYs performed the worst. This is
in contrast to the COPY-only workload (F) which did not show worse
performance for algorithm 4. I think this means I should modify the work
queue example and use something other than concurrent COPYs to avoid
obscuring characteristics of the work queue example.

- Melanie



pgsql-hackers by date:

Previous
From: Erik Wienhold
Date:
Subject: Re: How to Know the number of attrs?
Next
From: vignesh C
Date:
Subject: Re: Invalidate the subscription worker in cases where a user loses their superuser status