Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Eager page freeze criteria clarification |
Date | |
Msg-id | CAAKRu_axOmycL=OmQprUiyMDDQxSoJ8Ht-GwtY2gxHPcRJu5tA@mail.gmail.com Whole thread Raw |
In response to | Re: Eager page freeze criteria clarification (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-hackers |
On Sat, Sep 23, 2023 at 3:53 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > Workload F: > > +------+--------+---------------------+--------------------+------------------+ > | algo | WAL GB | cptr bgwriter writes| other reads/writes | IO time AV worker| > +------+--------+---------------------+---------------------+-----------------+ > | M | 173 | 1,202,231 | 53,957,448 | 12,389 | > | 4 | 189 | 1,212,521 | 55,589,140 | 13,084 | > | 5 | 173 | 1,194,242 | 54,260,118 | 13,407 | > +------+--------+---------------------+--------------------+------------------+ > > +------+--------------+ > | algo | P99 latency | > +------+--------------+ > | M | 19875 | > | 4 | 19314 | > | 5 | 19701 | > +------+--------------+ Andres mentioned that the P99 latency for the COPY workload (workload F) might not be meaningful, so I have calculated the duration total, mean, median, min, max and standard deviation in milliseconds. Workload F: +------+------------+-------+--------+--------+--------+---------+ | algo | Total | Mean| Median | Min | Max | Stddev | +------+------------+-------+--------+--------+--------+---------+ | M | 1,270,903 | 18,155| 17,755 | 17,090 | 19,994 | 869 | | 4 | 1,167,135 | 16,673| 16,421 | 15,585 | 19,485 | 811 | | 5 | 1,250,145 | 17,859| 17,704 | 15,763 | 19,871 | 1,009 | +------+------------+-------+--------+--------+--------+---------+ Interestingly, algorithm 4 had the lowest total duration for all COPYs. Some investigation of other data collected during the runs led us to believe this may be due to autovacuum workers doing more IO with algorithm 4 and thus generating more WAL and ending up initializing more WAL files themselves. Whereas on master and with algorithm 5, client backends had to initialize WAL files themselves, leading COPYs to take longer. This was supported by the presence of more WALInit wait events for client backends on master and with algorithm 5. Calculating these made me realize that my conclusions about the work queue workload (workload I) didn't make much sense. Because this workload updated a non-indexed column, most pruning was HOT pruning done on access and basically no page freezing was done by vacuum. This means we weren't seeing negative performance effects of freezing related to the work queue table. The difference in this benchmark came from the relatively poor performance of the concurrent COPYs when that table was frozen more aggressively. I plan to run a new version of this workload which updates an indexed column for comparison and does not use a concurrent COPY. This is the duration total, mean, median, min, max, and standard deviation in milliseconds of the COPYs which ran concurrently with the work queue pgbench. Workload I COPYs: +------+--------+-------+--------+--------+--------+---------+ | algo | Total | Mean | Median | Min | Max | Stddev | +------+--------+-------+--------+--------+--------+---------+ | M | 191,032| 4,898| 4,726 | 4,486 | 9,353 | 800 | | 4 | 193,534| 4,962| 4,793 | 4,533 | 9,381 | 812 | | 5 | 194,351| 4,983| 4,771 | 4,617 | 9,159 | 783 | +------+--------+-------+--------+--------+--------+---------+ I think this shows that algorithm 4 COPYs performed the worst. This is in contrast to the COPY-only workload (F) which did not show worse performance for algorithm 4. I think this means I should modify the work queue example and use something other than concurrent COPYs to avoid obscuring characteristics of the work queue example. - Melanie
pgsql-hackers by date: