Re: New WAL record to detect the checkpoint redo location - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: New WAL record to detect the checkpoint redo location
Date	October 5, 2023 21:34:00
Msg-id	20231005183400.n5myso7vu6crd656@alap3.anarazel.de
In response to	Re: New WAL record to detect the checkpoint redo location (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: New WAL record to detect the checkpoint redo location Re: New WAL record to detect the checkpoint redo location
List	pgsql-hackers

Hi,

On 2023-10-02 10:42:37 -0400, Robert Haas wrote:
> I was trying to think of a test case where XLogInsertRecord would be
> exercised as heavily as possible, so I really wanted to generate a lot
> of WAL while doing as little real work as possible. The best idea that
> I had was to run pg_create_restore_point() in a loop.

What I use for that is pg_logical_emit_message(). Something like

SELECT count(*)
FROM
    (
        SELECT pg_logical_emit_message(false, '1', 'short'), generate_series(1, 10000)
    );

run via pgbench does seem to exercise that path nicely.

> One possible conclusion is that the differences here aren't actually
> big enough to get stressed about, but I don't want to jump to that
> conclusion without investigating the competing hypothesis that this
> isn't the right way to test this, and that some better test would show
> clearer results. Suggestions?

I saw some small differences in runtime running pgbench with the above query,
with a single client. Comparing profiles showed a surprising degree of
difference. That turns out to be mostly a consequence of the fact that
ReserveXLogInsertLocation() isn't inlined anymore, because there now are two
callers of the function in XLogInsertRecord().
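
To illustrate the inlining point, here is a minimal sketch, not the actual patch. Compilers commonly inline a static function that has a single call site and may stop doing so once a second call site appears; the GCC/Clang always_inline attribute (which PostgreSQL wraps as pg_attribute_always_inline) overrides that heuristic. The names reserve_space, insert_record and redo_pos are hypothetical stand-ins, and the macro assumes GCC or Clang.

```c
#include <stdint.h>

/* Assumption: GCC or Clang; other compilers fall back to a plain inline hint. */
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

/*
 * Hypothetical stand-in for a reservation helper: advance the insert
 * position by "size" bytes and return where the reserved space starts.
 */
ALWAYS_INLINE uint64_t
reserve_space(uint64_t *curpos, uint64_t size)
{
    uint64_t start = *curpos;

    *curpos = start + size;
    return start;
}

/*
 * Hypothetical caller with two call sites of the helper, mirroring the
 * situation described above. Without the attribute, the second call
 * site can be enough to make the compiler emit a real function call.
 */
uint64_t
insert_record(uint64_t *curpos, uint64_t size,
              int is_redo_marker, uint64_t *redo_pos)
{
    uint64_t start;

    if (is_redo_marker)
    {
        start = reserve_space(curpos, size);
        *redo_pos = start;      /* remember where the marker landed */
    }
    else
        start = reserve_space(curpos, size);

    return start;
}
```

Whether the helper was actually inlined is best confirmed by looking at the generated assembly or a profile rather than assumed from the source.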

Unfortunately, I still see a small performance difference after that. To get
the most reproducible numbers, I disabled turbo boost, bound postgres to one
CPU core, and bound pgbench to another core. Over a few runs I quite
reproducibly get ~319.323 tps with your patches applied (+ always inline),
and ~324.674 with master.

If I add an unlikely around if (rechdr->xl_rmid == RM_XLOG_ID), the
performance does improve. But that "only" brings it up to 322.406. Not sure
what the rest is.
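
For readers unfamiliar with the hint, a minimal sketch of what such an unlikely() annotation looks like. The macro follows the usual __builtin_expect pattern (PostgreSQL's c.h defines unlikely() along these lines); the RecordHeader struct, the is_xlog_rmgr_record() function and the RM_XLOG_ID value used here are simplified assumptions for illustration, not the patched code.

```c
#include <stdint.h>

/* Branch hint; assumes GCC or Clang, otherwise it degrades to a plain test. */
#if defined(__GNUC__) || defined(__clang__)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define unlikely(x) ((x) != 0)
#endif

/* Assumed value for illustration only. */
#define RM_XLOG_ID 0

/* Hypothetical, heavily reduced record header. */
typedef struct RecordHeader
{
    uint8_t xl_rmid;    /* resource manager that owns the record */
} RecordHeader;

/*
 * Returns 1 when the rarely taken branch is hit, 0 otherwise. The hint
 * lets the compiler lay out the common case as the fall-through path.
 */
int
is_xlog_rmgr_record(const RecordHeader *rechdr)
{
    if (unlikely(rechdr->xl_rmid == RM_XLOG_ID))
        return 1;       /* rare path */
    return 0;           /* common path */
}
```

The hint changes only code layout and static branch prediction, not the result, which is consistent with it recovering part of the difference but not all of it.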

One thing that's notable, but not related to the patch, is that we waste a
fair bit of cpu time below XLogInsertRecord() with divisions. I think they're
all due to the use of UsableBytesInSegment in
XLogBytePosToRecPtr/XLogBytePosToEndRecPtr. The multiplication in
XLogSegNoOffsetToRecPtr() also shows up in the profile.
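
A simplified sketch of where those divisions come from, modelled on the shape of XLogBytePosToRecPtr() rather than copied from it. The page size, header sizes and 16MB segment size are assumed defaults, and the lower-case names are hypothetical. The key point is that the usable bytes per segment depend on the configurable segment size, so the divisor is a run-time value and the compiler cannot replace the division with shifts or a multiply-by-reciprocal.

```c
#include <stdint.h>

/* Assumed defaults, for illustration only. */
#define XLOG_BLCKSZ          8192u  /* WAL page size */
#define SIZE_LONG_PHD        40u    /* header of the first page in a segment */
#define SIZE_SHORT_PHD       24u    /* header of every other page */
#define USABLE_BYTES_IN_PAGE (XLOG_BLCKSZ - SIZE_SHORT_PHD)

/* Run-time values: the segment size is chosen when the cluster is set up. */
uint64_t wal_segment_size = 16u * 1024u * 1024u;
uint64_t usable_bytes_in_segment;

void
init_usable_bytes(void)
{
    usable_bytes_in_segment =
        (wal_segment_size / XLOG_BLCKSZ) * USABLE_BYTES_IN_PAGE
        - (SIZE_LONG_PHD - SIZE_SHORT_PHD);
}

/* Convert a "usable byte position" (headers excluded) to a WAL location. */
uint64_t
byte_pos_to_rec_ptr(uint64_t bytepos)
{
    /* Divisor is not a compile-time constant: a real 64-bit div and mod. */
    uint64_t fullsegs = bytepos / usable_bytes_in_segment;
    uint64_t bytesleft = bytepos % usable_bytes_in_segment;
    uint64_t seg_offset;

    if (bytesleft < XLOG_BLCKSZ - SIZE_LONG_PHD)
    {
        /* fits on the first page of the segment */
        seg_offset = bytesleft + SIZE_LONG_PHD;
    }
    else
    {
        uint64_t fullpages;

        /* skip the first page, which carries the long header */
        seg_offset = XLOG_BLCKSZ;
        bytesleft -= XLOG_BLCKSZ - SIZE_LONG_PHD;

        /* constant divisor: the compiler can strength-reduce this one */
        fullpages = bytesleft / USABLE_BYTES_IN_PAGE;
        bytesleft = bytesleft % USABLE_BYTES_IN_PAGE;

        seg_offset += fullpages * XLOG_BLCKSZ + bytesleft + SIZE_SHORT_PHD;
    }

    /* the multiplication that also shows up in the profile */
    return fullsegs * wal_segment_size + seg_offset;
}
```

With init_usable_bytes() called first, byte position 0 maps to offset 40 (just past the long header), and a position equal to usable_bytes_in_segment maps to the same offset in the next segment.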

Greetings,

Andres Freund
