Re: New WAL record to detect the checkpoint redo location - Mailing list pgsql-hackers

From Andres Freund
Subject Re: New WAL record to detect the checkpoint redo location
Msg-id 20231005183400.n5myso7vu6crd656@alap3.anarazel.de
In response to Re: New WAL record to detect the checkpoint redo location  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2023-10-02 10:42:37 -0400, Robert Haas wrote:
> I was trying to think of a test case where XLogInsertRecord would be
> exercised as heavily as possible, so I really wanted to generate a lot
> of WAL while doing as little real work as possible. The best idea that
> I had was to run pg_create_restore_point() in a loop.

What I use for that is pg_logical_emit_message(). Something like

SELECT count(*)
FROM
    (
        SELECT pg_logical_emit_message(false, '1', 'short'), generate_series(1, 10000)
    );

run via pgbench does seem to exercise that path nicely.


> One possible conclusion is that the differences here aren't actually
> big enough to get stressed about, but I don't want to jump to that
> conclusion without investigating the competing hypothesis that this
> isn't the right way to test this, and that some better test would show
> clearer results. Suggestions?

I saw some small differences in runtime running pgbench with the above query,
with a single client. Comparing profiles showed a surprising degree of
difference. That turns out to be mostly a consequence of the fact that
ReserveXLogInsertLocation() isn't inlined anymore, because there are now two
call sites for it in XLogInsertRecord().
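
For illustration, a minimal standalone sketch of forcing a helper to be
inlined at two call sites. This is not the xlog.c code: the demo_ names are
hypothetical, and the macro only mimics what PostgreSQL's
pg_attribute_always_inline does on GCC-compatible compilers.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for PostgreSQL's pg_attribute_always_inline;
 * falls back to plain inline on compilers without the GCC attribute.
 */
#if defined(__GNUC__)
#define demo_always_inline __attribute__((always_inline)) inline
#else
#define demo_always_inline inline
#endif

/* Toy reservation: hand out the current position and advance it by size. */
static demo_always_inline uint64_t
demo_reserve(uint64_t *insert_pos, uint64_t size)
{
    uint64_t start = *insert_pos;

    *insert_pos += size;
    return start;
}

/* Two call sites; the attribute asks the compiler to inline both. */
uint64_t
demo_insert_record(uint64_t *insert_pos, uint64_t size, int is_special)
{
    if (is_special)
        return demo_reserve(insert_pos, size);  /* call site 1 */

    return demo_reserve(insert_pos, size);      /* call site 2 */
}
```

Without the attribute, a compiler's inlining heuristics may treat a static
function with one call site differently from the same function with two.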

Unfortunately, I still see a small performance difference after that. To get
the most reproducible numbers, I disabled turbo boost, bound postgres to one
CPU core, and bound pgbench to another. Over a few runs I quite reproducibly
get ~319.323 tps with your patches applied (plus always-inline), and ~324.674
with master.

If I add an unlikely() around if (rechdr->xl_rmid == RM_XLOG_ID), the
performance does improve. But that "only" brings it up to 322.406. Not sure
what the rest is.
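
As a rough sketch of what such a branch hint looks like, assuming a
GCC-compatible compiler: the macro below has the same shape as PostgreSQL's
unlikely(), while the record struct, the constant, and the function are
hypothetical stand-ins rather than the real WAL structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint; expands to a plain test on compilers without the builtin. */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define demo_unlikely(x) ((x) != 0)
#endif

/* Illustrative resource manager id for the rarely taken branch. */
#define DEMO_RM_XLOG_ID 0

/* Hypothetical, cut-down record header. */
typedef struct DemoRecordHeader
{
    uint8_t  xl_rmid;
    uint32_t xl_tot_len;
} DemoRecordHeader;

/* Count records that take the rare path; the hint keeps it off the hot path. */
size_t
demo_count_xlog_records(const DemoRecordHeader *recs, size_t n)
{
    size_t special = 0;

    assert(recs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (demo_unlikely(recs[i].xl_rmid == DEMO_RM_XLOG_ID))
            special++;
    }
    return special;
}
```

The hint does not change the result, only how the compiler lays out the
branch, which is why its effect has to be measured rather than assumed.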


One thing that's notable, but not related to the patch, is that we waste a
fair bit of CPU time below XLogInsertRecord() on divisions. I think they're
all due to the use of UsableBytesInSegment in
XLogBytePosToRecPtr()/XLogBytePosToEndRecPtr(). The multiplication in
XLogSegNoOffsetToRecPtr() also shows up.
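
To show where such divisions come from, here is a simplified standalone
sketch of mapping a "usable byte position" (page headers excluded) to a WAL
location. It is not the xlog.c code: the demo_ names and the page and header
sizes are illustrative assumptions. The point is that when the segment size
is only known at run time, the compiler cannot replace the division and
modulo by the per-segment usable byte count with cheaper operations.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sizes; treat them as assumptions, not authoritative values. */
#define DEMO_BLCKSZ     8192u   /* WAL page size */
#define DEMO_LONG_PHD   40u     /* header on the first page of a segment */
#define DEMO_SHORT_PHD  24u     /* header on every other page */

uint64_t
demo_bytepos_to_recptr(uint64_t bytepos, uint64_t seg_size)
{
    uint64_t usable_in_page = DEMO_BLCKSZ - DEMO_SHORT_PHD;
    /* Recomputed here to stay self-contained; it depends on seg_size. */
    uint64_t usable_in_seg = (seg_size / DEMO_BLCKSZ) * usable_in_page
        - (DEMO_LONG_PHD - DEMO_SHORT_PHD);
    uint64_t fullsegs;
    uint64_t bytesleft;
    uint64_t seg_offset;

    assert(seg_size >= DEMO_BLCKSZ && seg_size % DEMO_BLCKSZ == 0);

    /* Divisor is a runtime value, so these are real divide instructions. */
    fullsegs = bytepos / usable_in_seg;
    bytesleft = bytepos % usable_in_seg;

    if (bytesleft < DEMO_BLCKSZ - DEMO_LONG_PHD)
    {
        /* Fits on the first page of the segment. */
        seg_offset = bytesleft + DEMO_LONG_PHD;
    }
    else
    {
        /* Skip the first page, then count full pages with short headers. */
        bytesleft -= DEMO_BLCKSZ - DEMO_LONG_PHD;
        seg_offset = DEMO_BLCKSZ
            + (bytesleft / usable_in_page) * DEMO_BLCKSZ
            + (bytesleft % usable_in_page)
            + DEMO_SHORT_PHD;
    }

    /* The multiplication by the (runtime) segment size. */
    return fullsegs * seg_size + seg_offset;
}
```

With a 16 MB segment, byte position 0 maps to offset 40 (just past the long
header), and the first byte that no longer fits on the first page maps to
offset 8216 (second page, just past its short header).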

Greetings,

Andres Freund


