Re: hash_search_with_hash_value is high in "perf top" on a replica - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: hash_search_with_hash_value is high in "perf top" on a replica
Date
Msg-id CA+hUKG+K_0-gYFiEZXNSz0=rv5YwN5LdVmZHVdLUO3DoKypc7w@mail.gmail.com
In response to Re: hash_search_with_hash_value is high in "perf top" on a replica  (Ants Aasma <ants.aasma@cybertec.at>)
Responses Re: hash_search_with_hash_value is high in "perf top" on a replica
List pgsql-hackers
On Sun, Feb 2, 2025 at 3:44 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
> The other direction is to split off WAL decoding, buffer lookup and maybe
> even pinning to a separate process from the main redo loop.

Hi Ants,

FWIW I have a patch set that changes xlogprefetcher.c to use
read_stream.c, which I hope to propose for 19.  It pins ahead of time,
which means that it already consolidates some pin/unpin/repin
sequences when it looks ahead, a little bit like what you're
suggesting, just not in a separate process.  Here[1] is a preview, but
it needs some major rebasing and has some issues... it's on my list...

I mention this because, although the primary goal of the
recovery-streamification project is recovery I/O performance,
especially anticipating direct I/O (and in particular without FPWs),
there are many more opportunities along those lines for cached blocks.
I've written about some of them before and posted toy sketch code, but
I didn't let myself get too distracted, as the first thing seems to be
real streamification (replacing LsnReadQueue).  With a bit more work
it could avoid lots of buffer mapping table lookups too: I posted a
few ideas and toy patches, one based on already-held pins, which
begins to pay off once you hold many pins in a window, and one that
more generally caches recently accessed buffers in smgr relations.
That'd be independent of whether our buffer mapping table also sucks
and needs a rewrite: it bypasses the table completely in many
interesting cases, and would need to be so cheap as to be basically
free when it doesn't help.

An assumption I just made in remembering all that: the OP didn't
mention it, but I guess this COPY replay is probably repeatedly
hammering the same pages from separate records, because otherwise the
multi-insert stuff would already avoid a lot of mapping table lookups?
There are many more automatic batching opportunities, like holding
content locks across records (I posted a toy patch to measure the
potential speedup for that once, too, but I'm too lazy to search for
it...), and then to really benefit from all of these things in more
general real workloads I think you need to be able to do a little bit
of re-ordering, because e.g. heap/index accesses are so often
interleaved.  IOW there seems to be plenty of lower-hanging fruit in
the "mechanical sympathy" category before you need another thread.

No argument against also making the hashing faster, the mapping table
better, and eventually parallelising some bits when it's not just
compensating for inefficient code that throws away all the locality; I
just wanted to mention that related work in the pipeline :-)

[1] https://github.com/macdice/postgres/commit/52533b652e544464add6f174019161889a37ed93


