Re: hash_search_with_hash_value is high in "perf top" on a replica - Mailing list pgsql-hackers

From Andres Freund
Subject Re: hash_search_with_hash_value is high in "perf top" on a replica
Msg-id ii7jyq47owdsce5bwelxusaknc7fqsnhuljnx4vfafu5ob7sy6@akljdxwtnhiw
In response to Re: hash_search_with_hash_value is high in "perf top" on a replica  (Ants Aasma <ants.aasma@cybertec.at>)
Responses Re: hash_search_with_hash_value is high in "perf top" on a replica
List pgsql-hackers
Hi,

On 2025-02-01 15:43:41 +0100, Ants Aasma wrote:
> On Fri, Jan 31, 2025, 15:43 Andres Freund <andres@anarazel.de> wrote:
> 
> > > Maybe it's a red herring though, but it looks pretty suspicious.
> >
> > It's unfortunately not too surprising - our buffer mapping table is a
> > pretty big bottleneck.  Both because a hash table is just not a good fit
> > for the buffer mapping table due to the lack of locality and because
> > dynahash is a really poor hash table implementation.
> >
> 
> I measured similar things when looking at apply throughput recently. For
> in-cache workloads buffer lookup and locking was about half of the load.
> 
> One other direction is to extract more memory concurrency. Prefetcher could
> batch multiple lookups together so CPU OoO execution has a chance to fire
> off multiple memory accesses at the same time.

I think at the moment we have a *hilariously* cache-inefficient buffer lookup;
that's the first thing to address. A hash table for buffer mapping lookups is,
imo, a bad idea, because it loses all locality in a workload that exhibits a
*lot* of locality. Furthermore, dynahash.c is very far from a cache-efficient
hash table implementation.
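
To illustrate the locality point, here is a toy model (not PostgreSQL code).
It counts how many distinct cache lines of a mapping table are touched when
consecutive blocks of one relation are looked up, once with a hashed slot and
once with a layout that keeps neighbouring blocks adjacent. The table size,
the 8-slots-per-line figure, the mixer and all function names are assumptions
made for the sketch; dynahash's real layout is more involved.

```c
#include <stdint.h>

#define NBUCKETS         1024u  /* toy table size, power of two (assumption) */
#define ENTRIES_PER_LINE 8u     /* slots per 64-byte cache line (assumption) */

/* Hypothetical stand-in for the real tag hash: the murmur3 32-bit finalizer. */
static uint32_t
mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Hashed layout: neighbouring block numbers land in unrelated slots. */
static uint32_t
hashed_slot(uint32_t rel, uint32_t block)
{
    return mix32((rel * 0x9e3779b9u) ^ block) & (NBUCKETS - 1);
}

/* Ordered layout: neighbouring block numbers land in neighbouring slots. */
static uint32_t
ordered_slot(uint32_t rel, uint32_t block)
{
    (void) rel;                 /* one relation per table in this toy model */
    return block & (NBUCKETS - 1);
}

/* Count distinct cache lines touched when looking up blocks [0, nblocks). */
static unsigned
lines_touched(uint32_t (*slot_of) (uint32_t, uint32_t),
              uint32_t rel, uint32_t nblocks)
{
    unsigned char seen[NBUCKETS / ENTRIES_PER_LINE] = {0};
    unsigned      count = 0;

    for (uint32_t b = 0; b < nblocks; b++)
    {
        uint32_t line = slot_of(rel, b) / ENTRIES_PER_LINE;

        if (!seen[line])
        {
            seen[line] = 1;
            count++;
        }
    }
    return count;
}
```

With the ordered layout, 64 consecutive blocks touch 64 / 8 = 8 lines; with
the hashed layout the same 64 lookups are spread over many more lines, and
that is before counting any pointer chasing inside the hash table itself.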

The other aspect is that in many workloads we'll look up a small set of
buffers over and over, which a) wastes cycles and b) wastes cache space on
repeated work that could be avoided much more cheaply.
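
As a rough sketch of what eliding such repeated lookups could look like, the
following puts a tiny cache of recent results in front of the expensive
lookup. Everything here is hypothetical: Tag, RecentCache, slow_lookup() and
cached_lookup() are illustration names, not existing PostgreSQL APIs, and the
sketch ignores invalidation, which a real version would need whenever a
buffer gets evicted or remapped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified buffer tag: relation id plus block number. */
typedef struct
{
    uint32_t rel;
    uint32_t block;
} Tag;

#define RECENT_SLOTS 4          /* small enough to scan linearly (assumption) */

typedef struct
{
    Tag           tag[RECENT_SLOTS];
    int           buf[RECENT_SLOTS];
    bool          valid[RECENT_SLOTS];
    unsigned      next;         /* round-robin replacement cursor */
    unsigned long hits;
    unsigned long misses;
} RecentCache;

/* Stand-in for the expensive shared hash table lookup. */
static int
slow_lookup(Tag t)
{
    return (int) ((t.rel * 31u + t.block) % 1000u);
}

/* Check the few most recent tags first; fall back to the slow path. */
static int
cached_lookup(RecentCache *c, Tag t)
{
    for (unsigned i = 0; i < RECENT_SLOTS; i++)
    {
        if (c->valid[i] &&
            c->tag[i].rel == t.rel && c->tag[i].block == t.block)
        {
            c->hits++;
            return c->buf[i];
        }
    }

    c->misses++;

    int      buf = slow_lookup(t);
    unsigned victim = c->next;

    /* Remember the result, overwriting the oldest slot. */
    c->next = (c->next + 1) % RECENT_SLOTS;
    c->tag[victim] = t;
    c->buf[victim] = buf;
    c->valid[victim] = true;
    return buf;
}
```

Starting from a zero-initialized RecentCache, a loop that alternates between
two tags takes the slow path twice and is served from the small array after
that, so the shared table is not touched again for those tags.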

We also do a lot of hash lookups for smgr, because we don't have any
cross-record caching infrastructure for that.


> The other direction is to split off WAL decoding, buffer lookup and maybe
> even pinning to a separate process from the main redo loop.

Maybe, but I think we're rather far away from those being the most
productive things to tackle.

Greetings,

Andres Freund


