Andres Freund <andres@anarazel.de> writes:
> Indeed. The buffer mapping hashtable already is visible as a major
> bottleneck in a number of workloads. Even in readonly pgbench if s_b is
> large enough (so the hashtable is larger than the cache). Not to speak
> of things like a cached sequential scan with a cheap qual and wide rows.
To be fair, the added overhead is in buffer allocation, not buffer lookup.
So it shouldn't add cost to fully-cached cases. As Tomas noted upthread,
the potential trouble spot is where the working set is bigger than shared
buffers but still fits in RAM (so there's no actual I/O needed, but we do
still have to shuffle buffers a lot).
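To make that distinction concrete, here's a toy sketch (hypothetical names
and types, not actual bufmgr code) of where the cost would land: the hit
path is only a mapping-table probe, while any added bookkeeping gets paid
on the allocation path.

/* Toy sketch, not PostgreSQL code: hit path vs. allocation path. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct BufTag { uint32_t relid; uint32_t blockno; } BufTag;

/* Stand-in for a probe of the buffer mapping hashtable. */
static bool mapping_lookup(BufTag tag, int *slot)
{
    (void) tag;
    *slot = -1;
    return false;               /* pretend the page isn't resident */
}

/*
 * Miss path: pick a victim, evict it, and enter the new tag in the
 * mapping table.  Extra per-allocation bookkeeping is paid here, not
 * on the hit path.
 */
static int allocate_buffer(BufTag tag)
{
    (void) tag;
    return 0;
}

static int read_buffer(BufTag tag)
{
    int slot;

    if (mapping_lookup(tag, &slot))
        return slot;                /* fully cached: lookup cost only */
    return allocate_buffer(tag);    /* working set > s_b: allocation too */
}

int main(void)
{
    BufTag tag = { 42, 0 };
    printf("got buffer slot %d\n", read_buffer(tag));
    return 0;
}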
> Wonder if the temporary fix is just to do explicit hashtable probes for
> all pages iff the size of the relation is < s_b / 500 or so. That'll
> address the case where small tables are frequently dropped - and
> dropping large relations is more expensive from the OS and data loading
> perspective, so it's not gonna happen as often.
Oooh, interesting idea. We'd need a reliable idea of how large the relation
is (preferably without adding an lseek call), but maybe that's do-able.
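For what it's worth, here's a minimal self-contained sketch of the shape of
that heuristic (toy structures, hypothetical names, and a linear "hash
probe" stand-in, not actual DropRelFileNodeBuffers code); the real thing
would probe the buffer mapping hashtable once per block:

/* Toy sketch of the small-relation drop heuristic; hypothetical names. */
#include <stdint.h>
#include <stdio.h>

#define NBUFFERS 16384              /* stand-in for shared_buffers slots */
#define DROP_PROBE_DIVISOR 500      /* the suggested s_b / 500 cutoff */

typedef struct BufTag { uint32_t relid; uint32_t blockno; } BufTag;

static BufTag buffers[NBUFFERS];    /* toy buffer descriptors */

/* Stand-in for a hash probe of the buffer mapping table (O(1) in reality;
 * a linear search here just keeps the toy self-contained). */
static int lookup_buffer(uint32_t relid, uint32_t blockno)
{
    for (int i = 0; i < NBUFFERS; i++)
        if (buffers[i].relid == relid && buffers[i].blockno == blockno)
            return i;
    return -1;
}

static void invalidate_buffer(int slot)
{
    buffers[slot].relid = 0;
    buffers[slot].blockno = 0;
}

/* Drop all buffers belonging to one relation: probe per block when the
 * relation is small relative to shared_buffers, else scan every buffer. */
static void drop_rel_buffers(uint32_t relid, uint32_t nblocks)
{
    if (nblocks < NBUFFERS / DROP_PROBE_DIVISOR)
    {
        for (uint32_t blk = 0; blk < nblocks; blk++)
        {
            int slot = lookup_buffer(relid, blk);

            if (slot >= 0)
                invalidate_buffer(slot);
        }
    }
    else
    {
        for (int i = 0; i < NBUFFERS; i++)
            if (buffers[i].relid == relid)
                invalidate_buffer(i);
    }
}

int main(void)
{
    buffers[7].relid = 42;
    buffers[7].blockno = 3;
    drop_rel_buffers(42, 8);        /* small rel: 8 targeted probes */
    printf("slot 7 relid after drop: %u\n", buffers[7].relid);
    return 0;
}

The catch, as noted above, is obtaining nblocks cheaply and reliably for
the threshold test.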
regards, tom lane