On Thu, Aug 27, 2020 at 6:15 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > --4.90%--smgropen
> > |--2.86%--ReadBufferWithoutRelcache
>
> Looking at an earlier report of this problem I was thinking whether it'd
> make sense to replace SMgrRelationHash with a simplehash table; I have a
> half-written patch for that, but I haven't completed that work.
> However, in the older profile things were looking different, as
> hash_search_with_hash_value was taking 35.25%, and smgropen was 33.74%
> of it. BufTableLookup was also there but only 1.51%. So I'm not so
> sure now that that'll pay off as clearly as I had hoped.

Right, my hypothesis requires an uncacheably large buffer mapping
table, and I think smgropen() needs a different explanation because
it's not expected to be as large or as random, at least not with a
pgbench workload. I think the reasons for a profile with smgropen()
showing up so high, and in particular higher than BufTableLookup(),
must be:

1. We call smgropen() twice for every call to BufTableLookup(). Once
in XLogReadBufferExtended(), and then again in
ReadBufferWithoutRelcache().
2. We also call it for every block forced out of the buffer pool, and
in recovery that has to be done by the recovery loop.
3. We also call it for every block in the buffer pool during the
end-of-recovery checkpoint.

Not sure, but the last two might perform worse due to proximity to
interleaved pwrite() system calls (just a thought, not investigated).

In any case, I'm going to start a new thread proposing that we move
those things out of the recovery loop.
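
To make the call-count argument a bit more concrete, here is a toy model
in C. It is not PostgreSQL code: LookupCounts and model_recovery() are
made-up names, and it assumes exactly one hash probe per event as
described in points 1-3 above, which I haven't verified against the
source. It only shows why smgropen() probes can outnumber
BufTableLookup() probes.

```c
#include <assert.h>

/* Toy counters only; these are NOT PostgreSQL data structures. */
typedef struct LookupCounts
{
    long smgr_lookups;      /* stand-in for smgropen() hash probes */
    long buftable_lookups;  /* stand-in for BufTableLookup() probes */
} LookupCounts;

/*
 * Hypothetical model of hash probes during recovery.
 *
 * n_block_refs:        WAL block references replayed
 * n_evictions:         dirty blocks forced out of the buffer pool
 * n_checkpoint_writes: buffers written by the end-of-recovery checkpoint
 */
static LookupCounts
model_recovery(long n_block_refs, long n_evictions, long n_checkpoint_writes)
{
    LookupCounts c = {0, 0};

    assert(n_block_refs >= 0 && n_evictions >= 0 && n_checkpoint_writes >= 0);

    /* Point 1: two smgropen() calls per buffer mapping lookup. */
    c.smgr_lookups += 2 * n_block_refs;
    c.buftable_lookups += n_block_refs;

    /* Point 2: one more smgropen() per block forced out of the pool. */
    c.smgr_lookups += n_evictions;

    /* Point 3: one more per block written by the final checkpoint. */
    c.smgr_lookups += n_checkpoint_writes;

    return c;
}
```

With made-up inputs such as model_recovery(1000, 200, 300), the model
counts 2500 smgropen() probes against 1000 BufTableLookup() probes, so
even with a cheap per-probe cost smgropen() can come out ahead in a
profile.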