Re: Cache relation sizes? - Mailing list pgsql-hackers

From: Thomas Munro
Subject: Re: Cache relation sizes?
Date:
Msg-id: CA+hUKG+d-9sETQaGfBGbGBOAPS-GjDns_vSMYhDuRW=VsYrzZw@mail.gmail.com
In response to: Re: Cache relation sizes?  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List: pgsql-hackers
On Tue, Dec 31, 2019 at 4:43 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> I still believe that one shared memory element for every
> non-mapped relation is not only too complex but also too much, as
> Andres (and implicitly I) wrote. I feel that just one flag for
> all would work fine, but partitioned flags (that is, relations or
> files that correspond to the same hash value share one flag) can
> reduce the shared memory elements to a fixed small number.
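
(To make sure I'm picturing the same thing, here is roughly what I imagine
such hash-partitioned flags looking like.  Purely illustrative: the names,
and the use of a generation counter rather than a plain flag, are mine and
not from any posted patch.)

/* Fixed, small shared-memory array of generation counters.  Every
 * relation whose relfilenode hashes to the same slot shares one counter,
 * so the shared footprint stays constant regardless of how many
 * relations exist.  The counters would be set up with
 * pg_atomic_init_u32() at shmem creation time (not shown). */
#include "postgres.h"
#include "port/atomics.h"
#include "storage/relfilenode.h"

#define REL_SIZE_PARTITIONS 64

typedef struct RelSizeInvalArray
{
    pg_atomic_uint32 generation[REL_SIZE_PARTITIONS];
} RelSizeInvalArray;

static RelSizeInvalArray *rel_size_inval;   /* points into shared memory */

static inline uint32
rel_size_partition(const RelFileNode *rnode)
{
    /* any cheap hash will do; the relfilenode OID is enough to illustrate */
    return rnode->relNode % REL_SIZE_PARTITIONS;
}

/* Anything that changes a relation's size (extend, truncate, drop) bumps
 * the counter for that relation's partition. */
static inline void
rel_size_invalidate(const RelFileNode *rnode)
{
    pg_atomic_fetch_add_u32(&rel_size_inval->generation[rel_size_partition(rnode)], 1);
}

/* A backend trusts its locally cached nblocks only while the generation
 * it saw when it cached the value still matches the shared counter. */
static inline bool
rel_size_cache_valid(const RelFileNode *rnode, uint32 cached_generation)
{
    return pg_atomic_read_u32(&rel_size_inval->generation[rel_size_partition(rnode)]) ==
        cached_generation;
}

The cost is false sharing: bumping one partition's counter also throws away
the cached sizes of every other relation that happens to hash to that slot.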

There is one potentially interesting case that doesn't require any
kind of shared cache invalidation AFAICS.  XLogReadBufferExtended()
calls smgrnblocks() for every buffer access, even if the buffer is
already in our buffer pool.  I tried to make yet another quick
experiment-grade patch to cache the size[1], this time for use in
recovery only.
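
The guts of it look roughly like the sketch below (a minimal sketch only,
not the actual code at [1]: smgr_cached_nblocks is a hypothetical per-fork
field added to SMgrRelationData, initialised to InvalidBlockNumber in
smgropen()).  The trick is that during recovery only the startup process
changes relation sizes, so no cross-backend invalidation is needed.

/* smgrnblocks() in src/backend/storage/smgr/smgr.c, with a cache that is
 * only trusted while in recovery. */
BlockNumber
smgrnblocks(SMgrRelation reln, ForkNumber forknum)
{
    BlockNumber result;

    /* Serve repeat calls from the cache and skip md.c's lseek(SEEK_END). */
    if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
        return reln->smgr_cached_nblocks[forknum];

    result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);

    if (InRecovery)
        reln->smgr_cached_nblocks[forknum] = result;

    return result;
}

A real version also has to keep the cached value in step in smgrextend()
and smgrtruncate(), which is where the bugs most likely live.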

initdb -D pgdata
postgres -D pgdata -c checkpoint_timeout=60min

In another shell:
pgbench -i -s100 postgres
pgbench -M prepared -T60 postgres
killall -9 postgres
mv pgdata pgdata-save

Master branch:

cp -r pgdata-save pgdata
strace -c -f postgres -D pgdata
[... wait for "redo done", then hit ^C ...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
...
 18.61   22.492286          26    849396           lseek
  6.95    8.404369          30    277134           pwrite64
  6.63    8.009679          28    277892           pread64
  0.50    0.604037          39     15169           sync_file_range
...

Patched:

rm -fr pgdata
cp -r pgdata-save pgdata
strace -c -f ~/install/bin/postgres -D pgdata
[... wait for "redo done", then hit ^C ...]
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
...
 16.33    8.097631          29    277134           pwrite64
 15.56    7.715052          27    277892           pread64
  1.13    0.559648          39     14137           sync_file_range
...
  0.00    0.001505          25        59           lseek

> Note: I'm still not sure how much lseek impacts performance.

It doesn't seem great that we are effectively making a system call for
most WAL records we replay, but, sadly, in this case the patch didn't
make any measurable difference when run without strace on this Linux
VM.  I suspect there is some workload and stack where it would make a
difference (cf. the read() on the postmaster pipe that we used to do
for every WAL record, which has since been removed), but this is just
something I noticed in passing while working on something else, so I
haven't investigated much.
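
For context on where all those lseeks come from: smgrnblocks() bottoms
out in md.c, which discovers a segment's size by asking the kernel for
the file length, roughly like this (a simplified standalone illustration,
not the actual md.c code):

#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Seek to the end of the segment file and convert bytes to blocks.  This
 * is the lseek(SEEK_END) that shows up ~850k times in the unpatched trace
 * above, essentially once per smgrnblocks() call during replay.  Error
 * handling omitted. */
static unsigned int
segment_nblocks(int fd)
{
    off_t len = lseek(fd, 0, SEEK_END);

    return (unsigned int) (len / BLCKSZ);   /* any partial block at EOF is ignored */
}

The cache just avoids asking the kernel the same question over and over,
which is why the count collapses from ~850k calls to 59 in the patched run.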

[1] https://github.com/postgres/postgres/compare/master...macdice:cache-nblocks
(just a test, unfinished, probably has bugs)


