Re: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: [PoC] Non-volatile WAL buffer |
Date | |
Msg-id | 451659d8-10a5-f5a4-6003-bea8dfcda9e6@iki.fi |
In response to | Re: [PoC] Non-volatile WAL buffer (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses | Re: [PoC] Non-volatile WAL buffer (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
List | pgsql-hackers |
On 26/11/2020 21:27, Tomas Vondra wrote:
> Hi,
>
> Here's the "simple patch" that I'm currently experimenting with. It
> essentially replaces open/close/write/fsync with pmem calls
> (map/unmap/memcpy/persist variants), and it's by no means committable.
> But it works well enough for experiments / measurements, etc.
>
> The numbers (5-minute pgbench runs on scale 500) look like this:
>
>           master/btt    master/dax        ntt     simple
>     -----------------------------------------------------------
>       1         5469          7402       7977       6746
>      16        48222         80869     107025      82343
>      32        73974        158189     214718     158348
>      64        85921        154540     225715     164248
>      96       150602        221159     237008     217253
>
> A chart illustrating these results is attached. The four columns are
> showing unpatched master with WAL on a pmem device, in BTT or DAX modes,
> "ntt" is the patch submitted to this thread, and "simple" is the patch
> I've hacked together.
>
> As expected, the BTT case performs poorly (compared to the rest).
>
> The "master/dax" and "simple" perform about the same. There are some
> differences, but those may be attributed to noise. The NTT patch does
> outperform these cases by ~20-40% in some cases.
>
> The question is why. I recall suggestions this is due to page faults
> when writing data into the WAL, but I did experiment with various
> settings that I think should prevent that (e.g. disabling WAL reuse
> and/or disabling zeroing the segments) but that made no measurable
> difference.

The page faults are only a problem when mmap() is used *without* DAX.

Takashi tried a patch earlier to mmap() WAL segments and insert WAL to
them directly. See 0002-Use-WAL-segments-as-WAL-buffers.patch at
https://www.postgresql.org/message-id/000001d5dff4%24995ed180%24cc1c7480%24%40hco.ntt.co.jp_1.
Could you test that patch too, please? Using your nomenclature, that
patch skips wal_buffers and does:

    clients -> wal segments (PMEM DAX)

He got good results with that with DAX, but otherwise it performed worse.
And then we discussed why that might be, and the page fault hypothesis
was brought up.

I think 0002-Use-WAL-segments-as-WAL-buffers.patch is the most promising
approach here. But because it's slower without DAX, we need to keep the
current code for non-DAX systems. Unfortunately it means that we need to
maintain both implementations, selectable with a GUC or some DAX
detection magic. The question then is whether the code complexity is
worth the performance gain on DAX-enabled systems.

Andres was not excited about mmapping the WAL segments because of
performance reasons. I'm not sure how much of his critique applies if we
keep supporting both methods and only use mmap() if so configured.

- Heikki
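
To make the "both implementations, selectable with a GUC or some DAX detection magic" idea concrete, here is a minimal sketch. It is not taken from any patch in this thread. It assumes Linux, where mmap() with MAP_SHARED_VALIDATE | MAP_SYNC is expected to fail on files that are not on a DAX-capable mount, so the failure can serve as the detection step. The names WalSegSketch, walseg_open, walseg_insert, walseg_close and the allow_mmap flag (a stand-in for a GUC) are hypothetical. msync() is used as a portable stand-in for a pmem-specific flush.

```c
#define _GNU_SOURCE             /* exposes MAP_SYNC / MAP_SHARED_VALIDATE on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers; values as in the Linux UAPI headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical handle for one WAL segment file. */
typedef struct WalSegSketch
{
    int     fd;
    char   *map;    /* non-NULL only if the MAP_SYNC mapping succeeded */
    size_t  size;
} WalSegSketch;

/*
 * Open (or create) a segment. If allow_mmap is set, try a MAP_SYNC mapping;
 * if the kernel or filesystem rejects it, keep map == NULL so that inserts
 * use the ordinary write path instead.
 */
bool
walseg_open(WalSegSketch *seg, const char *path, size_t size, bool allow_mmap)
{
    assert(seg != NULL && size > 0);
    seg->map = NULL;
    seg->size = size;
    seg->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (seg->fd < 0)
        return false;
    if (ftruncate(seg->fd, (off_t) size) != 0)
    {
        close(seg->fd);
        return false;
    }
    if (allow_mmap)
    {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED_VALIDATE | MAP_SYNC, seg->fd, 0);
        if (p != MAP_FAILED)
            seg->map = p;       /* mapping accepted: treat as DAX */
    }
    return true;
}

/* Copy one record into the segment and make it durable. */
bool
walseg_insert(WalSegSketch *seg, size_t off, const void *rec, size_t len)
{
    assert(seg != NULL && rec != NULL);
    if (off > seg->size || len > seg->size - off)
        return false;           /* record does not fit in the segment */
    if (seg->map != NULL)
    {
        /* mmap path: store directly into the mapped segment */
        memcpy(seg->map + off, rec, len);
        /* stand-in flush; real pmem code would flush CPU caches instead */
        return msync(seg->map, seg->size, MS_SYNC) == 0;
    }
    /* non-DAX path: plain write followed by a sync */
    if (pwrite(seg->fd, rec, len, (off_t) off) != (ssize_t) len)
        return false;
    return fdatasync(seg->fd) == 0;
}

void
walseg_close(WalSegSketch *seg)
{
    if (seg->map != NULL)
        munmap(seg->map, seg->size);
    close(seg->fd);
}
```

A caller would do walseg_open(&seg, path, size, true) and then walseg_insert() for each record. The same call works on both kinds of storage, which is also the maintenance cost mentioned above: two code paths behind one interface.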