Re: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PoC] Non-volatile WAL buffer
Msg-id 256f6556-b517-e81e-0b5d-df60b2fcbdef@enterprisedb.com
In response to Re: [PoC] Non-volatile WAL buffer  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: [PoC] Non-volatile WAL buffer  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers

On 11/26/20 9:59 PM, Heikki Linnakangas wrote:
> On 26/11/2020 21:27, Tomas Vondra wrote:
>> Hi,
>>
>> Here's the "simple patch" that I'm currently experimenting with. It
>> essentially replaces open/close/write/fsync with pmem calls
>> (map/unmap/memcpy/persist variants), and it's by no means committable.
>> But it works well enough for experiments / measurements, etc.
>>
>> The numbers (5-minute pgbench runs on scale 500) look like this:
>>
>>  clients  master/btt    master/dax           ntt        simple
>>     -----------------------------------------------------------
>>       1         5469          7402          7977          6746
>>      16        48222         80869        107025         82343
>>      32        73974        158189        214718        158348
>>      64        85921        154540        225715        164248
>>      96       150602        221159        237008        217253
>>
>> A chart illustrating these results is attached. The four columns are
>> showing unpatched master with WAL on a pmem device, in BTT or DAX modes,
>> "ntt" is the patch submitted to this thread, and "simple" is the patch
>> I've hacked together.
>>
>> As expected, the BTT case performs poorly (compared to the rest).
>>
>> The "master/dax" and "simple" perform about the same. There are some
>> differences, but those may be attributed to noise. The NTT patch does
>> outperform them by ~20-40% in some cases.
>>
>> The question is why. I recall suggestions that this is due to page faults
>> when writing data into the WAL, but I did experiment with various
>> settings that I think should prevent that (e.g. disabling WAL reuse
>> and/or disabling zeroing the segments) but that made no measurable
>> difference.
> 
> The page faults are only a problem when mmap() is used *without* DAX.
> 
> Takashi tried a patch earlier to mmap() WAL segments and insert WAL to
> them directly. See 0002-Use-WAL-segments-as-WAL-buffers.patch at
> https://www.postgresql.org/message-id/000001d5dff4%24995ed180%24cc1c7480%24%40hco.ntt.co.jp_1.
> Could you test that patch too, please? Using your nomenclature, that
> patch skips wal_buffers and does:
> 
>   clients -> wal segments (PMEM DAX)
> 
> He got good results with that with DAX, but otherwise it performed
> worse. And then we discussed why that might be, and the page fault
> hypothesis was brought up.
> 
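
A minimal sketch of the substitution described above, where open/write/fsync/close become map/memcpy/persist/unmap. It assumes plain POSIX mmap()/msync() as a portable stand-in for libpmem; on a DAX mount, libpmem's pmem_map_file(), pmem_memcpy_persist() and pmem_unmap() would typically take their place, flushing CPU caches instead of calling into the kernel. The segment size and helper names are hypothetical and are not taken from either patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical segment size for the sketch; PostgreSQL's default WAL segment is 16MB. */
#define SEG_SIZE (1024 * 1024)

/* "open": size the file and map it shared, so stores land in the file itself. */
static char *map_segment(int fd)
{
    if (ftruncate(fd, SEG_SIZE) != 0)
        return NULL;
    void *p = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* "write" + "fsync": copy a record into the mapping, then flush the touched pages.
 * msync() requires a page-aligned address, so the flushed range starts at the
 * beginning of the page containing the record. Returns -1 if it does not fit. */
static int write_and_persist(char *seg, size_t off, const void *rec, size_t len)
{
    assert(seg != NULL);
    if (off > SEG_SIZE || len > SEG_SIZE - off)
        return -1;
    memcpy(seg + off, rec, len);
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = off - off % page;
    return msync(seg + start, off + len - start, MS_SYNC);
}

/* "close": drop the mapping (the file descriptor can be closed right after mmap). */
static int unmap_segment(char *seg)
{
    return munmap(seg, SEG_SIZE);
}
```

Without DAX, the first store to each page of such a mapping faults that page into the page cache, which is where the page-fault hypothesis discussed in this thread comes from.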

D'oh, I hadn't noticed there's a patch doing that. This thread has so
many different patches - which is good, but a bit confusing.

> I think 0002-Use-WAL-segments-as-WAL-buffers.patch is the most promising
> approach here. But because it's slower without DAX, we need to keep the
> current code for non-DAX systems. Unfortunately it means that we need to
> maintain both implementations, selectable with a GUC or some DAX
> detection magic. The question then is whether the code complexity is
> worth the performance gain on DAX-enabled systems.
> 

Sure, I can give it a spin. The question is whether it applies to
current master, or whether some sort of rebase is needed. I'll try.

> Andres was not excited about mmapping the WAL segments because of
> performance reasons. I'm not sure how much of his critique applies if we
> keep supporting both methods and only use mmap() if so configured.
> 

Yeah. I don't think we can just discard the current approach. There are
far too many OS variants: even if Linux is happy, one of the other
critters won't be.
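
A minimal sketch of keeping both implementations behind one switch, selectable by a GUC or by DAX detection. The GUC values, the mmap_supported/is_dax inputs and every function name here are hypothetical; neither patch in this thread is claimed to work this way. The stub implementations only print, standing in for the existing wal_buffers + write()/fsync() path and the mmap() path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical GUC values; no such setting exists in either patch. */
typedef enum { WAL_IO_AUTO, WAL_IO_WRITE, WAL_IO_MMAP } WalIoSetting;
typedef enum { WAL_IO_METHOD_WRITE, WAL_IO_METHOD_MMAP } WalIoMethod;

/* Both implementations expose the same operations. */
typedef struct {
    const char *name;
    void (*insert)(const char *rec);
} WalIoOps;

/* Stubs standing in for the real code paths. */
static void insert_via_buffers(const char *rec) { printf("wal_buffers + write(): %s\n", rec); }
static void insert_via_mmap(const char *rec) { printf("mmap'd segment + persist: %s\n", rec); }

static const WalIoOps wal_io_ops[] = {
    [WAL_IO_METHOD_WRITE] = { "write", insert_via_buffers },
    [WAL_IO_METHOD_MMAP] = { "mmap", insert_via_mmap },
};

/* is_dax is assumed to come from whatever DAX detection the platform offers. */
static WalIoMethod choose_wal_io_method(WalIoSetting guc, bool mmap_supported, bool is_dax)
{
    switch (guc) {
    case WAL_IO_WRITE:
        return WAL_IO_METHOD_WRITE;
    case WAL_IO_MMAP:
        /* Explicitly configured: honor it wherever mmap() is usable at all. */
        return mmap_supported ? WAL_IO_METHOD_MMAP : WAL_IO_METHOD_WRITE;
    case WAL_IO_AUTO:
    default:
        /* mmap() only pays off with DAX; without it, page faults make it slower. */
        return (mmap_supported && is_dax) ? WAL_IO_METHOD_MMAP : WAL_IO_METHOD_WRITE;
    }
}

static const WalIoOps *get_wal_io_ops(WalIoMethod m)
{
    assert((size_t) m < sizeof wal_io_ops / sizeof wal_io_ops[0]);
    return &wal_io_ops[m];
}
```

Falling back to the write() path when mmap() is requested but unsupported is one possible design choice; refusing to start would be another.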


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


