Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory
Msg-id: 2aec6e2a-6a32-0c39-e4e2-aad854543aa8@iki.fi
In response to: Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses: Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On 10/12/2018 23:37, Dmitry Dolgov wrote:
>> On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>>
>>> On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote:
>>>> libpmem's pmem_map_file() supports 2M/1G (huge page size)
>>>> alignment, which can reduce the number of page faults.
>>>> In addition, libpmem's pmem_memcpy_nodrain() copies data
>>>> using single instruction, multiple data (SIMD) instructions
>>>> and non-temporal (NT) store instructions (MOVNT).
>>>> As a result, using these APIs is faster than using old mmap()/memcpy().
>>>>
>>>> Please see the PGCon2018 presentation[1] for the details.
>>>>
>>>> [1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf
>>>
>>> So you say that this represents a 3% gain based on the presentation?
>>> That may be interesting to dig into.  Could you provide fresher
>>> performance numbers?  I am moving this patch to the next CF 2018-10 for
>>> now, waiting for input from the author.
>>
>> Unfortunately, the patch has some conflicts now, so probably not only fresher
>> performance numbers are necessary, but also a rebased version.
> 
> I believe the idea behind this patch is quite important (thanks to CMU DG for
> inspiring lectures), so I decided to put in some effort and rebase it to keep it
> from rotting. At the same time I have a vague impression that the patch itself
> suggests quite a narrow way of using PMDK.

Thanks.

To reiterate what I said earlier in this thread, I think the next step
here is to write a patch that modifies xlog.c to use plain old 
mmap()/msync() to memory-map the WAL files, to replace the WAL buffers. 
Let's see what the performance of that is, with or without NVM hardware. 
I think that might actually make the code simpler. There's a bunch of 
really hairy code around locking the WAL buffers, which could be made 
simpler if each backend memory-mapped the WAL segment files independently.
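
As a rough illustration of the plain mmap()/msync() approach, here is a minimal sketch. It is not taken from any patch in this thread: the helper names wal_map_segment() and wal_write_and_flush() are hypothetical, and it ignores everything xlog.c actually has to do (insert locking, segment switching, error reporting). It only shows mapping a segment file with MAP_SHARED, copying a record into the mapping, and flushing the touched pages with msync().

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): map a segment file of
 * seg_size bytes read/write. A real implementation would preallocate
 * the file with zeros first; ftruncate() may leave it sparse, and
 * storing into a sparse mapping can raise SIGBUS when the disk is full.
 */
static char *
wal_map_segment(const char *path, size_t seg_size)
{
    int   fd = open(path, O_RDWR | O_CREAT, 0600);
    void *p;

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) seg_size) != 0)
    {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, seg_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    return (p == MAP_FAILED) ? NULL : (char *) p;
}

/*
 * Hypothetical helper: copy one record into the mapping at offset off
 * and flush it. msync() needs a page-aligned start address, so the
 * flushed range is rounded down to the containing page boundary.
 */
static int
wal_write_and_flush(char *seg, size_t seg_size, size_t off,
                    const void *rec, size_t len)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t start = off - (off % page);

    assert(off <= seg_size && len <= seg_size - off);
    memcpy(seg + off, rec, len);
    return msync(seg + start, (off - start) + len, MS_SYNC);
}
```

With this shape, each backend would call something like wal_map_segment() on the current segment and write into it directly; whether that really removes the need for the shared WAL buffers and their locking is exactly what a prototype would have to show.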

One thing to watch out for is that if you read() a file and there's an
I/O error, you have a chance to ereport() it. If you try to read from a 
memory-mapped file, and there's an I/O error, the process is killed with 
SIGBUS. So I think we have to be careful with using memory-mapped I/O 
for reading files. But for writing WAL files, it seems like a good fit.

Once we have a reliable mmap()/msync() implementation running, it should 
be straightforward to change it to use MAP_SYNC and the special CPU 
instructions for the flushing.
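
A sketch of what that later step might look like on Linux follows. The WalMap struct and the wal_map_open()/wal_map_flush() names are hypothetical, and the 64-byte cache line size is an assumption. The code first tries MAP_SHARED_VALIDATE | MAP_SYNC (which only succeeds on DAX-capable filesystems) and falls back to an ordinary MAP_SHARED mapping flushed with msync(). On x86-64 with a MAP_SYNC mapping it flushes with CLFLUSH plus SFENCE; newer instructions such as CLWB would be preferable where available, and libpmem's pmem_persist() wraps that selection.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_SYNC / MAP_SHARED_VALIDATE on glibc */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>
#if defined(__x86_64__)
#include <emmintrin.h>          /* _mm_clflush, _mm_sfence */
#endif

/* Hypothetical descriptor for a mapped segment. */
typedef struct
{
    char   *base;
    size_t  size;
    bool    is_sync;            /* true if mapped with MAP_SYNC */
} WalMap;

static int
wal_map_open(WalMap *m, const char *path, size_t size)
{
    int   fd = open(path, O_RDWR);
    void *p = MAP_FAILED;

    if (fd < 0)
        return -1;
    m->is_sync = false;
#if defined(MAP_SYNC) && defined(MAP_SHARED_VALIDATE)
    /* Fails (e.g. EOPNOTSUPP) unless the file lives on a DAX filesystem. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p != MAP_FAILED)
        m->is_sync = true;
#endif
    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    m->base = p;
    m->size = size;
    return 0;
}

static int
wal_map_flush(const WalMap *m, size_t off, size_t len)
{
    size_t page, start;

#if defined(__x86_64__)
    if (m->is_sync)
    {
        /* Assumes 64-byte cache lines. */
        uintptr_t p = (uintptr_t) (m->base + off) & ~(uintptr_t) 63;
        uintptr_t end = (uintptr_t) (m->base + off + len);

        for (; p < end; p += 64)
            _mm_clflush((const void *) p);
        _mm_sfence();
        return 0;
    }
#endif
    /* Ordinary mapping (or non-x86): flush whole pages with msync(). */
    page = (size_t) sysconf(_SC_PAGESIZE);
    start = off - (off % page);
    return msync(m->base + start, (off - start) + len, MS_SYNC);
}
```

Keeping the flush behind one function is the point of the sketch: callers written against the msync() version would not need to change when the MAP_SYNC path is added.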

- Heikki

