On 10/12/2018 23:37, Dmitry Dolgov wrote:
>> On Thu, Nov 29, 2018 at 6:48 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>>
>>> On Tue, Oct 2, 2018 at 4:53 AM Michael Paquier <michael@paquier.xyz> wrote:
>>>
>>> On Mon, Aug 06, 2018 at 06:00:54PM +0900, Yoshimi Ichiyanagi wrote:
>>>> The libpmem's pmem_map_file() supported 2M/1G(the size of huge page)
>>>> alignment, since it could reduce the number of page faults.
>>>> In addition, libpmem's pmem_memcpy_nodrain() is the function
>>>> to copy data using single instruction, multiple data(SIMD) instructions
>>>> and NT store instructions(MOVNT).
>>>> As a result, using these APIs is faster than using old mmap()/memcpy().
>>>>
>>>> Please see the PGCon2018 presentation[1] for the details.
>>>>
>>>> [1] https://www.pgcon.org/2018/schedule/attachments/507_PGCon2018_Introducing_PMDK_into_PostgreSQL.pdf
>>>
>>> So you say that this represents a 3% gain based on the presentation?
>>> That may be interesting to dig into it. Could you provide fresher
>>> performance numbers? I am moving this patch to the next CF 2018-10 for
>>> now, waiting for input from the author.
>>
>> Unfortunately, the patch has some conflicts now, so probably not only
>> fresher performance numbers but also a rebased version is needed.
>
> I believe the idea behind this patch is quite important (thanks to CMU DG for
> inspiring lectures), so I decided to put some effort into rebasing it to
> prevent it from rotting. At the same time I have a vague impression that the
> patch itself suggests a quite narrow way of using PMDK.

Thanks.

To re-iterate what I said earlier in this thread, I think the next step
here is to write a patch that modifies xlog.c to use plain old
mmap()/msync() to memory-map the WAL files, replacing the WAL buffers.
Let's see what the performance of that is, with or without NVM hardware.

I think that might actually make the code simpler: there's a bunch of
really hairy code around locking the WAL buffers, which could be
simplified if each backend memory-mapped the WAL segment files
independently.

One thing to watch out for is that if you read() a file and there's an
I/O error, you have a chance to ereport() it. If you read from a
memory-mapped file and there's an I/O error, the process is killed with
SIGBUS. So I think we have to be careful about using memory-mapped I/O
for reading files. But for writing WAL files, it seems like a good fit.

Once we have a reliable mmap()/msync() implementation running, it
should be straightforward to change it to use MAP_SYNC and the special
CPU instructions for flushing.

- Heikki