Re: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PoC] Non-volatile WAL buffer
Msg-id a60bfa39-dd59-eff1-7941-29264c558830@enterprisedb.com
In response to RE: [PoC] Non-volatile WAL buffer  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
Responses RE: [PoC] Non-volatile WAL buffer  ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>)
List pgsql-hackers

On 11/24/20 7:34 AM, tsunakawa.takay@fujitsu.com wrote:
> From: Tomas Vondra <tomas.vondra@enterprisedb.com>
>> So I wonder if using PMEM for the WAL buffer is the right way forward.
>> AFAIK the WAL buffer is quite concurrent (multiple clients writing
>> data), which seems to contradict the PMEM vs. DRAM trade-offs.
>>
>> The design I've originally expected would look more like this
>>
>>     clients -> wal buffers (DRAM) -> wal segments (PMEM DAX)
>>
>> i.e. mostly what we have now, but instead of writing the WAL segments
>> "the usual way" we'd write them using mmap/memcpy, without fsync.
>>
>> I suppose that's what Heikki meant too, but I'm not sure.
> 
> SQL Server probably does so.  Please see the following page and the
> links in "Next steps" section.  I'm saying "probably" because the
> document doesn't clearly state whether SQL Server memcpys data from
> DRAM log cache to non-volatile log cache only for transaction commits
> or for all log cache writes.  I presume the former.
>
> Add persisted log buffer to a database
> https://docs.microsoft.com/en-us/sql/relational-databases/databases/add-persisted-log-buffer?view=sql-server-ver15
> --------------------------------------------------
> With non-volatile, tail of the log storage the pattern is
> 
> memcpy to LC
> memcpy to NV LC
> Set status
> Return control to caller (commit is now valid)
> ...
> 
> With this new functionality, we use a region of memory which is mapped
> to a file on a DAX volume to hold that buffer. Since the memory hosted
> by the DAX volume is already persistent, we have no need to perform a
> separate flush, and can immediately continue with processing the next
> operation. Data is flushed from this buffer to more traditional
> storage in the background.
> --------------------------------------------------
> 

Interesting, thanks for the link. If I understand [1] correctly, they
essentially do this:

    clients -> buffers (DRAM) -> buffers (PMEM) -> wal (storage)

that is, they insert the PMEM buffer between the LC (in DRAM) and
traditional (non-PMEM) storage, so that a commit does not need to do any
fsyncs etc.

It seems to imply the memcpy between DRAM and PMEM happens right when
writing the WAL, but I guess that's not strictly required - we might
just as well do that in the background, I think.
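
To make the quoted pattern (memcpy to LC, memcpy to NV LC, set status,
drain in the background) concrete, here is a rough sketch. It is not how
SQL Server or PostgreSQL implement anything: the TailLog struct, the
function names and TAIL_CAPACITY are all hypothetical, and an ordinary
array stands in for the DAX-mapped region, so it only illustrates the
ordering of the steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAIL_CAPACITY 4096      /* hypothetical size of the NV tail buffer */

typedef struct TailLog
{
    char    dram_cache[TAIL_CAPACITY];  /* "LC": volatile log cache */
    char    nv_tail[TAIL_CAPACITY];     /* "NV LC": stand-in for a DAX mapping */
    size_t  used;                       /* bytes currently held in the tail */
    int     committed;                  /* status flag set after the NV copy */
} TailLog;

/* Commit path: copy into the DRAM cache, then into the NV tail, then set status. */
int
tail_commit(TailLog *log, const void *rec, size_t len)
{
    assert(log->used <= TAIL_CAPACITY);
    if (len > TAIL_CAPACITY - log->used)
        return -1;              /* tail is full, caller has to drain first */

    memcpy(log->dram_cache + log->used, rec, len);          /* memcpy to LC */
    memcpy(log->nv_tail + log->used,
           log->dram_cache + log->used, len);               /* memcpy to NV LC */
    log->used += len;
    log->committed = 1;                                     /* set status */
    return 0;                   /* commit is valid, no fsync on this path */
}

/* Background path: move the tail contents to "traditional" storage. */
size_t
tail_drain(TailLog *log, char *storage, size_t storage_size)
{
    size_t  n = log->used;

    if (n > storage_size)
        return 0;               /* not enough room, leave the tail untouched */
    memcpy(storage, log->nv_tail, n);
    log->used = 0;
    return n;
}
```

The point of the sketch is that tail_commit() never waits for storage;
only tail_drain(), which can run in the background, touches it.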

It's interesting that they only place the tail of the log on PMEM, i.e.
the PMEM buffer has limited size, and the rest of the log is not on
PMEM. It's a bit as if we inserted a PMEM buffer between our wal buffers
and the WAL segments, and kept the WAL segments on regular storage. That
could work, but I'd bet they did that because at that time the NV
devices were much smaller, and placing the whole log on PMEM was not
quite possible. So it might be unnecessarily complicated, considering
the PMEM device capacity is much higher now.

So I'd suggest we simply try this:

    clients -> buffers (DRAM) -> wal segments (PMEM)
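
A minimal sketch of what the last arrow could look like, assuming the
segment is a plain file that gets mmap()ed and written with memcpy().
MappedSegment, segment_open() and segment_append() are made-up names,
not PostgreSQL functions. The msync() call is only a portable stand-in:
on a real DAX mapping one would presumably flush CPU caches instead
(e.g. with libpmem's pmem_persist()), which is what avoids the fsync.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct MappedSegment
{
    char   *base;           /* start of the mapping */
    size_t  size;           /* segment size in bytes */
    size_t  insert_pos;     /* next write offset */
} MappedSegment;

/* Create (or open) the segment file, size it and map it. Returns 0 on success. */
int
segment_open(MappedSegment *seg, const char *path, size_t size)
{
    int     fd = open(path, O_RDWR | O_CREAT, 0600);
    void   *p;

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t) size) != 0)
    {
        close(fd);
        return -1;
    }
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    seg->base = p;
    seg->size = size;
    seg->insert_pos = 0;
    return 0;
}

/* Copy one record from a DRAM buffer into the mapped segment. */
int
segment_append(MappedSegment *seg, const void *rec, size_t len)
{
    size_t  page = (size_t) sysconf(_SC_PAGESIZE);
    size_t  start;

    assert(seg->insert_pos <= seg->size);
    if (len > seg->size - seg->insert_pos)
        return -1;          /* record does not fit in this segment */

    memcpy(seg->base + seg->insert_pos, rec, len);

    /* Stand-in for a cache flush on PMEM: msync needs a page-aligned start. */
    start = seg->insert_pos - (seg->insert_pos % page);
    if (msync(seg->base + start, seg->insert_pos + len - start, MS_SYNC) != 0)
        return -1;

    seg->insert_pos += len;
    return 0;
}
```

A caller would segment_open() a file on the DAX filesystem once and then
call segment_append() for each chunk copied out of the DRAM WAL buffers.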

I plan to do some hacking and maybe put together some simple tools to
benchmark the various approaches.


regards

[1] https://docs.microsoft.com/en-us/archive/blogs/bobsql/how-it-works-it-just-runs-faster-non-volatile-memory-sql-server-tail-of-log-caching-on-nvdimm

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


