Re: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PoC] Non-volatile WAL buffer |
Date | |
Msg-id | a60bfa39-dd59-eff1-7941-29264c558830@enterprisedb.com |
In response to | RE: [PoC] Non-volatile WAL buffer ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
Responses | RE: [PoC] Non-volatile WAL buffer ("tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com>) |
List | pgsql-hackers |
On 11/24/20 7:34 AM, tsunakawa.takay@fujitsu.com wrote:
> From: Tomas Vondra <tomas.vondra@enterprisedb.com>
>> So I wonder if using PMEM for the WAL buffer is the right way forward.
>> AFAIK the WAL buffer is quite concurrent (multiple clients writing
>> data), which seems to contradict the PMEM vs. DRAM trade-offs.
>>
>> The design I've originally expected would look more like this
>>
>>     clients -> wal buffers (DRAM) -> wal segments (PMEM DAX)
>>
>> i.e. mostly what we have now, but instead of writing the WAL segments
>> "the usual way" we'd write them using mmap/memcpy, without fsync.
>>
>> I suppose that's what Heikki meant too, but I'm not sure.
>
> SQL Server probably does so. Please see the following page and the
> links in "Next steps" section. I'm saying "probably" because the
> document doesn't clearly state whether SQL Server memcpys data from
> DRAM log cache to non-volatile log cache only for transaction commits
> or for all log cache writes. I presume the former.
>
>
> Add persisted log buffer to a database
> https://docs.microsoft.com/en-us/sql/relational-databases/databases/add-persisted-log-buffer?view=sql-server-ver15
> --------------------------------------------------
> With non-volatile, tail of the log storage the pattern is
>
>     memcpy to LC
>     memcpy to NV LC
>     Set status
>     Return control to caller (commit is now valid)
> ...
>
> With this new functionality, we use a region of memory which is mapped
> to a file on a DAX volume to hold that buffer. Since the memory hosted
> by the DAX volume is already persistent, we have no need to perform a
> separate flush, and can immediately continue with processing the next
> operation. Data is flushed from this buffer to more traditional
> storage in the background.
> --------------------------------------------------
>

Interesting, thanks for the link.
If I understand [1] correctly, they essentially do this:

    clients -> buffers (DRAM) -> buffers (PMEM) -> wal (storage)

that is, they insert the PMEM buffer between the LC (in DRAM) and traditional (non-PMEM) storage, so that a commit does not need to do any fsyncs etc.

It seems to imply the memcpy between DRAM and PMEM happens right when writing the WAL, but I guess that's not strictly required - we might just as well do that in the background, I think.

It's interesting that they only place the tail of the log on PMEM, i.e. the PMEM buffer has limited size, and the rest of the log is not on PMEM. It's a bit as if we inserted a PMEM buffer between our wal buffers and the WAL segments, and kept the WAL segments on regular storage. That could work, but I'd bet they did that because at that time the NV devices were much smaller, and placing the whole log on PMEM was not quite possible. So it might be unnecessarily complicated, considering the PMEM device capacity is much higher now.

So I'd suggest we simply try this:

    clients -> buffers (DRAM) -> wal segments (PMEM)

I plan to do some hacking and maybe hack together some simple tools to benchmark various approaches.

regards

[1] https://docs.microsoft.com/en-us/archive/blogs/bobsql/how-it-works-it-just-runs-faster-non-volatile-memory-sql-server-tail-of-log-caching-on-nvdimm

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
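
For readers who want to picture the "clients -> buffers (DRAM) -> wal segments (PMEM)" flow, here is a minimal sketch in Python. It is not PostgreSQL code and not taken from the patch under discussion. A regular temporary file stands in for a segment on a DAX volume, and `MappedSegment`, `SEGMENT_SIZE`, and `demo` are hypothetical names made up for illustration. On real PMEM the `flush()` call (msync on a regular file) would presumably be replaced by a CPU cache flush such as libpmem's `pmem_persist`, which is the point of the "without fsync" idea.

```python
import mmap
import os
import tempfile

# Hypothetical size for the sketch; PostgreSQL's default WAL segment is 16 MB.
SEGMENT_SIZE = 1024 * 1024


class MappedSegment:
    """A preallocated file mapped into memory, standing in for a WAL segment on PMEM DAX."""

    def __init__(self, path, size=SEGMENT_SIZE):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, size)          # preallocate so the mapping covers the whole segment
        self.map = mmap.mmap(self.fd, size)  # shared, writable mapping of the file
        self.size = size
        self.write_pos = 0

    def append(self, data):
        """Copy bytes into the mapping (the memcpy step) and return the start offset."""
        end = self.write_pos + len(data)
        if end > self.size:
            raise ValueError("segment full")
        start = self.write_pos
        self.map[start:end] = data
        self.write_pos = end
        return start

    def persist(self):
        # Assumption: on a regular file this is msync(); on real PMEM a
        # cache-line flush would take its place and no fsync would be needed.
        self.map.flush()

    def close(self):
        self.map.close()
        os.close(self.fd)


def demo():
    # "Clients" append records to a DRAM buffer first.
    dram_buffer = bytearray()
    for i in range(3):
        dram_buffer += b"record-%d;" % i

    # The buffer contents are then copied into the mapped segment.
    path = os.path.join(tempfile.mkdtemp(), "segment")
    seg = MappedSegment(path)
    seg.append(bytes(dram_buffer))
    seg.persist()
    seg.close()

    # Read the file back through the normal I/O path to confirm the copy landed.
    with open(path, "rb") as f:
        return f.read(len(dram_buffer))
```

Calling `demo()` returns the bytes that were copied into the segment. The SQL Server variant described in [1] would add a second, size-limited mapped buffer between the DRAM buffer and the segment file, with a background step draining it to regular storage.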