RE: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers
From | tsunakawa.takay@fujitsu.com |
---|---|
Subject | RE: [PoC] Non-volatile WAL buffer |
Date | |
Msg-id | TYAPR01MB29905D0578CEC4CA0CCC1EC8FE889@TYAPR01MB2990.jpnprd01.prod.outlook.com |
In response to | Re: [PoC] Non-volatile WAL buffer (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: [PoC] Non-volatile WAL buffer |
List | pgsql-hackers |
From: Masahiko Sawada <sawada.mshk@gmail.com>
> I've done some performance benchmarks with the master and NTT v4
> patch. Let me share the results.
>
...
>        master    NTT    master-unlogged
> 32     113209    67107  154298
> 64     144880    54289  178883
> 96     151405    50562  180018
>
> "master-unlogged" is the same setup as "master" except for using
> unlogged tables (using --unlogged-tables pgbench option). The TPS
> increased by about 20% compared to "master" case (i.g., logged table
> case). The reason why I experimented unlogged table case as well is
> that we can think these results as an ideal performance if we were
> able to write WAL records in 0 sec. IOW, even if the PMEM patch would
> significantly improve WAL logging performance, I think it could not
> exceed this performance. But hope is that if we currently have a
> performance bottle-neck in WAL logging (.e.g, locking and writing
> WAL), removing or minimizing WAL logging would bring a chance to
> further improve performance by eliminating the new-coming bottle-neck.

Could you tell us the specifics of the storage used for WAL: whether it is an SSD or an HDD, the interface (NVMe/SAS/SATA), the read/write throughput and latency (from the product catalog), and the product model?

Was the WAL stored on a storage device separate from the other files? I want to know whether the comparison is as fair as possible. I guess that in the NTT (PMEM) case, the WAL traffic is not affected by the I/Os of the other files.

What would the comparison between master and master-unlogged look like if you placed the WAL on a DAX-aware filesystem such as xfs or ext4 on PMEM, which Oracle recommends as REDO log storage? That is, if we place the WAL on the fastest storage configuration possible, what would the difference between the logged and unlogged cases be?

I'm asking these questions to judge whether it is worthwhile to make further efforts on special code for WAL on PMEM.


> Besides, I've checked the main wait events on each experiment using
> pg_wait_sampling. Here are the top 5 wait events on "master" case
> excluding wait events on the main function of auxiliary processes:
>
>  event_type |        event         |  sum
> ------------+----------------------+-------
>  Client     | ClientRead           | 46902
>  LWLock     | WALWrite             | 33405
>  IPC        | ProcArrayGroupUpdate |  8855
>  LWLock     | WALInsert            |  3215
>  LWLock     | ProcArray            |  3022
>
> We can see the wait event on WALWrite lwlock acquisition happened many
> times and it was the primary wait event.
>
> The result of "ntt" case is:
>
>  event_type |        event         |  sum
> ------------+----------------------+--------
>  LWLock     | WALInsert            | 126487
>  Client     | ClientRead           |  12173
>  LWLock     | BufferContent        |   4480
>  Lock       | transactionid        |   2017
>  IPC        | ProcArrayGroupUpdate |    924
>
> The wait event on WALWrite lwlock disappeared. Instead, there were
> many wait events on WALInsert lwlock. I've not investigated this
> result yet. This could be because the v4 patch acquires WALInsert lock
> more than necessary or writing WAL records to PMEM took more time than
> writing to DRAM as Tomas mentioned before.

Increasing NUM_XLOGINSERT_LOCKS might improve the result, but I don't have much hope because PMEM appears to have limited concurrency...


Regards
Takayuki Tsunakawa
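
For reference, the relative differences in the quoted TPS table can be checked with a few lines of Python. This is only a sketch over the three rows quoted above. The names `ROWS`, `relative_change` and `summarize` are illustrative and come from neither pgbench nor the patch, and the first column is carried along as an opaque row label because its header was elided in the quote.

```python
# TPS figures copied from the quoted table.
# Each tuple is (row label, master, NTT, master-unlogged); the meaning of the
# row label is not stated in the quote, so it is treated as an opaque label.
ROWS = [
    (32, 113209, 67107, 154298),
    (64, 144880, 54289, 178883),
    (96, 151405, 50562, 180018),
]


def relative_change(value, baseline):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def summarize(rows):
    """Return (label, NTT vs master %, master-unlogged vs master %) per row."""
    return [
        (label, relative_change(ntt, master), relative_change(unlogged, master))
        for label, master, ntt, unlogged in rows
    ]


if __name__ == "__main__":
    for label, ntt_pct, unlogged_pct in summarize(ROWS):
        print(f"{label:>3}: NTT {ntt_pct:+.1f}%, master-unlogged {unlogged_pct:+.1f}%")
```

Computing the gap per row rather than as a single average shows how the distance between the logged and unlogged cases changes from one row to the next.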