RE: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: [PoC] Non-volatile WAL buffer
Date
Msg-id TYAPR01MB29905D0578CEC4CA0CCC1EC8FE889@TYAPR01MB2990.jpnprd01.prod.outlook.com
In response to Re: [PoC] Non-volatile WAL buffer  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [PoC] Non-volatile WAL buffer
List pgsql-hackers
From: Masahiko Sawada <sawada.mshk@gmail.com>
> I've done some performance benchmarks with the master and NTT v4
> patch. Let me share the results.
> 
...
>         master  NTT     master-unlogged
> 32      113209  67107   154298
> 64      144880  54289   178883
> 96      151405  50562   180018
> 
> "master-unlogged" is the same setup as "master" except for using
> unlogged tables (using --unlogged-tables pgbench option). The TPS
> increased by about 20% compared to "master" case (i.e., logged table
> case). The reason why I experimented unlogged table case as well is
> that we can think these results as an ideal performance if we were
> able to write WAL records in 0 sec. IOW, even if the PMEM patch would
> significantly improve WAL logging performance, I think it could not
> exceed this performance. But hope is that if we currently have a
> performance bottle-neck in WAL logging (e.g., locking and writing
> WAL), removing or minimizing WAL logging would bring a chance to
> further improve performance by eliminating the new-coming bottle-neck.

Could you tell us the specifics of the storage used for WAL, e.g., whether it is SSD or HDD, whether the interface is
NVMe/SAS/SATA, its read-write throughput and latency (as given in the product catalog), and the product model?
 

Was the WAL stored on a storage device separate from the other files?  I want to know if the comparison is as fair as
possible. I guess that in the NTT (PMEM) case, the WAL traffic is not affected by the I/Os of the other files.
 

What would the comparison look like between master and master-unlogged if you place WAL on a DAX-aware filesystem like
xfs or ext4 on PMEM, which Oracle recommends as REDO log storage?  That is, if we place the WAL on the fastest storage
configuration possible, what would be the difference between the logged and unlogged cases?
 

I'm asking these questions to judge whether it is worthwhile to make further efforts on special code for WAL on PMEM.
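
To make that headroom concrete, here is a minimal sketch that only re-does the arithmetic on the TPS figures quoted
above. The first column of the quoted table is unlabeled in the excerpt, so it is used as an opaque row key; the
function names are made up for illustration and nothing here is measured anew.

```python
# TPS figures copied verbatim from the quoted benchmark table.
# The row key (32/64/96) is the unlabeled first column of that table.
results = {
    32: {"master": 113209, "ntt": 67107, "master_unlogged": 154298},
    64: {"master": 144880, "ntt": 54289, "master_unlogged": 178883},
    96: {"master": 151405, "ntt": 50562, "master_unlogged": 180018},
}


def pct_change(new, base):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def summarize(results):
    """Return (row key, unlogged-vs-master %, NTT-vs-master %) per row."""
    rows = []
    for key in sorted(results):
        r = results[key]
        rows.append((
            key,
            pct_change(r["master_unlogged"], r["master"]),
            pct_change(r["ntt"], r["master"]),
        ))
    return rows


for key, headroom, ntt_delta in summarize(results):
    print(f"{key:>3}: unlogged vs master {headroom:+.1f}%, "
          f"NTT vs master {ntt_delta:+.1f}%")
```

On these numbers the unlogged case is ahead of master by roughly 19% to 36% depending on the row, which is the ceiling
any WAL-on-PMEM work would be competing for in this setup.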


> Besides, I've checked the main wait events on each experiment using
> pg_wait_sampling. Here are the top 5 wait events on "master" case
> excluding wait events on the main function of auxiliary processes:
> 
>  event_type |        event         |  sum
> ------------+----------------------+-------
>  Client     | ClientRead           | 46902
>  LWLock     | WALWrite             | 33405
>  IPC        | ProcArrayGroupUpdate |  8855
>  LWLock     | WALInsert            |  3215
>  LWLock     | ProcArray            |  3022
> 
> We can see the wait event on WALWrite lwlock acquisition happened many
> times and it was the primary wait event.
> 
> The result of "ntt" case is:
> 
>  event_type |        event         |  sum
> ------------+----------------------+--------
>  LWLock     | WALInsert            | 126487
>  Client     | ClientRead           |  12173
>  LWLock     | BufferContent        |   4480
>  Lock       | transactionid        |   2017
>  IPC        | ProcArrayGroupUpdate |    924
> 
> The wait event on WALWrite lwlock disappeared. Instead, there were
> many wait events on WALInsert lwlock. I've not investigated this
> result yet. This could be because the v4 patch acquires WALInsert lock
> more than necessary or writing WAL records to PMEM took more time than
> writing to DRAM as Tomas mentioned before.

Increasing NUM_XLOGINSERT_LOCKS might improve the result, but I don't have much hope because PMEM appears to have
limited concurrency...
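
For a sense of how far the contention moved, the sketch below normalizes the two quoted wait-event tables. It
assumes the "sum" column is a sample count and, since only the top five events are quoted, the shares are relative
to those five rows, not to all samples. The helper name is made up for illustration.

```python
# Sample counts copied verbatim from the two quoted pg_wait_sampling tables.
master = {
    "ClientRead": 46902,
    "WALWrite": 33405,
    "ProcArrayGroupUpdate": 8855,
    "WALInsert": 3215,
    "ProcArray": 3022,
}
ntt = {
    "WALInsert": 126487,
    "ClientRead": 12173,
    "BufferContent": 4480,
    "transactionid": 2017,
    "ProcArrayGroupUpdate": 924,
}


def shares(samples):
    """Fraction of the listed samples attributed to each wait event."""
    total = sum(samples.values())
    return {event: count / total for event, count in samples.items()}


for name, samples in (("master", master), ("ntt", ntt)):
    print(name)
    ranked = sorted(shares(samples).items(), key=lambda kv: kv[1], reverse=True)
    for event, share in ranked:
        print(f"  {event:<22} {share:6.1%}")
```

Within the listed events, WALWrite accounts for about 35% of the samples on master, while WALInsert accounts for
about 87% in the NTT case.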
 


Regards
Takayuki Tsunakawa


