RE: [PoC] Non-volatile WAL buffer - Mailing list pgsql-hackers

From Takashi Menjo
Subject RE: [PoC] Non-volatile WAL buffer
Date
Msg-id 002701d5fd03$6e1d97a0$4a58c6e0$@hco.ntt.co.jp_1
Whole thread Raw
In response to [PoC] Non-volatile WAL buffer  (Takashi Menjo <takashi.menjou.vg@hco.ntt.co.jp>)
List pgsql-hackers
Dear hackers,

I rebased my non-volatile WAL buffer's patchset onto master.  A new v2 patchset is attached to this mail.

I also measured performance before and after patchset, varying -c/--client and -j/--jobs options of pgbench, for each
scalingfactor s = 50 or 1000.  The results are presented in the following tables and the attached charts.  Conditions,
steps,and other details will be shown later. 


Results (s=50)
==============
         Throughput [10^3 TPS]  Average latency [ms]
( c, j)  before  after          before  after
-------  ---------------------  ---------------------
( 8, 8)  35.7    37.1 (+3.9%)   0.224   0.216 (-3.6%)
(18,18)  70.9    74.7 (+5.3%)   0.254   0.241 (-5.1%)
(36,18)  76.0    80.8 (+6.3%)   0.473   0.446 (-5.7%)
(54,18)  75.5    81.8 (+8.3%)   0.715   0.660 (-7.7%)


Results (s=1000)
================
         Throughput [10^3 TPS]  Average latency [ms]
( c, j)  before  after          before  after
-------  ---------------------  ---------------------
( 8, 8)  37.4    40.1 (+7.3%)   0.214   0.199 (-7.0%)
(18,18)  79.3    86.7 (+9.3%)   0.227   0.208 (-8.4%)
(36,18)  87.2    95.5 (+9.5%)   0.413   0.377 (-8.7%)
(54,18)  86.8    94.8 (+9.3%)   0.622   0.569 (-8.5%)


Both throughput and average latency are improved for each scaling factor.  Throughput seemed to almost reach the upper
limitwhen (c,j)=(36,18). 

The percentage in s=1000 case looks larger than in s=50 case.  I think larger scaling factor leads to less contentions
onthe same tables and/or indexes, that is, less lock and unlock operations.  In such a situation, write-ahead logging
appearsto be more significant for performance. 


Conditions
==========
- Use one physical server having 2 NUMA nodes (node 0 and 1)
  - Pin postgres (server processes) to node 0 and pgbench to node 1
  - 18 cores and 192GiB DRAM per node
- Use an NVMe SSD for PGDATA and an interleaved 6-in-1 NVDIMM-N set for pg_wal
  - Both are installed on the server-side node, that is, node 0
  - Both are formatted with ext4
  - NVDIMM-N is mounted with "-o dax" option to enable Direct Access (DAX)
- Use the attached postgresql.conf
  - Two new items nvwal_path and nvwal_size are used only after patch


Steps
=====
For each (c,j) pair, I did the following steps three times then I found the median of the three as a final result shown
inthe tables above. 

(1) Run initdb with proper -D and -X options; and also give --nvwal-path and --nvwal-size options after patch
(2) Start postgres and create a database for pgbench tables
(3) Run "pgbench -i -s ___" to create tables (s = 50 or 1000)
(4) Stop postgres, remount filesystems, and start postgres again
(5) Execute pg_prewarm extension for all the four pgbench tables
(6) Run pgbench during 30 minutes


pgbench command line
====================
$ pgbench -h /tmp -p 5432 -U username -r -M prepared -T 1800 -c ___ -j ___ dbname

I gave no -b option to use the built-in "TPC-B (sort-of)" query.


Software
========
- Distro: Ubuntu 18.04
- Kernel: Linux 5.4 (vanilla kernel)
- C Compiler: gcc 7.4.0
- PMDK: 1.7
- PostgreSQL: d677550 (master on Mar 3, 2020)


Hardware
========
- System: HPE ProLiant DL380 Gen10
- CPU: Intel Xeon Gold 6154 (Skylake) x 2sockets
- DRAM: DDR4 2666MHz {32GiB/ch x 6ch}/socket x 2sockets
- NVDIMM-N: DDR4 2666MHz {16GiB/ch x 6ch}/socket x 2sockets
- NVMe SSD: Intel Optane DC P4800X Series SSDPED1K750GA


Best regards,
Takashi

--
Takashi Menjo <takashi.menjou.vg@hco.ntt.co.jp>
NTT Software Innovation Center

> -----Original Message-----
> From: Takashi Menjo <takashi.menjou.vg@hco.ntt.co.jp>
> Sent: Thursday, February 20, 2020 6:30 PM
> To: 'Amit Langote' <amitlangote09@gmail.com>
> Cc: 'Robert Haas' <robertmhaas@gmail.com>; 'Heikki Linnakangas' <hlinnaka@iki.fi>; 'PostgreSQL-development'
> <pgsql-hackers@postgresql.org>
> Subject: RE: [PoC] Non-volatile WAL buffer
>
> Dear Amit,
>
> Thank you for your advice.  Exactly, it's so to speak "do as the hackers do when in pgsql"...
>
> I'm rebasing my branch onto master.  I'll submit an updated patchset and performance report later.
>
> Best regards,
> Takashi
>
> --
> Takashi Menjo <takashi.menjou.vg@hco.ntt.co.jp> NTT Software Innovation Center
>
> > -----Original Message-----
> > From: Amit Langote <amitlangote09@gmail.com>
> > Sent: Monday, February 17, 2020 5:21 PM
> > To: Takashi Menjo <takashi.menjou.vg@hco.ntt.co.jp>
> > Cc: Robert Haas <robertmhaas@gmail.com>; Heikki Linnakangas
> > <hlinnaka@iki.fi>; PostgreSQL-development
> > <pgsql-hackers@postgresql.org>
> > Subject: Re: [PoC] Non-volatile WAL buffer
> >
> > Hello,
> >
> > On Mon, Feb 17, 2020 at 4:16 PM Takashi Menjo <takashi.menjou.vg@hco.ntt.co.jp> wrote:
> > > Hello Amit,
> > >
> > > > I apologize for not having any opinion on the patches themselves,
> > > > but let me point out that it's better to base these patches on
> > > > HEAD (master branch) than REL_12_0, because all new code is
> > > > committed to the master branch, whereas stable branches such as
> > > > REL_12_0 only receive bug fixes.  Do you have any
> > specific reason to be working on REL_12_0?
> > >
> > > Yes, because I think it's human-friendly to reproduce and discuss
> > > performance measurement.  Of course I know
> > all new accepted patches are merged into master's HEAD, not stable
> > branches and not even release tags, so I'm aware of rebasing my
> > patchset onto master sooner or later.  However, if someone, including
> > me, says that s/he applies my patchset to "master" and measures its
> > performance, we have to pay attention to which commit the "master"
> > really points to.  Although we have sha1 hashes to specify which
> > commit, we should check whether the specific commit on master has patches affecting performance or not
> because master's HEAD gets new patches day by day.  On the other hand, a release tag clearly points the commit
> all we probably know.  Also we can check more easily the features and improvements by using release notes and
> user manuals.
> >
> > Thanks for clarifying. I see where you're coming from.
> >
> > While I do sometimes see people reporting numbers with the latest
> > stable release' branch, that's normally just one of the baselines.
> > The more important baseline for ongoing development is the master
> > branch's HEAD, which is also what people volunteering to test your
> > patches would use.  Anyone who reports would have to give at least two
> > numbers -- performance with a branch's HEAD without patch applied and
> > that with patch applied -- which can be enough in most cases to see
> > the difference the patch makes.  Sure, the numbers might change on
> > each report, but that's fine I'd think.  If you continue to develop against the stable branch, you might miss to
> notice impact from any relevant developments in the master branch, even developments which possibly require
> rethinking the architecture of your own changes, although maybe that rarely occurs.
> >
> > Thanks,
> > Amit

Attachment

pgsql-hackers by date:

Previous
From: Atsushi Torikoshi
Date:
Subject: Re: RecoveryWalAll and RecoveryWalStream wait events
Next
From: Fujii Masao
Date:
Subject: Re: add types to index storage params on doc