[WIP] Pipelined Recovery - Mailing list pgsql-hackers

From Imran Zaheer
Subject [WIP] Pipelined Recovery
Date
Msg-id CA+UBfa=vDV8wbmAV0pgrx-FuJh+x8YOW23vJ90Jzr=14rV+9jA@mail.gmail.com
List pgsql-hackers
Hi,

Based on a suggestion by my colleague Ants Aasma, I worked on this
idea of adding parallelism to the WAL recovery process.

The crux of the idea is to decode the WAL in a parallel worker, so that
the replay process can fetch already-decoded records directly from a
shared memory queue. This offloads some CPU work from the recovery
process.

Implementing this idea yielded improvements of roughly 20-40% in
recovery time, though results vary by workload. I have attached
benchmarks for several workloads.

Following are some recovery tests with the default configs, where p1
denotes runs with the pipeline enabled and p0 with it disabled, and
"db size" is the size of the backup database on which recovery runs.
More details on the benchmarks are in the attached file
`recoveries-benchmark-v01`.

workload          elapsed (p0)    elapsed (p1)    % perf    db size

inserts.sql       272s 10ms       197s 570ms      27.37%    480 MB
updates.sql       177s 420ms      117s 80ms       34.01%    480 MB
hot-updates.sql   36s 940ms       29s 240ms       20.84%    480 MB
nonhot.sql        36s 570ms       28s 980ms       20.75%    480 MB
simple-update     20s 160ms       11s 580ms       42.56%    4913 MB
tpcb-like         20s 590ms       13s 640ms       33.75%    4913 MB

A similar approach was also suggested by Matthias van de Meent earlier
in a separate thread [1]. Right now I am using one bgworker for decoding
and filling the shared message queue, while the redo apply loop simply
receives decoded records from the queue. After redo finishes, the
consumer (startup process) requests a shutdown from the producer
(pipeline bgworker) before exiting recovery.

This idea could be coupled with another: pinning buffers in parallel
before the recovery process needs them, which would parallelize most of
the work done in `XLogReadBufferForRedoExtended`. Redo would then simply
receive already-pinned buffers from a queue. Implementing that still
needs some R&D, as IPC and pinning/unpinning buffers across two
processes can be tricky.

If someone wants to reproduce the benchmarks, they can do so using these
scripts [2].

Looking forward to your reviews, comments, etc.

[1]: https://www.postgresql.org/message-id/CAEze2Wh6C_QfxLii%2B%2BeZue5%3DKvbVXKkHyZW8PLmtLgyjmFzwCQ%40mail.gmail.com
[2]: https://github.com/imranzaheer612/pg-recovery-testing

--
Regards,
Imran Zaheer
CYBERTEC PostgreSQL International GmbH
