Re: [WIP] Pipelined Recovery - Mailing list pgsql-hackers
| From | Imran Zaheer |
|---|---|
| Subject | Re: [WIP] Pipelined Recovery |
| Date | |
| Msg-id | CA+UBfakvVoCK+8Jz2qGL=LqLD=ogAccbAgjgyNoNURX-jO982w@mail.gmail.com |
| In response to | [WIP] Pipelined Recovery (Imran Zaheer <imran.zhir@gmail.com>) |
| List | pgsql-hackers |
Hi,

I just found a discussion where Bruce Momjian mentioned replication pipelining. [1]

[1]: https://www.postgresql.org/message-id/aJyuxlqx0-OSuGqC%40momjian.us

Thanks,
Imran Zaheer

On Fri, Jan 30, 2026 at 11:28 AM Imran Zaheer <imran.zhir@gmail.com> wrote:
>
> Hi,
>
> Based on a suggestion by my colleague Ants Aasma, I worked on the
> idea of adding parallelism to the WAL recovery process.
>
> The crux of the idea is to decode the WAL using parallel workers. The
> replay process can then get the records directly from a shared-memory
> queue. This way, we can decrease some of the CPU load on the recovery
> process.
>
> Implementing this idea yielded an improvement of around 20% in
> recovery times, but results may differ based on workloads. I have
> attached some benchmarks for different workloads.
>
> The following are recovery tests with the default configs. Here p1
> means the pipeline is enabled, and "db size" is the size of the backup
> database on which the recovery happens. You can see more detail on the
> benchmarks in the attached file `recoveries-benchmark-v01`.
>
>                     elapsed (p0)   elapsed (p1)   % improvement   db size
>   inserts.sql       272s 10ms      197s 570ms     27.37%          480 MB
>   updates.sql       177s 420ms     117s 80ms      34.01%          480 MB
>   hot-updates.sql   36s 940ms      29s 240ms      20.84%          480 MB
>   nonhot.sql        36s 570ms      28s 980ms      20.75%          480 MB
>   simple-update     20s 160ms      11s 580ms      42.56%          4913 MB
>   tpcb-like         20s 590ms      13s 640ms      33.75%          4913 MB
>
> A similar approach was also suggested by Matthias van de Meent earlier
> in a separate thread [1]. Right now I am using one bgworker for
> decoding and filling up the shared message queue, and the redo apply
> loop simply receives the decoded records from the queue. After redo is
> finished, the consumer (startup process) can request a shutdown from
> the producer (pipeline bgworker) before exiting recovery.
>
> This idea can be coupled with another idea: pinning the buffers in
> parallel before the recovery process needs them. This would try to
> parallelize most of the work being done in
> `XLogReadBufferForRedoExtended`. Redo could then simply receive
> already-pinned buffers from a queue, but implementing this still needs
> some R&D, as IPC and pinning/unpinning of buffers across two processes
> can be tricky.
>
> If someone wants to reproduce the benchmarks, they can do so using
> these scripts [2].
>
> Looking forward to your reviews, comments, etc.
>
> [1]: https://www.postgresql.org/message-id/CAEze2Wh6C_QfxLii%2B%2BeZue5%3DKvbVXKkHyZW8PLmtLgyjmFzwCQ%40mail.gmail.com
> [2]: https://github.com/imranzaheer612/pg-recovery-testing
>
> --
> Regards,
> Imran Zaheer
> CYBERTEC PostgreSQL International GmbH