Re: [WIP] Pipelined Recovery - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: [WIP] Pipelined Recovery
Date
Msg-id CABPTF7XABSSwUPbnS+UE9OyeH-z3ihmdp9tOt3UJ4XcWZkE1DA@mail.gmail.com
In response to Re: [WIP] Pipelined Recovery  (Imran Zaheer <imran.zhir@gmail.com>)
List pgsql-hackers
Hi Henson, Imran,

On Wed, Apr 8, 2026 at 7:14 PM Imran Zaheer <imran.zhir@gmail.com> wrote:
>
> Hi
>
> I am uploading the new version with the following fixes
>
> * Rebased version.
> * Skip serialization of decoded records. As pointed out by Henson,
> there was no need to serialize the records again for the shm_mq. We
> can simply pass the contiguous bytes, with minor pointer fixups, to
> the shm_mq.
>
> This time I am uploading the benchmarking results to Google Drive and
> attaching the link here; otherwise my mail would be held for
> moderation (my guess is that the overall attachment size is greater
> than 1MB).
>
> I am still not sure whether my testing approach is good enough,
> because sometimes I am not able to get the same performance
> improvement with the pgbench built-in scripts as I got with the
> custom SQL scripts. Maybe pgbench is not generating enough WAL to
> test on, or maybe I am missing something.
>
> Benchmarks: https://drive.google.com/file/d/1Y4SYVnrFEQRE5T2r87rrTr7SWC9m19Si/view?usp=sharing
>
> Thanks & Regards
> Imran Zaheer
>
> On Wed, Apr 8, 2026 at 1:46 PM Imran Zaheer <imran.zhir@gmail.com> wrote:
> >
> > >
> > > Hi Xuneng, Imran, and everyone,
> > >
> >
> > Hi Henson and Xuneng.
> >
> > Thanks for explaining the approaches to Xuneng.
> >
> > >
> > > The two approaches target different bottlenecks. The current patch
> > > parallelizes WAL decoding, which keeps the redo path single-threaded
> > > and avoids the Hot Standby visibility problem entirely.
> > >
> >
> > You are right that both approaches target different bottlenecks. The
> > pipeline patch aims to improve overall CPU throughput and to save
> > CPU time on the redo path by offloading the steps we can safely do
> > in parallel without causing synchronization problems.
> >
> > > One thing I am curious about in the current patch: WAL records are
> > > already in a serialized format on disk. The producer decodes them and
> > > then re-serializes into a different custom format for shm_mq. What is
> > > the advantage of this second serialization format over simply passing
> > > the raw WAL bytes after CRC validation and letting the consumer decode
> > > directly? Offloading CRC to a separate core could still improve
> > > throughput at the cost of higher total CPU usage, without needing the
> > > custom format.
> > >
> >
> > Thanks. You are right that there was no need to serialize the decoded
> > record again. I was not aware that we already have the contiguous
> > bytes in memory. In my next patch I will remove this extra
> > serialization step.
> >
> > > Koichi's approach parallelizes redo (buffer I/O) itself, which attacks
> > > a larger cost — Jakub's flamegraphs show BufferAlloc ->
> > > GetVictimBuffer -> FlushBuffer dominating in both p0 and p1 — but at
> > > the expense of much harder concurrency problems.
> > >
> > > Whether the decode pipelining ceiling is high enough, or whether the
> > > redo parallelization complexity is tractable, seems like the central
> > > design question for this area.
> >
> > I still have to investigate the problem related to `GetVictimBuffer`
> > that Jakub mentioned. But I was looking into how we can safely
> > offload the work done by `XLogReadBufferForRedoExtended` to a
> > separate pipeline worker, or maybe we can try prefetching the buffer
> > header so the main redo loop doesn't have to spend time getting the
> > buffer.

Thanks for your clarification! I'll try to review this patch later.

--
Best,
Xuneng


