Re: [WIP] Pipelined Recovery - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: [WIP] Pipelined Recovery |
| Date | |
| Msg-id | CABPTF7XABSSwUPbnS+UE9OyeH-z3ihmdp9tOt3UJ4XcWZkE1DA@mail.gmail.com |
| In response to | Re: [WIP] Pipelined Recovery (Imran Zaheer <imran.zhir@gmail.com>) |
| List | pgsql-hackers |
Hi Henson, Imran,

On Wed, Apr 8, 2026 at 7:14 PM Imran Zaheer <imran.zhir@gmail.com> wrote:
>
> Hi
>
> I am uploading the new version with the following fixes:
>
> * Rebased version.
> * Skip serialization of decoded records. As pointed out by Henson,
>   there was no need to serialize the records again for the shm_mq.
>   We can simply pass the contiguous bytes, with minor pointer fixups,
>   to the shm_mq.
>
> This time I am uploading the benchmarking results to Drive and
> attaching the link here; otherwise my mail would be held for
> moderation (my guess is that the overall attachment size is greater
> than 1 MB).
>
> I am still not sure whether my testing approach is good enough,
> because sometimes I am not able to get the same performance
> improvement with the pgbench built-in scripts as I got with the
> custom SQL scripts. Maybe pgbench is not creating enough WAL to test
> on, or maybe I am missing something.
>
> Benchmarks: https://drive.google.com/file/d/1Y4SYVnrFEQRE5T2r87rrTr7SWC9m19Si/view?usp=sharing
>
> Thanks & Regards
> Imran Zaheer
>
> On Wed, Apr 8, 2026 at 1:46 PM Imran Zaheer <imran.zhir@gmail.com> wrote:
> >
> > > Hi Xuneng, Imran, and everyone,
> >
> > Hi Henson and Xuneng.
> >
> > Thanks for explaining the approaches to Xuneng.
> >
> > > The two approaches target different bottlenecks. The current patch
> > > parallelizes WAL decoding, which keeps the redo path single-threaded
> > > and avoids the Hot Standby visibility problem entirely.
> >
> > You are right: both approaches target different bottlenecks. The
> > pipeline patch aims to improve overall CPU throughput and to save
> > CPU time by offloading the steps we can safely do in parallel
> > without causing synchronization problems.
> >
> > > One thing I am curious about in the current patch: WAL records are
> > > already in a serialized format on disk. The producer decodes them
> > > and then re-serializes into a different custom format for shm_mq.
> > > What is the advantage of this second serialization format over
> > > simply passing the raw WAL bytes after CRC validation and letting
> > > the consumer decode directly? Offloading CRC to a separate core
> > > could still improve throughput at the cost of higher total CPU
> > > usage, without needing the custom format.
> >
> > Thanks. You are right: there was no need to serialize the decoded
> > record again. I was not aware that we already have the contiguous
> > bytes in memory. In my next patch I will remove this extra
> > serialization step.
> >
> > > Koichi's approach parallelizes redo (buffer I/O) itself, which
> > > attacks a larger cost (Jakub's flamegraphs show BufferAlloc ->
> > > GetVictimBuffer -> FlushBuffer dominating in both p0 and p1), but
> > > at the expense of much harder concurrency problems.
> > >
> > > Whether the decode pipelining ceiling is high enough, or whether
> > > the redo parallelization complexity is tractable, seems like the
> > > central design question for this area.
> >
> > I still have to investigate the problem related to `GetVictimBuffer`
> > that Jakub mentioned. But I was looking into how we can safely
> > offload the work done by `XLogReadBufferForRedoExtended` to a
> > separate pipeline worker, or maybe we can try prefetching the buffer
> > header so the main redo loop doesn't have to spend time getting the
> > buffer.

Thanks for your clarification! I'll try to review this patch later.

--
Best,
Xuneng