Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs
From: Noah Misch
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Msg-id: 20230717004905.GE3826297@rfd.leadboat.com
In response to: Re: BUG #17928: Standby fails to decode WAL on termination of primary (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-bugs
On Tue, Jul 11, 2023 at 03:54:14PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 10 Jul 2023 13:00:12 -0700, Noah Misch <noah@leadboat.com> wrote in
> > On Mon, May 15, 2023 at 03:38:17PM +1200, Thomas Munro wrote:
> > > On Fri, May 12, 2023 at 6:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
> > > > 2023-05-11 20:19:22.248 MSK [2037134] FATAL: invalid memory alloc request size 2021163525
> > > > 2023-05-11 20:19:22.248 MSK [2037114] LOG: startup process (PID 2037134) exited with exit code 1

On my system, testing unpatched REL_15_2, the test reached its first failure
at minute 53 and its second failure at minute 189. Perhaps a faster,
deterministic test would be feasible:

- Use pg_logical_emit_message to fill a few segments with 0xFF.
- CHECKPOINT the primary, so the standby recycles segments.
- One more pg_logical_emit_message, computing the length from
  pg_current_wal_insert_lsn() such that the new message crosses a segment
  boundary and ends 4 bytes before the end of a page.
- Stop the primary.
- If the bug is present, the standby will exit.

> > > Thanks Alexander. Looking into this. I think it is probably
> > > something like: recycled standby pages are not zeroed (something we
> > > already needed to do something about[1]), and when we read a recycled
> > > garbage size (like your "xxxx") at the end of a page at an offset
> > > where we don't have a full record header on one page, we skip the
> > > ValidXLogRecordHeader() call (and always did), but the check in
> > > allocate_recordbuf() which previously handled that "gracefully" (well,
> > > it would try to allocate up to 1GB bogusly, but it wouldn't try to
> > > allocate more than that and ereport) is a bit too late. I probably
> > > need to add an earlier not-too-big validation. Thinking.
> >
> > I agree about an earlier not-too-big validation. Like the attached? I
> > haven't tested it with Alexander's recipe or pondered it thoroughly.
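A rough sketch of the length computation in that recipe, assuming default
values of wal_segment_size (16MB) and XLOG_BLCKSZ (8KB) and an illustrative
fixed per-record overhead; the real insertion path adds page headers whenever
the record crosses a page, so the exact server-side payload length would need
to account for those as well:

```python
# Hypothetical helper for the deterministic test recipe above: given the
# primary's current insert LSN (as an integer), pick a payload length for
# pg_logical_emit_message so the record crosses into the next segment and
# ends 4 bytes before the end of a page.  Constants and the flat
# RECORD_OVERHEAD are assumptions for illustration only.

WAL_SEG_SIZE = 16 * 1024 * 1024   # default wal_segment_size
XLOG_BLCKSZ = 8192                # default WAL page size
RECORD_OVERHEAD = 34              # assumed fixed record header + framing


def payload_len_for_boundary(insert_lsn: int) -> int:
    """Length whose record ends 4 bytes before the end of the first
    page of the next segment (ignoring mid-record page headers)."""
    next_seg = (insert_lsn // WAL_SEG_SIZE + 1) * WAL_SEG_SIZE
    target_end = next_seg + XLOG_BLCKSZ - 4
    return target_end - insert_lsn - RECORD_OVERHEAD
```

With this, the record's computed end lands exactly 4 bytes short of a page
boundary, leaving too few bytes on the page for a full record header, which
is the window the bug needs.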
> I like the patch for its code clean up, but I'm afraid that it removes
> the existing record length check when reading continued
> pages.

The removed check would be repetitive in all cases known to me, but I
wouldn't mind keeping the check instead.

> However, I'm unsure about merely adding a check for too-long
> records, due to the potential for requesting an excessively large
> amount of memory, even if it will be released shortly.
>
> > > [1] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg@alap3.anarazel.de
> >
> > Regarding [1], is it still worth zeroing recycled pages on standbys and/or
> > reading the whole header before allocating xl_tot_len? (Are there benefits
> > other than avoiding a 1G backend allocation or 4G frontend allocation, or is
> > that benefit worth the cycles?)
>
> I believe reading the whole header is the most sensible approach as it
> can prevent unnecessary memory requests. Another potential solution
> (or hack) for this specific case is to let XLogWalRcvFlush write a
> finalizing ((uint32)0) when dying is true. This stabilizes the
> behavior to "invalid record length.. got 0" when running the TAP test.
>
> --- a/src/backend/replication/walreceiver.c
> +++ b/src/backend/replication/walreceiver.c
> @@ -989,6 +989,15 @@ XLogWalRcvFlush(bool dying, TimeLineID tli)
>  {
>  	Assert(tli != 0);
>
> +	if (dying)
> +	{
> +		int		buflen = sizeof(uint32);
> +		char	buf[sizeof(uint32)];
> +
> +		memset(buf, 0, buflen);
> +		XLogWalRcvWrite(buf, buflen, LogstreamResult.Write, tli);
> +	}
> +
>  	if (LogstreamResult.Flush < LogstreamResult.Write)
>  	{
>  		WalRcvData *walrcv = WalRcv;

That's inexpensive in cycles, and it should make the failures in question
quite rare. If we do that instead of pre-zeroing, I think we'd still want
to be ready for invalid lengths, for the case of unclean standby shutdown.
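For illustration, the "earlier not-too-big validation" discussed above could
look roughly like the sketch below. The 24-byte record-header size and the
1GB allocation cap are assumed values for this example, not the actual
committed check; the point is only that a garbage xl_tot_len read from a
recycled page can be rejected before any allocation is attempted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of a pre-allocation sanity check on xl_tot_len.  The constants
 * below are assumptions for illustration: a record must hold at least a
 * full XLogRecord header (24 bytes on common builds), and a backend
 * allocation is capped at 1GB. */
#define SKETCH_SIZEOF_XLOGRECORD 24u
#define SKETCH_MAX_ALLOC (1024u * 1024u * 1024u)

static bool
xl_tot_len_plausible(uint32_t xl_tot_len)
{
    /* Garbage from a recycled, non-zeroed page typically fails one of
     * these bounds, so we never reach allocate_recordbuf() with it. */
    return xl_tot_len >= SKETCH_SIZEOF_XLOGRECORD &&
           xl_tot_len < SKETCH_MAX_ALLOC;
}
```

Note that the bogus length from Alexander's log, 2021163525 (~1.9GB), fails
the upper bound here, whereas allocate_recordbuf() would only have rejected
it after deciding how much to allocate.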