Re: BUG #17928: Standby fails to decode WAL on termination of primary - Mailing list pgsql-bugs
| From | Noah Misch |
|---|---|
| Subject | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
| Date | |
| Msg-id | 20230812211327.GB2326466@rfd.leadboat.com |
| In response to | Re: BUG #17928: Standby fails to decode WAL on termination of primary (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: BUG #17928: Standby fails to decode WAL on termination of primary |
| List | pgsql-bugs |
On Fri, Aug 11, 2023 at 08:32:00PM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Sat, Aug 12, 2023 at 11:56:45AM +1200, Thomas Munro wrote:
> >> On Sat, Aug 12, 2023 at 2:00 AM Noah Misch <noah@leadboat.com> wrote:
> >>> I recall earlier messages theorizing that it was just harder to hit in v14, so
> >>> I'm disinclined to stop at v15.  I think the main choice is whether to stop at
> >>> v11 (normal choice) or v12 (worry about breaking the last v11 point release).
> >>> I don't have a strong opinion between those.
>
> > Okay.  I wouldn't be inclined to patch v11 for that, FWIW, as this
> > code path is touched by recovery and more.  At least it does not seem
> > worth taking any risk compared to the potential gain.
>
> That was my gut reaction too -- risk/reward seems not great for the
> last v11 release.  (I've been burned a couple of times now by putting
> can't-fix-it-anymore bugs into the final release of a branch, so maybe
> I'm just being overly paranoid.  But it's something to worry about.)

Okay, v12+ would be fine with me.  Alas, a closer look at v1 found two
defects, one with severity far greater than the bug v1 aimed to fix:

===== 1. For oversized records, it changed startup FATAL to silent data loss

This affects only supported branches (v15-).  In v16+, commit 8fcb32d
introduced and checked XLogRecordMaxSize.  But in v15 with patch v1, redo of
pg_logical_emit_message(false, '_', repeat('x', 1024 * 1024 * 1024 - 1000))
just ends at the oversized record:

2523581 2023-08-12 17:17:39.512 GMT LOG:  redo starts at 0/41F155B8
2523581 2023-08-12 17:17:39.512 GMT LOG:  record length 1073740879 at 0/41F155F0 too long
2523581 2023-08-12 17:17:39.512 GMT LOG:  redo done at 0/41F155B8 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

Any user could call pg_logical_emit_message() to silently terminate the WAL
stream, which is far worse than the original bug.

So far, I'm seeing one way to clearly fix $SUBJECT without that harm.  When a
record header spans a page boundary, read the next page to reassemble the
header.  If !ValidXLogRecordHeader() (invalid xl_rmid or xl_prev), treat as
end of WAL.  Otherwise, read the whole record in chunks, calculating CRC.  If
CRC is invalid, treat as end of WAL.  Otherwise, ereport(FATAL) for the
oversized record.  A side benefit would be avoiding useless large allocations
(1GB backend, 4GB frontend) as discussed upthread.  May as well do the
xl_rmid and xl_prev checks in all branches, to avoid needless
XLogRecordMaxSize-1 allocations.  Thoughts?

For invalid-length records in v16+, since every such record is end-of-wal or
corruption, those versions could skip the CRC.  As an alternative, zeroing
recycled pages could remove "most of" the spurious errors without adding
silent data loss.  I considered another alternative of back-patching
XLogRecordMaxSize and proceeding with a patch like v1, but that wouldn't help
with records in existing WAL archives.  Of course, we could also choose to
leave $SUBJECT unfixed in v15-.

===== 2. DecodeXLogRecordRequiredSpace() is the wrong ceiling

v1 treats DecodeXLogRecordRequiredSpace(L) (=L+2445 here) as the max needed,
but allocate_recordbuf(state, L) allocates up to L+8192.  If the
uninitialized length happened to contain an integer between about 1<<30-8192
and 1<<30-2445, one still got e.g. "FATAL: invalid memory alloc request size
1073741824".
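To make that window concrete, here is a small standalone sketch (not
PostgreSQL source).  MAX_ALLOC_SIZE, DECODE_ADDEND, and recordbuf_size() are
illustrative stand-ins for MaxAllocSize, the 2445-byte
DecodeXLogRecordRequiredSpace() overhead quoted above, and
allocate_recordbuf()'s XLOG_BLCKSZ rounding (the 5-block floor is ignored,
since it is irrelevant at these sizes):

===
/* Standalone illustration; constants mirror the figures above. */
#include <stdio.h>
#include <stdint.h>

#define MAX_ALLOC_SIZE	((uint64_t) 0x3fffffff)	/* MaxAllocSize: 1 GB - 1 */
#define XLOG_BLCKSZ		8192
#define DECODE_ADDEND	2445	/* DecodeXLogRecordRequiredSpace(L) - L, per above */

/* allocate_recordbuf()-style rounding: up to the next XLOG_BLCKSZ boundary */
static uint64_t
recordbuf_size(uint64_t reclength)
{
	return reclength + (XLOG_BLCKSZ - reclength % XLOG_BLCKSZ);
}

int
main(void)
{
	uint64_t	total_len = (UINT64_C(1) << 30) - 5000;	/* inside the problem window */

	/* v1's ceiling test passes ... */
	printf("v1 check passes: %d\n",
		   total_len + DECODE_ADDEND <= MAX_ALLOC_SIZE);
	/* ... yet the buffer actually requested already exceeds MaxAllocSize */
	printf("recordbuf request: %llu (MaxAllocSize %llu)\n",
		   (unsigned long long) recordbuf_size(total_len),
		   (unsigned long long) MAX_ALLOC_SIZE);
	return 0;
}
===

Any length in roughly that 1<<30-8192 .. 1<<30-2445 range behaves the same
way: v1's ceiling test passes, yet the actual allocation still fails.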
Since v16+ has XLogRecordMaxSize, testing XLogRecordMaxSize avoids the
problem and should suffice:

===
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -640,7 +640,6 @@ restart:
 							  (uint32) SizeOfXLogRecord, total_len);
 		goto err;
 	}
-#ifndef FRONTEND
 
 	/*
 	 * We may be looking at a random uint32 read from a recycled segment. For
@@ -651,13 +650,12 @@ restart:
 	 * the allocation may succeed but record checks are going to fail so this
 	 * would be short-lived.
 	 */
-	if (!AllocSizeIsValid(DecodeXLogRecordRequiredSpace(total_len)))
+	if (total_len > XLogRecordMaxSize)
 	{
 		report_invalid_record(state, "record length %u at %X/%X too long",
 							  total_len, LSN_FORMAT_ARGS(RecPtr));
 		goto err;
 	}
-#endif
 
 	/*
 	 * If the whole record header is on this page, validate it immediately.
===

allocate_recordbuf()'s addend is not critical; it's just to "avoid useless
small increases".  v15- should avoid the problem like this:

===
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 969bcc3..1bac303 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -201,4 +201,8 @@ allocate_recordbuf(XLogReaderState *state, uint32 reclength)
 	newSize += XLOG_BLCKSZ - (newSize % XLOG_BLCKSZ);
 	newSize = Max(newSize, 5 * Max(BLCKSZ, XLOG_BLCKSZ));
+#ifndef FRONTEND
+	if (!AllocSizeIsValid(newSize))
+		newSize = reclength;
+#endif
 
 	if (state->readRecordBuf)
===

In v15, one can reach this case with
pg_logical_emit_message(false, '_', repeat('x', 1024 * 1024 * 1024 - 4000)).
On v16+, one might reach this with unlucky trailing garbage, but
XLogRecordMaxSize prevents reaching it through pg_logical_emit_message().

Thanks,
nm