Re: Panic during xlog building with big values - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Panic during xlog building with big values
Date
Msg-id aPiIxbkkuBlVDIKc@paquier.xyz
In response to Re: Panic during xlog building with big values  ("Maksim.Melnikov" <m.melnikov@postgrespro.ru>)
List pgsql-hackers
On Tue, Oct 14, 2025 at 10:08:12AM +0300, Maksim.Melnikov wrote:
> I've checked RecordTransactionCommit too, but I don't think it can fire
> a similar error.  The problem described above occurred because we used
> external column storage without compression and with REPLICA IDENTITY
> FULL.  To be honest, it's a degenerate case that can occur only on a
> tuple update/delete, when we need the full row to identify the
> updated/deleted value; more info can be found in the docs [1].

"Degenerate" sounds like a pretty good term to define your test case.
So the issue is that the uncompressed TOAST blobs get so large that
the mainrdata_len computed with a single call of XLogRegisterData()
triggers the size restriction.  The protections added in XLogInsert()
are doing their job here: the record generated by the UPDATE cannot be
replayed, failing on an allocation failure in the standby if one lifts
the size restriction in XLogInsert().  What's pretty "good" about your
case is that the first INSERT is large, but small enough so as a
palloc() would fail on the initial insertion, making it succeed.  Only
the second UPDATE would become large enough, still you are able to
bypass the allocation limits with a combination of the old and new
tuple data that need to be replicated because of the full replica
identity.  Fun case, I'd say.
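
For reference, the guard that trips here sits at record assembly time.
Roughly (condensed from xloginsert.c, quoting from memory, so not the
exact upstream wording):

    /* In XLogRecordAssemble(), once the total record length is known */
    if (total_len > XLogRecordMaxSize)
        ereport(ERROR,
                (errmsg_internal("oversized WAL record"),
                 errdetail_internal("WAL record would be %llu bytes (of maximum %llu bytes); rmid %u flags %u.",
                                    (unsigned long long) total_len,
                                    (unsigned long long) XLogRecordMaxSize,
                                    rmid, info)));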

> I've fixed the comments per your remarks, thanks.  Patch is attached.

I see what you are doing in your patch.  ExtractReplicaIdentity() has
only two callers: heap_update() and heap_delete().  Both document that
this stuff happens before entering a critical section to avoid a PANIC
on allocation, but that does not account for the overhead required by a
WAL record, because we don't know yet how large the record will be
(well, most of it is going to be the old tuple key anyway), as we may
have registered pages, some of them compressed or with holes.  Then
your patch adds an extra check depending on the size of the "old" key
generated.

+static void
+log_heap_precheck(Relation reln, HeapTuple tp)
+{
+#define XLogRecordMaxOverhead ((uint32) (1024 * 1024))
+
+    if (tp && RelationIsLogicallyLogged(reln))
+    {
+        uint32        data_len = tp->t_len - SizeofHeapTupleHeader;
+
+        XLogPreCheckSize(data_len + XLogRecordMaxOverhead);
+    }
+}

This adds a size prediction of XLogRecordMaxOverhead on top of the
existing XLogRecordMaxSize, which is itself an estimation that allows
for 4MB of allocation overhead.  So you are stacking a second
estimation layer on top of the existing one, which is based on how much
the XLogReader needs when processing a record.  This is not optimal,
and we cannot have a precise number until we have computed all the
elements that build a WAL record.
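
To put numbers on these two layers (constants quoted from memory, from
xlog_internal.h and memutils.h):

    #define XLogRecordMaxSize  (1020 * 1024 * 1024)  /* hard cap at record assembly */
    #define MaxAllocSize       ((Size) 0x3fffffff)   /* largest palloc() at replay, ~1GB */

    /* The ~4MB of slack between the two covers the reader's per-record
     * overhead; the patch stacks an extra 1MB guess, XLogRecordMaxOverhead,
     * on top of that, on the heap side. */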

Some numbers I've grabbed on the way, while looking at your case, for
reference:
- size of allocation at replay: 1073750016
- number of repeat values in the UPDATE: 1073741717
- size registered in XLogRegisterData(): 1073741746
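
With MaxAllocSize at 0x3fffffff (1073741823 bytes), the replay
allocation misses the limit by only a few kilobytes:

    1073750016 - 1073741823 = 8193 bytes over the ~1GB cap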

A different way to think about the problem would be to rework the way
we flatten the tuple when an old tuple is extracted in full.  For
example, if some attributes are external but not compressed, we could
also take the route of forcing some compression in the extracted key to
make it shorter and able to fit in a record all the time.  External
but uncompressed data is not a very common case, so this may not
justify the extra implementation cost and complications in the tuple
flattening routines.
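
If we were to go that way, a rough, untested sketch of the flattening
side could look like this, reusing the existing detoast_attr() and
toast_compress_datum() helpers (the loop context and variable names
are hypothetical):

    struct varlena *attr = (struct varlena *) DatumGetPointer(values[i]);

    if (VARATT_IS_EXTERNAL_ONDISK(attr))
    {
        /* Fetch the plain value, then try to compress it before it
         * goes into the flattened key.  toast_compress_datum() returns
         * a null Datum when compression does not help. */
        struct varlena *plain = detoast_attr(attr);
        Datum           compressed = toast_compress_datum(PointerGetDatum(plain),
                                                          TOAST_PGLZ_COMPRESSION);

        values[i] = (compressed != (Datum) 0) ? compressed
                                              : PointerGetDatum(plain);
    }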

Perhaps the best answer is just to do nothing here.
--
Michael

