Re: Dead code with short varlenas in toast_save_datum() - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Dead code with short varlenas in toast_save_datum()
Date
Msg-id aJ6qWmIa6_LhZUMy@paquier.xyz
Whole thread Raw
In response to Re: Dead code with short varlenas in toast_save_datum()  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Sun, Aug 10, 2025 at 07:35:13PM +0900, Michael Paquier wrote:
> Thanks for compiling a patch to close more the coverage gap.  I'll
> look at what you have here.

First, thanks for working on that.  That's going to help a lot to make
sure that messing with this area of the code will not break some of
the current assumptions.

So, we have two things here for the rewrite code paths in
toast_save_datum():
1) A SQL test for va_valueid == InvalidOid where we need a new value
that does not conflict with the new and old TOAST tables.
2) An isolation test for the much trickier case where we are going to
reuse a chunk_id.

First, in the SQL test.  The trick where you are using a PLAIN storage
to not allocate a chunk_id on initial storage with a value large
enough to force TOAST on rewrite, while the value is small enough to
fit on a single page, is a nice one.  We could have used a \gset as
well with a toasted value, but that won't change the fact that we
check for a new value allocated.  The location in strings.sql feels
incorrect because this is a rewrite issue, so I have moved the test to
vacuum.sql and applied a slightly-tweaked result.  A second thing I
have added is a test to make sure that the same chunk_id is reused
after the rewrite.  That's also worth tracking and cheap, covering the
non-InvalidOid case.

With the isolation test, the case is different, and it looks like the
test is incomplete: we want to make sure that the new chunk IDs are
the same before and after, but we cannot use \gset in this context.
What I would suggest is to create an intermediate table storing the
contents we want to compare after the CLUSTER, with a CTAS that stores
the primary key of cluster_toast_value_reuse.id and the chunk_id
associated to each row.  Then, after the CLUSTER, we join the pkey
values in the CTAS table and cluster_toast_value_reuse, compare their
chunk IDs and they should match.  The test triggers 29 times the
todo=0 code path, as far as I can see, but we should not need that
many tuples with generate_series(), no?  If the test is written so as
we compare the old and new chunk IDs with the pkey values, the number
of tuples does not matter much, but that would be a bit cheaper to
run.  Could you update the isolation test to do something among these
lines?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
Next
From: Michael Paquier
Date:
Subject: Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options