On Sat, Feb 14, 2026 at 9:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, Feb 14, 2026 at 6:38 PM Noah Misch <noah@leadboat.com> wrote:
> > [mblen-valgrind-after-report-v1.patch]
>
> LGTM. The new valgrind check should clearly be after the new non-local exit.
>
> Studying the other patch...
/*
- * Total slice size in bytes can't be any
longer than the start
- * position plus substring length times the
encoding max length.
- * If that overflows, we can just use -1.
+ * Total slice size in bytes can't be any
longer than the
+ * inclusive end position times the encoding
max length. If that
+ * overflows, we can just use -1.
*/
- if (pg_mul_s32_overflow(E, eml, &slice_size))
+ if (pg_mul_s32_overflow(E - 1, eml, &slice_size))
slice_size = -1;
Isn't it still conceptually "exclusive", but adjusted to be zero-indexed?
/* Now we can get the actual length of the slice in MB
characters */
- slice_strlen = pg_mbstrlen_with_len(VARDATA_ANY(slice),
-
slice_len);
+ slice_strlen =
+ (slice_size == -1 ?
+ pg_mbstrlen_with_len(VARDATA_ANY(slice), slice_len) :
+ pg_mbcharcliplen_chars(VARDATA_ANY(slice),
slice_len, E - 1));
Comment presumably needs adjustment to say that we only count as far
as we need to, and why.
There is something a bit strange about all this, though.
pg_mbstrlen_with_len(..., -1) returns 0, so if you ask for characters
that really exist past 2^29 (~500 million), you must get an empty
string, right? That's hard to reach, pre-existing and out of scope
for the immediate problem report, except ... now we're contorting the
code even further to keep it.
The outline I had come up with before seeing your patch was: let's
just delete it. The position search can check bounds incrementally,
following our general approach. This avoids the reported problem by
ditching the pre-flight scan through the slice (up to 4x more
pg_mblen_XXX calls and memory access than we strictly need), and also
the special cases for empty strings since they already fall out of the
general behaviour (am I missing something?), not leaving much code
behind. As far as I can see so far, the only user-visible side-effect
requires corruption: substring() moves from the
internal-NUL-is-terminator category to internal-NUL-is-character
category, but that's an implementation detail.
When I saw your patch yesterday, I initially abandoned the thought,
thinking that your idea looked more conservative, but after sleeping
on it and reflecting again on these oddities, I have merged my draft
implementation with your tests, ancient detoasting fence post
observation and commit message, just to see if you think this approach
might be worth considering further.