Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16 - Mailing list pgsql-bugs

From Noah Misch
Subject Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
Msg-id 20260213224804.2c@rfd.leadboat.com
In response to Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16  (Noah Misch <noah@leadboat.com>)
Responses Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
List pgsql-bugs
On Fri, Feb 13, 2026 at 09:27:02AM -0800, Noah Misch wrote:
> On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> > After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> > >ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > on valid UTF-8 text stored in a TOAST-compressed column.
> 
> > user=> select substring(data from 1 for 1) from toast_repro;
> > ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> 
> Thanks for the report.  That is a bug and a regression; I regret missing it
> during review.  The substring operation works by taking a 4-byte slice from
> the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
> then finding the actual first character within those bytes.  However, it
> incorrectly requires those four bytes to be a valid UTF8 string.  I'll start
> on a fix.

Attached.  I may add some more tests, e.g. a toasted invalid string where the
detoasted length is less than the slice we request.  This version is viable,
however.

I audited the other pg_mbstrlen_with_len() callers, and I think they're all
okay with an error if the input has an incomplete char.  Hence, those don't
need changes beyond what we've already released.  Most pass either parser input or an
existing datum with its len.  text_position_get_match_pos() is the most subtle
caller, and I think it's fine.

I audited other uses of slice detoast.  The only other one is bytea substring,
which is obviously indifferent to character encoding.

