Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16 - Mailing list pgsql-bugs

From	Noah Misch
Subject	Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
Date	February 14 01:48:04
Msg-id	20260213224804.2c@rfd.leadboat.com
In response to	Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16 (Noah Misch <noah@leadboat.com>)
Responses	Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
List	pgsql-bugs

On Fri, Feb 13, 2026 at 09:27:02AM -0800, Noah Misch wrote:
> On Fri, Feb 13, 2026 at 07:46:22AM +0000, PG Bug reporting form wrote:
> > After upgrading from PostgreSQL 15.15 to 15.16, substring(text) raises:
> > ERROR: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> > on valid UTF-8 text stored in a TOAST-compressed column.
> 
> > user=> select substring(data from 1 for 1) from toast_repro;
> > ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6 0x97
> 
> Thanks for the report.  That is a bug and a regression; I regret missing it
> during review.  The substring operation works by taking a 4-byte slice from
> the toasted value (4 bytes being the max length of a UTF8 char in PostgreSQL),
> then finding the actual first character within those bytes.  However, it
> incorrectly requires those four bytes to be a valid UTF8 string.  I'll start
> on a fix.

Attached.  I may add some more tests, e.g. a toasted invalid string where the
detoasted length is less than the slice we request.  This version is viable,
however.

I audited the other pg_mbstrlen_with_len() callers, and I think they're all okay
with an error if the input has an incomplete char.  Hence, those don't need
changes beyond what we've already released.  Most pass either parser input or an
existing datum with its len.  text_position_get_match_pos() is the most subtle
caller, and I think it's fine.

I audited other uses of slice detoast.  The only other one is bytea substring,
which is obviously indifferent to character encoding.
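
For readers following along, here is a rough standalone sketch of the
distinction described in the quoted text above: finding the first character
in a fixed-size slice versus requiring the whole slice to be complete
characters.  This is not the attached patch and not PostgreSQL source.  The
helper names (utf8_lead_len, first_char_len, slice_is_complete) are
hypothetical, and the sample bytes are illustrative, not the reporter's data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte length implied by a UTF-8 lead byte, or 0 if the byte cannot start
 * a character.  Hypothetical helper for illustration only.
 */
static int
utf8_lead_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/*
 * Length in bytes of the first character in the slice.  Returns 0 for an
 * empty slice and -1 if the first character is malformed or truncated.
 * Bytes after the first character are deliberately not examined.
 */
static int
first_char_len(const unsigned char *slice, size_t slice_len)
{
    int need;

    assert(slice != NULL);
    if (slice_len == 0)
        return 0;
    need = utf8_lead_len(slice[0]);
    if (need == 0 || (size_t) need > slice_len)
        return -1;
    for (int i = 1; i < need; i++)
    {
        /* continuation bytes must look like 10xxxxxx */
        if ((slice[i] & 0xC0) != 0x80)
            return -1;
    }
    return need;
}

/*
 * Stricter check: returns 1 only if every byte of the slice belongs to a
 * complete character.  A slice cut at a fixed byte count can fail this
 * even though the underlying string is valid.
 */
static int
slice_is_complete(const unsigned char *slice, size_t slice_len)
{
    size_t off = 0;

    while (off < slice_len)
    {
        int n = first_char_len(slice + off, slice_len - off);

        if (n <= 0)
            return 0;
        off += (size_t) n;
    }
    return 1;
}
```

With the illustrative 4-byte slice {'a', 'b', 0xE6, 0x97}, first_char_len()
reports a 1-byte first character, while slice_is_complete() rejects the slice
because the trailing 0xE6 0x97 is the start of a 3-byte character cut off by
the slice boundary.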

Attachment

toast-slice-mblen-v1.patch
