BugReport: PostgreSQL 17.8. Processing UTF8 encoded strings - Mailing list pgsql-bugs

From Vladimir Valikaev
Subject BugReport: PostgreSQL 17.8. Processing UTF8 encoded strings
Date
Msg-id 9e005eef-a5dc-4ca3-8589-d7836c459e4d@4vrs.com
List pgsql-bugs

Greetings,

After updating PostgreSQL from version 17.7 to 17.8, we encountered an error when extracting a substring from a UTF8-encoded string:
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2


Server:
Linux i-db-sandbox1.4vrs.com 6.1.0-43-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.162-1 (2026-02-08) x86_64 GNU/Linux

$ cat /etc/apt/sources.list.d/pgdg.list 
deb http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main
deb-src http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main
deb https://apt-archive.postgresql.org/pub/repos/apt bookworm-pgdg-archive main

PostgreSQL 17.8 (Debian 17.8-1.pgdg12+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14+deb12u1) 12.2.0, 64-bit

Steps to reproduce (psql):

db_fev:vladimir@i-db-sandbox1 => create table test123(id integer, m text);
CREATE TABLE

db_fev:vladimir@i-db-sandbox1 => insert into test123 (id,m) values (1, repeat('a', 1027)||E'\xe2\x80\x8d'||repeat('a', 1027));
INSERT 0 1

db_fev:vladimir@i-db-sandbox1 => select length(SUBSTRING(m from 1 for 256)) from test123;
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2

db_fev:vladimir@i-db-sandbox1 => select length(SUBSTRING(substring(m from 1 for length(m)) from 1 for 256)) from test123;
 length 
--------
    256
(1 row)
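
For reference, the three bytes written as E'\xe2\x80\x8d' in the INSERT are the UTF-8 encoding of U+200D (ZERO WIDTH JOINER), and 0xe2 is only the lead byte of that sequence. The sketch below is plain Python, not PostgreSQL code, and the helper names build_value and decodes_cleanly are made up for illustration. It only shows that a byte string cut immediately after 0xe2 is not valid UTF-8, which is what the error text describes. It makes no claim about where or why the server would make such a cut.

```python
# Illustration only: plain Python, not PostgreSQL code.
# build_value and decodes_cleanly are hypothetical helper names.

ZWJ_BYTES = b"\xe2\x80\x8d"  # the bytes written as E'\xe2\x80\x8d' in the INSERT


def build_value() -> bytes:
    """Rebuild the test value from the INSERT statement as raw UTF-8 bytes."""
    return b"a" * 1027 + ZWJ_BYTES + b"a" * 1027


def decodes_cleanly(data: bytes) -> bool:
    """Return True if data is a complete, valid UTF-8 byte string."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


value = build_value()

# The value is 2057 bytes long but only 2055 characters long,
# because the 3-byte sequence is a single character.
print(len(value), len(value.decode("utf-8")))

# The complete 3-byte sequence decodes to U+200D.
print(ZWJ_BYTES.decode("utf-8") == "\u200d")

# A byte-wise cut that ends right after 0xe2 leaves a dangling lead byte.
print(decodes_cleanly(value[:1028]))

# A cut that keeps all three bytes is still valid UTF-8.
print(decodes_cleanly(value[:1030]))
```

The same check applies to any prefix of the stored bytes: it is valid UTF-8 only if it does not end inside the three-byte sequence.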


Database db_feb:
    Name    | Encoding  | Locale Provider | LC_COLLATE | LC_CTYPE | Locale | ICU Rules |
------------+-----------+-----------------+------------+----------+--------+-----------+
 db_feb     | UTF8      | libc            | C          | C        | [NULL] | [NULL]    |

The problem does not occur on PostgreSQL 17.7. It also does not occur on 17.8 if the whole string is first loaded into memory, for example by wrapping the column in substring(m from 1 for length(m)):

db_feb:vladimir@i-db-sandbox1 =# select length(SUBSTRING(substring(m from 1 for length(m)) from 1 for 256)) from test123;
 length 
--------
    256
(1 row)


This bug report has also been sent to bugs@postgrespro.ru.


-- 
Best Regards,
Vladimir Valikaev
Streamline - Property Management Software
