
Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 - Mailing list pgsql-hackers

From	Zhongpu Chen
Subject	Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date	May 2 05:39:26
Msg-id	CA+1gyq+LF_91g_i0WXeKK6JGF8viaqaF213S-9Arq=SG=4GAaA@mail.gmail.com
In response to	Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 (Zhongpu Chen <chenloveit@gmail.com>)
Responses	Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
List	pgsql-hackers


The issue is not specific to E'\\x..' literals. An ordinary COPY FROM of a data file with ENCODING 'EUC_CN' can create text rows that a later SELECT cannot return when client_encoding is UTF8.

This suggests that input validation for EUC_CN is only structural, while the EUC_CN-to-UTF8 conversion table is stricter.
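
As a rough illustration of the gap, here is a minimal Python sketch that scans the raw lines of a COPY data file and reports those that pass a structural check but have no mapping. It rests on assumptions: Python's gb2312 codec stands in for the EUC_CN-to-UTF8 conversion table (the two may not agree on every code point), the structural rule is a simplification rather than PostgreSQL's actual verifier, and all function names are hypothetical.

```python
def structurally_valid(data: bytes) -> bool:
    """Simplified EUC_CN shape check (an assumption, not PostgreSQL's code):
    ASCII bytes stand alone; any other character is a pair of bytes
    that both fall in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        if i + 1 >= len(data):
            return False  # truncated two-byte sequence
        if not (0xA1 <= data[i] <= 0xFE and 0xA1 <= data[i + 1] <= 0xFE):
            return False
        i += 2
    return True


def convertible(data: bytes) -> bool:
    """True if every sequence has a mapping in Python's gb2312 codec,
    used here as a stand-in for the server's conversion table."""
    try:
        data.decode("gb2312")
    except UnicodeDecodeError:
        return False
    return True


def find_unconvertible_rows(lines):
    """Return 1-based numbers of lines that are well-formed but unmappable."""
    return [
        n
        for n, line in enumerate(lines, start=1)
        if structurally_valid(line) and not convertible(line)
    ]
```

Given the two raw lines b"1\t\xd6\xd0" and b"2\t\xa2\xa3", this sketch would flag only the second: both are structurally well-formed, but only 0xD6D0 has a mapping in the stand-in table.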

On Sat, May 2, 2026 at 10:31 AM Zhongpu Chen <chenloveit@gmail.com> wrote:

See the related bug report https://www.postgresql.org/message-id/CA%2B1gyqL7uiQhfLcYWpHNUKQgHjQc7sOPthSTiaxLDZzcrGFYSg%40mail.gmail.com

Currently PostgreSQL accepts structurally well-formed EUC_CN byte sequences such as 0xA2A3 into text columns. The value round-trips when client_encoding is EUC_CN, but fails when client_encoding is UTF8 because euc_cn_to_utf8 has no mapping.

If this behavior is intentional for compatibility, the documentation should explicitly say that validation for some legacy encodings is byte-structure validation, not mapping-table validation.
If it is not intentional, stricter validation could reject unassigned byte positions at input time.
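
To make the second option concrete, a mapping-table check at input time might look roughly like the Python sketch below. This is only an illustration under stated assumptions: Python's gb2312 codec is a stand-in for the euc_cn_to_utf8 table and may differ from it, and the function names and error wording are hypothetical, not PostgreSQL's.

```python
def first_unmapped_offset(data: bytes, codec: str = "gb2312"):
    """Return the byte offset of the first sequence the codec cannot map,
    or None if the whole input converts."""
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def strict_input_check(data: bytes) -> bytes:
    """Reject input containing a sequence with no mapping (hypothetical
    policy); otherwise return the input unchanged."""
    offset = first_unmapped_offset(data)
    if offset is not None:
        bad = data[offset:offset + 2].hex()
        raise ValueError(
            f"byte sequence 0x{bad} at offset {offset} has no mapping"
        )
    return data
```

Under this policy, 0xA2A3 would be refused when the value is stored instead of failing later when a UTF8 client reads it. The cost is that data currently accepted and readable under client_encoding EUC_CN would no longer load.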

--
Zhongpu Chen


