Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 - Mailing list pgsql-hackers

From Zhongpu Chen
Subject Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date
Msg-id CA+1gyqJJJDhq=cc_D0ad59WH_OD2G_mN54xTru0KYoNaLkF48Q@mail.gmail.com
Whole thread
Responses Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
List pgsql-hackers

Currently PostgreSQL accepts structurally well-formed EUC_CN byte sequences such as 0xA2A3 into text columns. The value round-trips when client_encoding is EUC_CN, but fails when client_encoding is UTF8 because euc_cn_to_utf8 has no mapping.

If this behavior is intentional for compatibility, the documentation should explicitly say that validation for some legacy encodings is byte-structure validation, not mapping-table validation.
If it is not intentional, stricter validation could reject unassigned byte positions at input time.

--
Zhongpu Chen

pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: Refactor code around GUC default_toast_compression
Next
From: Zhongpu Chen
Date:
Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8