Home > mailing lists

Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 - Mailing list pgsql-hackers

From	Zhongpu Chen
Subject	Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
Date	May 2 05:31:12
Msg-id	CA+1gyqJJJDhq=cc_D0ad59WH_OD2G_mN54xTru0KYoNaLkF48Q@mail.gmail.com Whole thread
Responses	Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8
List	pgsql-hackers

Tree view

Currently PostgreSQL accepts structurally well-formed EUC_CN byte sequences such as 0xA2A3 into text columns. The value round-trips when client_encoding is EUC_CN, but fails when client_encoding is UTF8 because euc_cn_to_utf8 has no mapping.

If this behavior is intentional for compatibility, the documentation should explicitly say that validation for some legacy encodings is byte-structure validation, not mapping-table validation.
If it is not intentional, stricter validation could reject unassigned byte positions at input time.

Zhongpu Chen

pgsql-hackers by date:

From: Chao Li
Date: 02 May, 04:55:30
Subject: Re: Refactor code around GUC default_toast_compression

From: Zhongpu Chen
Date: 02 May, 05:39:26
Subject: Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8

Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 - Mailing list pgsql-hackers

Previous

Next