Almost bug in COPY FROM processing of GB18030 encoded input - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Almost bug in COPY FROM processing of GB18030 encoded input
Date	January 23, 2019 11:23:23
Msg-id	7704d099-9643-2a55-fb0e-becd64400dcb@iki.fi
Responses	Re: Almost bug in COPY FROM processing of GB18030 encoded input
List	pgsql-hackers

Hi,

I happened to notice that when CopyReadLineText() calls mblen(), it 
passes only the first byte of each multi-byte character. However, 
pg_gb18030_mblen() looks at both the first and the second byte. 
Because CopyReadLineText() always passes \0 as the second byte, 
pg_gb18030_mblen() incorrectly reports the length of 4-byte encoded 
characters as 2.
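
To make the mismatch concrete, here is a minimal standalone sketch. It is 
not the PostgreSQL source: sketch_gb18030_mblen() is a simplified stand-in 
for the rule described above (high bit in the first byte, then a second 
byte in 0x30..0x39 means a 4-byte character), and mblen_first_byte_only() 
is a hypothetical helper that imitates a caller supplying \0 as the second 
byte.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for a GB18030 length function (an assumption for
 * illustration, not copied from PostgreSQL): the length is decided from
 * the first AND the second byte.
 */
static int
sketch_gb18030_mblen(const unsigned char *s)
{
    assert(s != NULL);

    if ((s[0] & 0x80) == 0)
        return 1;               /* ASCII */
    if (s[1] >= 0x30 && s[1] <= 0x39)
        return 4;               /* second byte is a digit: 4-byte form */
    return 2;                   /* otherwise: 2-byte form */
}

/*
 * Hypothetical helper: imitate a caller that copies only the first byte
 * and leaves the second byte as \0 before asking for the length.
 */
static int
mblen_first_byte_only(const unsigned char *s)
{
    unsigned char tmp[2];

    assert(s != NULL);

    tmp[0] = s[0];
    tmp[1] = '\0';              /* the second byte is never supplied */
    return sketch_gb18030_mblen(tmp);
}
```

With the 4-byte sequence 0x81 0x30 0x81 0x30, the full view reports 4, 
while the first-byte-only view reports 2, because \0 is outside 0x30..0x39.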

It works out fine, though, because in GB18030 the second half of a 
4-byte encoded character always looks like another 2-byte encoded 
character. CopyReadLineText() is looking for delimiter and escape 
characters and newlines, and only single-byte characters are supported 
for those, so treating a 4-byte character as two 2-byte characters is 
harmless.
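
As a rough illustration of why the shortcut is harmless, the sketch below 
scans a buffer for one single-byte special character while skipping every 
high-bit byte as the start of a 2-byte unit. The function name, its 
parameters, and the scanning loop are assumptions made for this example; 
the real CopyReadLineText() is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical scanner: count occurrences of a single-byte special
 * character. Multi-byte characters are skipped using a length derived
 * from the first byte only, so a 4-byte GB18030 character is consumed
 * as two consecutive 2-byte units.
 */
static size_t
count_special_bytes(const unsigned char *buf, size_t len,
                    unsigned char special)
{
    size_t      i = 0;
    size_t      hits = 0;

    assert(buf != NULL);

    while (i < len)
    {
        if (buf[i] & 0x80)
        {
            /* First-byte-only view: skip this byte and the next one. */
            i += 2;
            continue;
        }
        if (buf[i] == special)
            hits++;
        i++;
    }
    return hits;
}
```

For the input 0x81 0x30 0x81 0x30 followed by a newline, the scanner 
finds exactly one newline. The 0x30 bytes inside the 4-byte character are 
never examined on their own, because each one is consumed as the trailing 
byte of a 2-byte unit.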

Attached is a patch to explain that in the comments. Grepping for 
mblen(), I didn't find any other callers that used mblen() like that.

- Heikki

Attachment

0001-Fix-comments-to-that-claimed-that-mblen-only-looks-a.patch
