Re: [GENERAL] psql weird behaviour with charset encodings - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: [GENERAL] psql weird behaviour with charset encodings |
Date | |
Msg-id | CAB7nPqS6GdCefz_Yy3EuWiu2-v0vUtVxE6Yix-2OV1fvBwzssA@mail.gmail.com Whole thread Raw |
In response to | Re: [GENERAL] psql weird behaviour with charset encodings (Noah Misch <noah@leadboat.com>) |
Responses |
Re: [GENERAL] psql weird behaviour with charset encodings
|
List | pgsql-hackers |
<div dir="ltr"><br /><br />On Sun, May 24, 2015 at 2:43 AM, Noah Misch <<a href="mailto:noah@leadboat.com">noah@leadboat.com</a>>wrote:<br />> On Sat, May 08, 2010 at 09:24:45PM -0400, Tom Lanewrote:<br />>> <a href="mailto:hgonzalez@gmail.com">hgonzalez@gmail.com</a> writes:<br />>> > <a href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=649">http://sources.redhat.com/bugzilla/show_bug.cgi?id=649</a><br />>><br/>>> > The last explains why they do not consider it a bug:<br />>><br />>> > ISO C99requires for %.*s to only write complete characters that fit below<br />>> > the<br />>> > precisionnumber of bytes. If you are using say UTF-8 locale, but ISO-8859-1<br />>> > characters as shown in theinput file you provided, some of the strings are<br />>> > not valid UTF-8 strings, therefore sprintf fails with-1 because of the<br />>> > encoding error. That's not a bug in glibc.<br />>><br />>> Yeah, thatwas about the position I thought they'd take.<br />><br />> GNU libc eventually revisited that conclusion and fixedthe bug through commit<br />> 715a900c9085907fa749589bf738b192b1a2bda5. RHEL 7.1 is fixed, but RHEL 6.6 and<br />>RHEL 5.11 are still affected; the bug will be relevant for another 8+ years.<br />><br />>> So the bottomline here is that we're best off to avoid %.*s because<br />>> it may fail if the string contains data that isn'tvalidly encoded<br />>> according to libc's idea of the prevailing encoding.<br />><br />> Yep. Immediateprecisions like %.10s trigger the bug as effectively as %.*s,<br />> so tarCreateHeader() [_tarWriteHeader()in 9.2 and earlier] is also affected.<br />> Switching to strlcpy(), as attached, fixes the bug whilesimplifying the code.<br />> The bug symptom is error 'pg_basebackup: unrecognized link indicator "0"' when<br />>the name of a file in the data directory is not a valid multibyte string.<br />><br />><br />> Commit 6dd9584introduced a new use of .*s, to pg_upgrade. It works reliably<br />> for now, because it always runs in the Clocale. pg_upgrade never calls<br />> set_pglocale_pgservice() or otherwise sets its permanent locale. It would be<br/>> natural for us to fix that someday, at which point non-ASCII database names<br />> would perturb this statusoutput.<br /><br />I caught up the following places that need attention on top of the 4 ones in tar.c:<br />src/backend/nodes/read.c: elog(ERROR, "unrecognized integer: \"%.*s\"",<br/>src/backend/nodes/read.c: elog(ERROR, "unrecognizedOID: \"%.*s\"",<br />src/backend/nodes/read.c: elog(ERROR, "unrecognized token:\"%.*s\"", tok_len, token);<br />src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized token: \"%.*s\"",length, token);<br />src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized token: \"%.*s\"", length,token);<br />src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized integer: \"%.*s\"", length,token);<br />src/backend/nodes/readfuncs.c: elog(ERROR, "unrecognized boolop \"%.*s\"", length, token);<br/>src/backend/nodes/readfuncs.c: elog(ERROR, "badly formatted node string \"%.32s\"...", token);<br />src/backend/tsearch/wparser_def.c: * Use of %.*s here is a bit risky since it can misbehave if the data is<br />src/backend/tsearch/wparser_def.c: fprintf(stderr, "parsing \"%.*s\"\n", len, str);<br />src/backend/tsearch/wparser_def.c: /* See note above about %.*s */<br />src/backend/tsearch/wparser_def.c: fprintf(stderr,"parsing copy of \"%.*s\"\n", prs->lenstr, prs->str);<br />src/backend/utils/adt/datetime.c: * Note: the uses of %.*s in this function would be risky if the<br />src/backend/utils/adt/datetime.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn);<br/>src/backend/utils/adt/datetime.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN,tzn);<br />src/backend/utils/adt/datetime.c: sprintf(str + strlen(str), "%.*s", MAXTZLEN, tzn);<br />src/backend/utils/adt/datetime.c: /* %.*s is safe since all our tokensare ASCII */<br />src/backend/utils/adt/datetime.c: elog(LOG, "token too long in %s table: \"%.*s\"",<br/>src/interfaces/ecpg/ecpglib/error.c: /* %.*s is safe here as long as sqlstate is all-ASCII */<br />src/interfaces/ecpg/ecpglib/error.c: ecpg_log("raising sqlstate %.*s (sqlcode %ld): %s\n",<br />src/interfaces/ecpg/pgtypeslib/dt_common.c: * Note: the uses of %.*s in this function would be riskyif the<br />src/interfaces/ecpg/pgtypeslib/dt_common.c: sprintf(str + strlen(str)," %.*s", MAXTZLEN, tzn);<br />src/interfaces/ecpg/pgtypeslib/dt_common.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn);<br />src/interfaces/ecpg/pgtypeslib/dt_common.c: sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn);<br />src/bin/pg_basebackup/pg_basebackup.c: ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",<br />src/bin/pg_basebackup/pg_basebackup.c: "%*s/%s kB (%d%%), %d/%d tablespaces(%s%-*.*s)",<br />src/bin/pg_upgrade/util.c: printf(" %s%-*.*s\r",<br /><br />>It would be good to purge the code of precisions on "s" conversion specifiers,<br />> then Assert(!pointflag) infmtstr() to catch new introductions. I won't plan<br />> to do it myself, but it would be a nice little defensive change.<br/><br />This sounds like a good protection idea, but as it impacts existing backend code relying in sprintf's portversion we should only do the assertion in HEAD in my opinion, and mention it in the release notes of the next majorversion at the time a patch in this area is applied. I guess that we had better backpatch the places using .*s thoughon back-branches.<br />-- <br />Michael</div>
pgsql-hackers by date: