Re: [GENERAL] psql weird behaviour with charset encodings - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [GENERAL] psql weird behaviour with charset encodings
Date
Msg-id CAB7nPqS6GdCefz_Yy3EuWiu2-v0vUtVxE6Yix-2OV1fvBwzssA@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] psql weird behaviour with charset encodings  (Noah Misch <noah@leadboat.com>)
Responses Re: [GENERAL] psql weird behaviour with charset encodings
List pgsql-hackers
<div dir="ltr"><br /><br />On Sun, May 24, 2015 at 2:43 AM, Noah Misch <<a
href="mailto:noah@leadboat.com">noah@leadboat.com</a>>wrote:<br />> On Sat, May 08, 2010 at 09:24:45PM -0400, Tom
Lanewrote:<br />>> <a href="mailto:hgonzalez@gmail.com">hgonzalez@gmail.com</a> writes:<br />>> > <a
href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=649">http://sources.redhat.com/bugzilla/show_bug.cgi?id=649</a><br
/>>><br/>>> > The last explains why they do not consider it a bug:<br />>><br />>> > ISO
C99requires for %.*s to only write complete characters that fit below<br />>> > the<br />>> >
precisionnumber of bytes. If you are using say UTF-8 locale, but ISO-8859-1<br />>> > characters as shown in
theinput file you provided, some of the strings are<br />>> > not valid UTF-8 strings, therefore sprintf fails
with-1 because of the<br />>> > encoding error. That's not a bug in glibc.<br />>><br />>> Yeah,
thatwas about the position I thought they'd take.<br />><br />> GNU libc eventually revisited that conclusion and
fixedthe bug through commit<br />> 715a900c9085907fa749589bf738b192b1a2bda5.  RHEL 7.1 is fixed, but RHEL 6.6 and<br
/>>RHEL 5.11 are still affected; the bug will be relevant for another 8+ years.<br />><br />>> So the
bottomline here is that we're best off to avoid %.*s because<br />>> it may fail if the string contains data that
isn'tvalidly encoded<br />>> according to libc's idea of the prevailing encoding.<br />><br />> Yep. 
Immediateprecisions like %.10s trigger the bug as effectively as %.*s,<br />> so tarCreateHeader()
[_tarWriteHeader()in 9.2 and earlier] is also affected.<br />> Switching to strlcpy(), as attached, fixes the bug
whilesimplifying the code.<br />> The bug symptom is error 'pg_basebackup: unrecognized link indicator "0"' when<br
/>>the name of a file in the data directory is not a valid multibyte string.<br />><br />><br />> Commit
6dd9584introduced a new use of .*s, to pg_upgrade.  It works reliably<br />> for now, because it always runs in the
Clocale.  pg_upgrade never calls<br />> set_pglocale_pgservice() or otherwise sets its permanent locale.  It would
be<br/>> natural for us to fix that someday, at which point non-ASCII database names<br />> would perturb this
statusoutput.<br /><br />I caught up the following places that need attention on top of the 4 ones in tar.c:<br
/>src/backend/nodes/read.c:                                                      elog(ERROR, "unrecognized integer:
\"%.*s\"",<br/>src/backend/nodes/read.c:                                                       elog(ERROR,
"unrecognizedOID: \"%.*s\"",<br />src/backend/nodes/read.c:                               elog(ERROR, "unrecognized
token:\"%.*s\"", tok_len, token);<br />src/backend/nodes/readfuncs.c:          elog(ERROR, "unrecognized token:
\"%.*s\"",length, token);<br />src/backend/nodes/readfuncs.c:          elog(ERROR, "unrecognized token: \"%.*s\"",
length,token);<br />src/backend/nodes/readfuncs.c:                  elog(ERROR, "unrecognized integer: \"%.*s\"",
length,token);<br />src/backend/nodes/readfuncs.c:          elog(ERROR, "unrecognized boolop \"%.*s\"", length,
token);<br/>src/backend/nodes/readfuncs.c:          elog(ERROR, "badly formatted node string \"%.32s\"...", token);<br
/>src/backend/tsearch/wparser_def.c:      * Use of %.*s here is a bit risky since it can misbehave if the data is<br
/>src/backend/tsearch/wparser_def.c:     fprintf(stderr, "parsing \"%.*s\"\n", len, str);<br
/>src/backend/tsearch/wparser_def.c:     /* See note above about %.*s */<br />src/backend/tsearch/wparser_def.c:    
 fprintf(stderr,"parsing copy of \"%.*s\"\n", prs->lenstr, prs->str);<br />src/backend/utils/adt/datetime.c:    
                  * Note: the uses of %.*s in this function would be risky if the<br
/>src/backend/utils/adt/datetime.c:                                      sprintf(str + strlen(str), " %.*s", MAXTZLEN,
tzn);<br/>src/backend/utils/adt/datetime.c:                                       sprintf(str + strlen(str), " %.*s",
MAXTZLEN,tzn);<br />src/backend/utils/adt/datetime.c:                                       sprintf(str + strlen(str),
"%.*s", MAXTZLEN, tzn);<br />src/backend/utils/adt/datetime.c:                       /* %.*s is safe since all our
tokensare ASCII */<br />src/backend/utils/adt/datetime.c:                       elog(LOG, "token too long in %s table:
\"%.*s\"",<br/>src/interfaces/ecpg/ecpglib/error.c:    /* %.*s is safe here as long as sqlstate is all-ASCII */<br
/>src/interfaces/ecpg/ecpglib/error.c:   ecpg_log("raising sqlstate %.*s (sqlcode %ld): %s\n",<br
/>src/interfaces/ecpg/pgtypeslib/dt_common.c:                     * Note: the uses of %.*s in this function would be
riskyif the<br />src/interfaces/ecpg/pgtypeslib/dt_common.c:                                     sprintf(str +
strlen(str)," %.*s", MAXTZLEN, tzn);<br />src/interfaces/ecpg/pgtypeslib/dt_common.c:                                  
 sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn);<br />src/interfaces/ecpg/pgtypeslib/dt_common.c:                  
                 sprintf(str + strlen(str), " %.*s", MAXTZLEN, tzn);<br />src/bin/pg_basebackup/pg_basebackup.c:      
                          ngettext("%*s/%s kB (%d%%), %d/%d tablespace (%s%-*.*s)",<br
/>src/bin/pg_basebackup/pg_basebackup.c:                                                  "%*s/%s kB (%d%%), %d/%d
tablespaces(%s%-*.*s)",<br />src/bin/pg_upgrade/util.c:                              printf("  %s%-*.*s\r",<br /><br
/>>It would be good to purge the code of precisions on "s" conversion specifiers,<br />> then Assert(!pointflag)
infmtstr() to catch new introductions.  I won't plan<br />> to do it myself, but it would be a nice little defensive
change.<br/><br />This sounds like a good protection idea, but as it impacts existing backend code relying in sprintf's
portversion we should only do the assertion in HEAD in my opinion, and mention it in the release notes of the next
majorversion at the time a patch in this area is applied. I guess that we had better backpatch the places using .*s
thoughon back-branches.<br />-- <br />Michael</div> 

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: auto_explain sample rate
Next
From: Amit Langote
Date:
Subject: Re: checkpointer continuous flushing