Re: [GENERAL] psql weird behaviour with charset encodings - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: [GENERAL] psql weird behaviour with charset encodings
Date
Msg-id CAB7nPqTSwtVDXEsjB0QJgQRazVtRVSzPYBi0VfdyvyhnF8D+jQ@mail.gmail.com
Whole thread Raw
In response to Re: [GENERAL] psql weird behaviour with charset encodings  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Thu, Jun 18, 2015 at 9:47 AM, Noah Misch <noah@leadboat.com> wrote:
> On Wed, Jun 03, 2015 at 05:25:45PM +0900, Michael Paquier wrote:
>> On Tue, Jun 2, 2015 at 4:19 PM, Michael Paquier  <michael.paquier@gmail.com> wrote:
>> > On Sun, May 24, 2015 at 2:43 AM, Noah Misch <noah@leadboat.com> wrote:
>> > > It would be good to purge the code of precisions on "s" conversion specifiers,
>> > > then Assert(!pointflag) in fmtstr() to catch new introductions.  I won't plan
>> > > to do it myself, but it would be a nice little defensive change.
>> >
>> > This sounds like a good protection idea, but as it impacts existing
>> > backend code relying in sprintf's port version we should only do the
>> > assertion in HEAD in my opinion, and mention it in the release notes of the
>> > next major version at the time a patch in this area is applied. I guess
>
> Adding the assertion would be master-only.  We don't necessarily release-note
> C API changes.

Cool. So we are on the same page.

>> > that we had better backpatch the places using .*s though on back-branches.
>
> I would tend to back-patch only the ones that cause interesting bugs.  For
> example, we should not reach the read.c elog() calls anyway, so it's not a big
> deal if the GNU libc bug makes them a bit less helpful in back branches.
> (Thanks for the list of code sites; it was more complete than anything I had.)
> So far, only tar.c looks harmed enough to back-patch.
>
>> Attached is a patch purging a bunch of places from using %.*s, this will
>> make the code more solid when facing non-ASCII strings. I let pg_upgrade
>> and pg_basebackup code paths alone as it reduces the code lisibility by
>> moving out of this separator. We may want to fix them though if file path
>> names have non-ASCII characters, but it seems less critical.
>
> To add the assertion, we must of course fix all uses.

Sure.

> Having seen the patch I requested, I don't like the result as much as I
> expected to like it.  The patched code is definitely harder to read and write:
>
>> @@ -1534,7 +1541,10 @@ parseNodeString(void)
>>               return_value = _readDeclareCursorStmt();
>>       else
>>       {
>> -             elog(ERROR, "badly formatted node string \"%.32s\"...", token);
>> +             char buf[33];
>> +             memcpy(buf, token, 32);
>> +             buf[33] = '\0';
>> +             elog(ERROR, "badly formatted node string \"%s\"...", buf);
>>               return_value = NULL;    /* keep compiler quiet */
>>       }

We could spread what the first patch did in readfuncs.c by having some
more macros doing the duplicated work. Not that it would improve the
code readability of those macros..

> (Apropos, that terminator belongs in buf[32], not buf[33].)

Indeed.

> Perhaps we're better off setting aside the whole idea,

At least on OSX (10.8), I am seeing that no more than the number of
bytes defined by the precision is written. So it looks that we are
safe there. So yes thinking long-term this looks the better
alternative. And I am wondering about the potential breakage that this
could actually have with Postgres third-part tools using src/port's
snprintf.

> or forcing use of snprintf.c on configurations having the bug?

I am less sure about that. It doesn't seem worth it knowing that the
tendance is to evaluate the precision in terms in bytes and not
characters.
-- 
Michael



pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: [GENERAL] psql weird behaviour with charset encodings
Next
From: Peter Eisentraut
Date:
Subject: Re: 9.5 make world failing due to sgml tools missing