Thread: gai_strerror() is not thread-safe on Windows

gai_strerror() is not thread-safe on Windows

From
Thomas Munro
Date:
Hi,

Commit 5579388d removed a bunch of dead code, formerly needed for old
systems that lacked getaddrinfo() in the early days of IPv6.  We
already used the system getaddrinfo() via either configure-time tests
(Unix) or runtime tests (Windows using attempt-to-find-with-dlsym that
always succeeded on modern systems), so no modern system needed the
fallback code, except for one small detail:

getaddrinfo() has a companion function to spit out human readable
error messages, and although Windows has that too, it's not thread
safe[1].  libpq shouldn't call it, or else an unlucky multi-threaded
program might see an error message messed up by another thread.

Here's a patch to put that bit back.  It's simpler than before: the
original replacement had a bunch of #ifdefs for various historical
reasons, but now we can just handle the 8 documented EAI errors on
Windows.

Noticed while wondering why the list of symbols reported in bug #18219
didn't include gai_strerrorA.  That turned out to be because it is
static inline in ws2tcpip.h, and its definition set alarm bells
ringing.  Avoid.

[1] https://learn.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfo

Attachment

Re: gai_strerror() is not thread-safe on Windows

From
Thomas Munro
Date:
On second thoughts, I guess it would make more sense to use the exact
messages Windows' own implementation would return instead of whatever
we had in the past (probably cribbed from some other OS or just made
up?).  I asked CI to spit those out[1].  Updated patch attached.  Will
add to CF.

[1] https://cirrus-ci.com/task/5816802207334400?logs=main#L15

Attachment

Re: gai_strerror() is not thread-safe on Windows

From
Kyotaro Horiguchi
Date:
At Tue, 5 Dec 2023 08:26:54 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in 
> On second thoughts, I guess it would make more sense to use the exact
> messages Windows' own implementation would return instead of whatever
> we had in the past (probably cribbed from some other OS or just made
> up?).  I asked CI to spit those out[1].  Updated patch attached.  Will
> add to CF.
> 
> [1] https://cirrus-ci.com/task/5816802207334400?logs=main#L15

Windows' gai_strerror outputs messages that correspond to the language
environment. Similarly, I think that the messages that the messages
returned by our version should be translatable.

These messages may add extra line-end periods to the parent (or
cotaining) messages when appended. This looks as follows.

(auth.c:517 : errdetail_log() : sub (detail) message)
> Could not translate client host name "hoge" to IP address: An address incompatible with the requested protocol was
used..

(hba.c:1562 : errmsg() : main message)
> invalid IP address "192.0.2.1": This is usually a temporary error during hostname resolution and means that the local
serverdid not receive a response from an authoritative server.
 

When I first saw the first version, I thought it would be better to
use Windows' own messages, just like you did. However, considering the
content of the message above, wouldn't it be better to adhere to
Linux-style messages overall?

A slightly subtler point is that the second example seems to have a
misalignment between the descriptions before and after the colon, but
do you think it's not something to be concerned about to this extent?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: gai_strerror() is not thread-safe on Windows

From
Thomas Munro
Date:
On Tue, Dec 5, 2023 at 3:43 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> At Tue, 5 Dec 2023 08:26:54 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in
> > On second thoughts, I guess it would make more sense to use the exact
> > messages Windows' own implementation would return instead of whatever
> > we had in the past (probably cribbed from some other OS or just made
> > up?).  I asked CI to spit those out[1].  Updated patch attached.  Will
> > add to CF.
> >
> > [1] https://cirrus-ci.com/task/5816802207334400?logs=main#L15
>
> Windows' gai_strerror outputs messages that correspond to the language
> environment. Similarly, I think that the messages that the messages
> returned by our version should be translatable.

Hmm, that is a good point.  Wow, POSIX has given us a terrible
interface here, in terms of resource management.  Let's see what glibc
does:

https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror.c
https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror-strs.h

It doesn't look like it knows about locales at all.  And a test
program seems to confirm:

#include <locale.h>
#include <netdb.h>
#include <stdio.h>
int main()
{
    setlocale(LC_MESSAGES, "ja_JP.UTF-8");
    printf("%s\n", gai_strerror(EAI_MEMORY));
}

That prints:

Memory allocation failure

FreeBSD tries harder, and prints:

メモリ割り当て失敗

We can see that it has a thread-local variable that holds a copy of
that localised string until the next call to gai_strerror() in the
same thread:

https://github.com/freebsd/freebsd-src/blob/main/lib/libc/net/gai_strerror.c
https://github.com/freebsd/freebsd-src/blob/main/lib/libc/nls/ja_JP.UTF-8.msg

FreeBSD's message catalogues would provide a read-made source of
translations, bu... hmm, if glibc doesn't bother and the POSIX
interface is unhelpful and Windows' own implementation is so willfully
unusable, I don't really feel inclined to build a whole thread-local
cache thing on our side just to support this mess.

So I think we should just hard-code the error messages in English and
move on.  However, English is my language so perhaps I should abstain
and leave it to others to decide how important that is.

> These messages may add extra line-end periods to the parent (or
> cotaining) messages when appended. This looks as follows.
>
> (auth.c:517 : errdetail_log() : sub (detail) message)
> > Could not translate client host name "hoge" to IP address: An address incompatible with the requested protocol was
used..
>
> (hba.c:1562 : errmsg() : main message)
> > invalid IP address "192.0.2.1": This is usually a temporary error during hostname resolution and means that the
localserver did not receive a response from an authoritative server. 
>
> When I first saw the first version, I thought it would be better to
> use Windows' own messages, just like you did. However, considering the
> content of the message above, wouldn't it be better to adhere to
> Linux-style messages overall?

Yeah, I agree that either the glibc or the FreeBSD messages would be
better than those now that I've seen them.  They are short and sweet.

> A slightly subtler point is that the second example seems to have a
> misalignment between the descriptions before and after the colon, but
> do you think it's not something to be concerned about to this extent?

I didn't understand what you meant here.



Re: gai_strerror() is not thread-safe on Windows

From
Kyotaro Horiguchi
Date:
At Thu, 7 Dec 2023 09:43:37 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in 
> On Tue, Dec 5, 2023 at 3:43 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > Windows' gai_strerror outputs messages that correspond to the language
> > environment. Similarly, I think that the messages that the messages
> > returned by our version should be translatable.
> 
> Hmm, that is a good point.  Wow, POSIX has given us a terrible
> interface here, in terms of resource management.  Let's see what glibc
> does:
>
> https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror.c
> https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror-strs.h

It is quite a sight for sore eyes...

> It doesn't look like it knows about locales at all.  And a test
> program seems to confirm:
..
>     setlocale(LC_MESSAGES, "ja_JP.UTF-8");
>     printf("%s\n", gai_strerror(EAI_MEMORY));
> 
> That prints:
> 
> Memory allocation failure
> 
> FreeBSD tries harder, and prints:
> 
> メモリ割り当て失敗
> 
> We can see that it has a thread-local variable that holds a copy of
> that localised string until the next call to gai_strerror() in the
> same thread:
> 
> https://github.com/freebsd/freebsd-src/blob/main/lib/libc/net/gai_strerror.c
> https://github.com/freebsd/freebsd-src/blob/main/lib/libc/nls/ja_JP.UTF-8.msg
> 
> FreeBSD's message catalogues would provide a read-made source of
> translations, bu... hmm, if glibc doesn't bother and the POSIX
> interface is unhelpful and Windows' own implementation is so willfully
> unusable, I don't really feel inclined to build a whole thread-local
> cache thing on our side just to support this mess.

I agree, I wouldn't want to do it either.

> So I think we should just hard-code the error messages in English and
> move on.  However, English is my language so perhaps I should abstain
> and leave it to others to decide how important that is.

I also think that would be a good way.

> > These messages may add extra line-end periods to the parent (or
> > cotaining) messages when appended. This looks as follows.
> >
> > (auth.c:517 : errdetail_log() : sub (detail) message)
> > > Could not translate client host name "hoge" to IP address: An address incompatible with the requested protocol
wasused..
 
> >
> > (hba.c:1562 : errmsg() : main message)
> > > invalid IP address "192.0.2.1": This is usually a temporary error during hostname resolution and means that the
localserver did not receive a response from an authoritative server.
 
> >
> > When I first saw the first version, I thought it would be better to
> > use Windows' own messages, just like you did. However, considering the
> > content of the message above, wouldn't it be better to adhere to
> > Linux-style messages overall?
> 
> Yeah, I agree that either the glibc or the FreeBSD messages would be
> better than those now that I've seen them.  They are short and sweet.
> 
> > A slightly subtler point is that the second example seems to have a
> > misalignment between the descriptions before and after the colon, but
> > do you think it's not something to be concerned about to this extent?
> 
> I didn't understand what you meant here.

If it was just a temporary error that couldn't be resolved, it doesn't
mean that the IP address is invalid. If such a cause is possible, then
probabyly an error message saying "failed to resolve" would be more
appropriate. However, I wrote it meaning that there is no need to go
to great length to ensure consistency with this message.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: gai_strerror() is not thread-safe on Windows

From
Robert Haas
Date:
On Wed, Dec 6, 2023 at 8:45 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
> > So I think we should just hard-code the error messages in English and
> > move on.  However, English is my language so perhaps I should abstain
> > and leave it to others to decide how important that is.
>
> I also think that would be a good way.

Considering this remark from Kyotaro Horiguchi, I think the
previously-posted patch could be committed.

Thomas, do you plan to do that, or are there outstanding issues here?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: gai_strerror() is not thread-safe on Windows

From
vignesh C
Date:
On Tue, 5 Dec 2023 at 00:57, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On second thoughts, I guess it would make more sense to use the exact
> messages Windows' own implementation would return instead of whatever
> we had in the past (probably cribbed from some other OS or just made
> up?).  I asked CI to spit those out[1].  Updated patch attached.  Will
> add to CF.

CFBot shows that the patch does not apply anymore as in [1]:

=== Applying patches on top of PostgreSQL commit ID
376c216138c75e161d39767650ea30536f23b482 ===
=== applying patch ./v2-0001-Fix-gai_strerror-thread-safety-on-Windows.patch
patching file configure
Hunk #1 succeeded at 16388 (offset 34 lines).
patching file configure.ac
Hunk #1 succeeded at 1885 (offset 7 lines).
patching file src/include/port/win32/sys/socket.h
patching file src/port/meson.build
patching file src/port/win32gai_strerror.c
can't find file to patch at input line 134
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
|index 46df01cc8d..c51296bdb6 100644
|--- a/src/tools/msvc/Mkvcbuild.pm
|+++ b/src/tools/msvc/Mkvcbuild.pm
--------------------------
No file to patch.  Skipping patch.
1 out of 1 hunk ignored

Please have a look and post an updated version.

[1] - http://cfbot.cputube.org/patch_46_4682.log

Regards,
Vignesh



Re: gai_strerror() is not thread-safe on Windows

From
Thomas Munro
Date:
On Tue, Jan 16, 2024 at 8:52 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Dec 6, 2023 at 8:45 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > > So I think we should just hard-code the error messages in English and
> > > move on.  However, English is my language so perhaps I should abstain
> > > and leave it to others to decide how important that is.
> >
> > I also think that would be a good way.
>
> Considering this remark from Kyotaro Horiguchi, I think the
> previously-posted patch could be committed.
>
> Thomas, do you plan to do that, or are there outstanding issues here?

Pushed.  I went with FreeBSD's error messages (I assume it'd be OK to
take glibc's too under fair use but I didn't want to think about
that).