report on not thread-safe functions - Mailing list pgsql-hackers

From Peter Eisentraut
Subject report on not thread-safe functions
Date
Msg-id 856e5ec3-879f-42ee-8258-8bcc6ec9bdea@eisentraut.org
Whole thread Raw
Responses Re: report on not thread-safe functions
List pgsql-hackers
In the context of the multithreaded-server project, I looked into
potentially not thread-safe functions.

(See proposed next steps at the end of this message.)

Here is a list of functions in POSIX that are possibly not thread-safe:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01

I checked those against the PostgreSQL server source code (backend +
common + timezone), and found that the following from those are in
use:

- dlerror()
- getenv()
- getgrnam()
- getopt()
- getpwuid()
- localeconv()
- localtime()
- nl_langinfo()
- readdir()
- setenv()
- setlocale()
- strerror()
- strsignal()
- strtok()
- system()
- unsetenv()

Additionally, there are non-standard functions that are not
thread-safe, such as getopt_long().

Also, there are replacement functions such as pg_gmtime() and
pg_localtime() that mirror standard thread-unsafe functions and that
appear to be equally unsafe.  (Note to those looking into annotating
global variables: You also need to check static local variables.)

Conversely, some of the above might actually be thread-safe in
some/many/all implementations.  For example, strerror() and system()
are thread-safe in glibc.  So you might actually get a multithreaded
server running in that environment with fewer source code changes but
have it fail in others.  Just something to keep in mind.

I also tried the concurrency-mt-unsafe check from clang-tidy
(https://clang.llvm.org/extra/clang-tidy/checks/concurrency/mt-unsafe.html). 
  Run it for example like this:

clang-tidy -p build --quiet --checks='-*,concurrency-mt-unsafe' 
src/backend/main/*.c

(needs a compilation database in the build directory)

(You can't just run it like src/backend/**/*.c because some .c files
don't compile standalone, and then the whole thing aborts with too
many errors.  Maybe with a judicious exclusion list, this can be
achieved.  However, it's also not good dealing with compilation
options like FRONTEND.  So it can't easily serve as an automated
checker, but it's okay as a manual exploration tool.)

In addition to the POSIX list above, this also flagged:

- exit()
- sigprocmask()

Allegedly, you can toggle it between posix and glibc modes, but I
haven't succeeded with that.  So for example, it does not actually
flag strerror() out of the box, presumably because that is not in its
glibc list.


Now some more detailed comments on these functions:

- dlerror()

dlerror() gets the error from the last dlopen() call, which is
obviously not thread-safe.  This might require some deeper
investigation of the whole dfmgr.c mechanism.  (Which might be
appropriate in any case, since in multithreaded environments, you
don't need to load a library into each session separately.)

- exit()

Most of the exit() calls happen where there are not multiple threads
active.  But some emergency exit calls like in elog.c might more
correctly use _exit()?

- getenv()
- setenv()
- unsetenv()

getenv() is unsafe if there are concurrent setenv() or unsetenv()
calls.  We should try to move all those to early in the program
startup.  This seems doable.  Some calls are related to locale stuff,
which is a separate subproject to clean up.  There are some calls to
setenv("KRB5*"), which would need to be fixed.  The source code
comments nearby already contain ideas how to.

- getgrnam()
- getpwuid()
- localtime()

These have _r replacements.

- getopt()

This needs a custom replacement.  (There is no getopt_r() because
programs usually don't call getopt() after startup.)

(Note: This is also called during session startup, not only during
initial postmaster start.  So we definitely need something here, if we
want to, like, start more than one session concurrently.)

- localeconv()
- nl_langinfo()
- setlocale()

The locale business needs to be reworked to use locale_t and _l
functions.  This is already being discussed for other reasons.

- readdir()

This is listed as possibly thread-unsafe, but I think it is
thread-safe in practice.  You just can't work on the same DIR handle
from multiple threads.  There is a readdir_r(), but that's already
deprecated.  I think we can ignore this one.

- sigprocmask()

It looks like this is safe in practice.  Also, there is
pthread_sigmask() if needed.

- strerror()

Use strerror_r().  There are very few calls of this, actually, since
most potential users use printf %m.

- strsignal()

Use strsignal_r().  These calls are already wrapped in pg_strsignal()
for Windows portability, so it's easy to change.

But this also led me to think that it is potentially dangerous to have
different standards of thread-safety across the tree.  pg_strsignal()
is used by wait_result_to_str() which is used by pclose_check()
... and at that point you have completely lost track of what you are
dealing with underneath.  So if someone were to put, say,
pclose_check() into pgbench, it could be broken.

- strtok()

Use strtok_r() or maybe even strsep() (but there are small semantic
differences with the latter).

- system()

As mentioned above, this actually safe on some systems.  If there are
systems where it's not safe, then this could require some nontrivial
work.


Suggested next steps:

- The locale API business is already being worked on under separate
   cover.

- Getting rid of the setenv("KRB5*") calls is a small independently
   doable project.

- Check if we can get rid of the getopt() calls at session startup.
   Else figure out a thread-safe replacement.

- Replace remaining strtok() with strsep().  I think the semantics of
   strsep() are actually more correct for the uses I found.  (strtok()
   skips over multiple adjacent separators, but strsep() would return
   empty fields.)

After those, the remaining issues seem less complicated.



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: [multithreading] extension compatibility
Next
From: Robert Haas
Date:
Subject: Re: ResourceOwner refactoring