report on not thread-safe functions - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | report on not thread-safe functions |
Date | |
Msg-id | 856e5ec3-879f-42ee-8258-8bcc6ec9bdea@eisentraut.org Whole thread Raw |
Responses |
Re: report on not thread-safe functions
|
List | pgsql-hackers |
In the context of the multithreaded-server project, I looked into potentially not thread-safe functions. (See proposed next steps at the end of this message.) Here is a list of functions in POSIX that are possibly not thread-safe: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_01 I checked those against the PostgreSQL server source code (backend + common + timezone), and found that the following from those are in use: - dlerror() - getenv() - getgrnam() - getopt() - getpwuid() - localeconv() - localtime() - nl_langinfo() - readdir() - setenv() - setlocale() - strerror() - strsignal() - strtok() - system() - unsetenv() Additionally, there are non-standard functions that are not thread-safe, such as getopt_long(). Also, there are replacement functions such as pg_gmtime() and pg_localtime() that mirror standard thread-unsafe functions and that appear to be equally unsafe. (Note to those looking into annotating global variables: You also need to check static local variables.) Conversely, some of the above might actually be thread-safe in some/many/all implementations. For example, strerror() and system() are thread-safe in glibc. So you might actually get a multithreaded server running in that environment with fewer source code changes but have it fail in others. Just something to keep in mind. I also tried the concurrency-mt-unsafe check from clang-tidy (https://clang.llvm.org/extra/clang-tidy/checks/concurrency/mt-unsafe.html). Run it for example like this: clang-tidy -p build --quiet --checks='-*,concurrency-mt-unsafe' src/backend/main/*.c (needs a compilation database in the build directory) (You can't just run it like src/backend/**/*.c because some .c files don't compile standalone, and then the whole thing aborts with too many errors. Maybe with a judicious exclusion list, this can be achieved. However, it's also not good dealing with compilation options like FRONTEND. So it can't easily serve as an automated checker, but it's okay as a manual exploration tool.) In addition to the POSIX list above, this also flagged: - exit() - sigprocmask() Allegedly, you can toggle it between posix and glibc modes, but I haven't succeeded with that. So for example, it does not actually flag strerror() out of the box, presumably because that is not in its glibc list. Now some more detailed comments on these functions: - dlerror() dlerror() gets the error from the last dlopen() call, which is obviously not thread-safe. This might require some deeper investigation of the whole dfmgr.c mechanism. (Which might be appropriate in any case, since in multithreaded environments, you don't need to load a library into each session separately.) - exit() Most of the exit() calls happen where there are not multiple threads active. But some emergency exit calls like in elog.c might more correctly use _exit()? - getenv() - setenv() - unsetenv() getenv() is unsafe if there are concurrent setenv() or unsetenv() calls. We should try to move all those to early in the program startup. This seems doable. Some calls are related to locale stuff, which is a separate subproject to clean up. There are some calls to setenv("KRB5*"), which would need to be fixed. The source code comments nearby already contain ideas how to. - getgrnam() - getpwuid() - localtime() These have _r replacements. - getopt() This needs a custom replacement. (There is no getopt_r() because programs usually don't call getopt() after startup.) (Note: This is also called during session startup, not only during initial postmaster start. So we definitely need something here, if we want to, like, start more than one session concurrently.) - localeconv() - nl_langinfo() - setlocale() The locale business needs to be reworked to use locale_t and _l functions. This is already being discussed for other reasons. - readdir() This is listed as possibly thread-unsafe, but I think it is thread-safe in practice. You just can't work on the same DIR handle from multiple threads. There is a readdir_r(), but that's already deprecated. I think we can ignore this one. - sigprocmask() It looks like this is safe in practice. Also, there is pthread_sigmask() if needed. - strerror() Use strerror_r(). There are very few calls of this, actually, since most potential users use printf %m. - strsignal() Use strsignal_r(). These calls are already wrapped in pg_strsignal() for Windows portability, so it's easy to change. But this also led me to think that it is potentially dangerous to have different standards of thread-safety across the tree. pg_strsignal() is used by wait_result_to_str() which is used by pclose_check() ... and at that point you have completely lost track of what you are dealing with underneath. So if someone were to put, say, pclose_check() into pgbench, it could be broken. - strtok() Use strtok_r() or maybe even strsep() (but there are small semantic differences with the latter). - system() As mentioned above, this actually safe on some systems. If there are systems where it's not safe, then this could require some nontrivial work. Suggested next steps: - The locale API business is already being worked on under separate cover. - Getting rid of the setenv("KRB5*") calls is a small independently doable project. - Check if we can get rid of the getopt() calls at session startup. Else figure out a thread-safe replacement. - Replace remaining strtok() with strsep(). I think the semantics of strsep() are actually more correct for the uses I found. (strtok() skips over multiple adjacent separators, but strsep() would return empty fields.) After those, the remaining issues seem less complicated.
pgsql-hackers by date: