Thread: BUG #18312: libpq: PQsetdbLogin() not thread-safe
The following bug has been logged on the website: Bug reference: 18312 Logged by: Christian Maurer Email address: c.maurer@gmx.at PostgreSQL version: 16.1 Operating system: Windows 10 Pro Description: Hi When calling PQsetdbLogin() concurrently the program crashes with RC=3. Server: PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat 11.4.1-2), 64-bit libpq: postgresql-16.1-1-windows-x64-binaries.zip (coming from https://www.enterprisedb.com/download-postgresql-binaries) Output: PQisthreadsafe: 1 Connect 0 Connect 2 Connect 1 Connect 3 Connected 0 Connected 2 Finished 0 Finished 2 => 'End' was not reached => %ERRORLEVEL% is 3 => Thread 1 and 3 did not connect/finish When calling PQsetdbLogin() and PQfinish() once before the multi-threading starts, there is no crash and we get the expected output: PQisthreadsafe: 1 Connect 0 Connect 2 Connect 3 Connect 1 Connected 3 Finished 3 Connected 1 Finished 1 Connected 2 Connected 0 Finished 2 Finished 0 End Regards Christian -- main.cpp start -- #include <iostream> #include <thread> #include <vector> #include <pgsql/libpq-fe.h> void test(int i) { try { std::cout << "Connect " << i << std::endl; PGconn *pgConn = PQsetdbLogin("myHost", (const char*)NULL, (const char*)NULL, (const char*)NULL, "myDatabase", "myUser", "myPassword"); std::cout << "Connected " << i << std::endl; PQfinish(pgConn); std::cout << "Finished " << i << std::endl; } catch (...) { std::cout << "Exception occurred in " << i << std::endl; } } int main() { try { std::cout << "PQisthreadsafe: " << PQisthreadsafe() << std::endl; //PGconn* pgConn = PQsetdbLogin("myHost", (const char*)NULL, (const char*)NULL, (const char*)NULL, "myDatabase", "myUser", "myPassword"); //PQfinish(pgConn); const std::size_t maxThreads = 4U; std::vector<std::thread> myThreads; for (std::size_t i = 0; i < maxThreads; ++i) myThreads.push_back(std::thread(test, i)); for (std::thread& myThread : myThreads) myThread.join(); std::cout << "End" << std::endl; } catch (...) { std::cout << "Exception occurred in main" << std::endl; } return 0; } -- main.cpp end --
Hi (Again. Sorry for the html-formatting in my previous mail.) I managed to capture a stack trace for the crash via WinDbg: Frame Index Call Site Child-SP Return Address [0x0] ntdll!NtTerminateProcess+0x14 0x18aa9fdf88 0x7fffce3cda98 [0x1] ntdll!RtlExitUserProcess+0xb8 0x18aa9fdf90 0x7fffcd99e82b [0x2] KERNEL32!ExitProcessImplementation+0xb 0x18aa9fdfc0 0x7fffcd42a155 [0x3] msvcrt!_crtExitProcess+0x15 0x18aa9fdff0 0x7fffcd42a7c5 [0x4] msvcrt!doexit+0x171 0x18aa9fe020 0x7fffcd41f26d [0x5] msvcrt!abort+0x8d 0x18aa9fe090 0x68281886 [0x6] libintl_9!libintl_bindtextdomain+0x56 0x18aa9fe640 0x7fff7dcd2e93 [0x7] LIBPQ!PQenv2encoding+0x1193 0x18aa9fe670 0x7fff7dcd2c9e [0x8] LIBPQ!PQenv2encoding+0xf9e 0x18aa9fe6a0 0x7fff7dcc9263 [0x9] LIBPQ!PQpingParams+0x2ea3 0x18aa9fe6d0 0x7fff7dcc3dcd [0xa] LIBPQ!PQconnectPoll+0x90d 0x18aa9feb50 0x7fff7dcc6a1c [0xb] LIBPQ!PQpingParams+0x65c 0x18aa9ff5c0 0x7fff7dcc50eb [0xc] LIBPQ!PQsetdbLogin+0x19b 0x18aa9ff5f0 0x7ff774798078 [0xd] _HelloWorld!test+0x98 0x18aa9ff620 0x7ff774796f17 It seems to me, that it is related to BUG #17299 Regards, Christian Maurer
Christian Maurer <c.maurer@gmx.at> writes: > I managed to capture a stack trace for the crash via WinDbg: > > Frame Index Call Site Child-SP Return Address > [0x0] ntdll!NtTerminateProcess+0x14 0x18aa9fdf88 0x7fffce3cda98 > [0x1] ntdll!RtlExitUserProcess+0xb8 0x18aa9fdf90 0x7fffcd99e82b > [0x2] KERNEL32!ExitProcessImplementation+0xb 0x18aa9fdfc0 0x7fffcd42a155 > [0x3] msvcrt!_crtExitProcess+0x15 0x18aa9fdff0 0x7fffcd42a7c5 > [0x4] msvcrt!doexit+0x171 0x18aa9fe020 0x7fffcd41f26d > [0x5] msvcrt!abort+0x8d 0x18aa9fe090 0x68281886 > [0x6] libintl_9!libintl_bindtextdomain+0x56 0x18aa9fe640 0x7fff7dcd2e93 > [0x7] LIBPQ!PQenv2encoding+0x1193 0x18aa9fe670 0x7fff7dcd2c9e Hm, so it's bindtextdomain() that is aborting. Back at [1] I wondered whether we needed to introduce a critical section to prevent multiple threads from calling that concurrently, even though its docs allege you shouldn't have to. Perhaps it's thread-safe except on Windows? regards, tom lane [1] https://www.postgresql.org/message-id/flat/CAE7q7Eit4Eq2%3Dbxce%3DFm8HAStECjaXUE%3DWBQc-sDDcgJQ7s7eg%40mail.gmail.com
> I wondered whether we needed to introduce a critical section > to prevent multiple threads from calling that concurrently, > even though its docs allege you shouldn't have to. Perhaps > it's thread-safe except on Windows? This would explain, why I could not reproduce the error on Red Hat Enterprise Linux. Regards, Christian Maurer
Christian Maurer <c.maurer@gmx.at> writes: >> I wondered whether we needed to introduce a critical section >> to prevent multiple threads from calling that concurrently, >> even though its docs allege you shouldn't have to. Perhaps >> it's thread-safe except on Windows? > This would explain, why I could not reproduce the error on Red Hat Enterprise Linux. Are you in a position to try a source-code patch? The attached is very quick-and-dirty, but if it resolves your problem then we could turn it into something respectable. regards, tom lane diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c index 47a28b0a3a..8906dae892 100644 --- a/src/interfaces/libpq/fe-misc.c +++ b/src/interfaces/libpq/fe-misc.c @@ -1243,12 +1243,17 @@ libpq_binddomain(void) #endif const char *ldir; + pglock_thread(); + /* No relocatable lookup here because the binary could be anywhere */ ldir = getenv("PGLOCALEDIR"); if (!ldir) ldir = LOCALEDIR; bindtextdomain(PG_TEXTDOMAIN("libpq"), ldir); already_bound = true; + + pgunlock_thread(); + #ifdef WIN32 SetLastError(save_errno); #else
Hi Tom Lane, I could apply your patch and compile libpq on Windows 10. The parallel connect issues are gone now * in the minimal example and * also in our regression test which showed the issue in the first place. Would it be possible, that the patch you are aiming to make can be included in version 16.2? Regards, Christian
Christian Maurer <c.maurer@gmx.at> writes: > I could apply your patch and compile libpq on Windows 10. > > The parallel connect issues are gone now > * in the minimal example and > * also in our regression test which showed the issue in the first place. Great, thanks for the confirmation. > Would it be possible, that the patch you are aiming to make can be included in version 16.2? I'm afraid we're a few days too late for that --- even if I had a finished patch, the release freeze started two days ago. It'll be in the May releases, instead. regards, tom lane
I wrote: > Christian Maurer <c.maurer@gmx.at> writes: >> Would it be possible, that the patch you are aiming to make can be included in version 16.2? > I'm afraid we're a few days too late for that --- even if I had > a finished patch, the release freeze started two days ago. > It'll be in the May releases, instead. For the archives' sake --- I've posted a patch for this at [1]. regards, tom lane [1] https://www.postgresql.org/message-id/264860.1707163416%40sss.pgh.pa.us