Thread-unsafe coding in ecpg - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Thread-unsafe coding in ecpg |
Date | |
Msg-id | 31420.1547783697@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Thread-unsafe coding in ecpg
|
List | pgsql-hackers |
I've found that a couple of different OpenBSD 6.4 machines fall over badly in the ecpg regression tests, with output like test sql/parser ... ok test thread/thread ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap) test thread/thread_implicit ... stdout FAILED (test process was terminated by signal 10: Bus error) test thread/prep ... ok (test process was terminated by signal 10: Bus error) test thread/alloc ... stderr FAILED (test process was terminated by signal 6: Abort trap) test thread/descriptor ... ok It's somewhat variable as to which tests fail, but it's always thread tests. Examining the core dumps shows traces like #0 thrkill () at -:3 #1 0x00000c04f427dd6e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51 #2 0x00000c04f425f7e9 in wrterror (d=Variable "d" is not available. ) at /usr/src/lib/libc/stdlib/malloc.c:291 #3 0x00000c04f42628fb in find_chunknum (d=Variable "d" is not available. ) at /usr/src/lib/libc/stdlib/malloc.c:1043 #4 0x00000c04f425fe23 in ofree (argpool=Variable "argpool" is not available. ) at /usr/src/lib/libc/stdlib/malloc.c:1359 #5 0x00000c04f425f8ec in free (ptr=0xc04df0e26e0) at /usr/src/lib/libc/stdlib/malloc.c:1419 #6 0x00000c04f427ec83 in freegl (oldgl=0xc05a022d080) at /usr/src/lib/libc/locale/setlocale.c:32 #7 0x00000c04f427eb49 in _libc_setlocale (category=4, locname=0xc059b605180 "C") at /usr/src/lib/libc/locale/setlocale.c:177 #8 0x00000c0531a6f955 in ecpg_do_epilogue (stmt=0xc0587bb0c00) at execute.c:1986 #9 0x00000c0531a6fa65 in ecpg_do (lineno=Variable "lineno" is not available. ) at execute.c:2018 #10 0x00000c0531a6fb31 in ECPGdo (lineno=Variable "lineno" is not available. ) at execute.c:2037 #11 0x00000c02a9f00b19 in test_thread (arg=Variable "arg" is not available. ) at thread.pgc:131 #12 0x00000c04b180b26e in _rthread_start (v=Variable "v" is not available. ) at /usr/src/lib/librthread/rthread.c:96 #13 0x00000c04f42ba77b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75 #14 0x0000000000000000 in ?? () or this one: #0 _libc_strlcpy (dst=0x2c61e133ee0 "C", src=0xdfdfdfdfdfdfdfdf <Address 0xdfdfdfdfdfdfdfdf out of bounds>, dsize=256) at /usr/src/lib/libc/string/strlcpy.c:36 #1 0x000002c61de99a71 in _libc_setlocale (category=4, locname=0x0) from /usr/lib/libc.so.92.5 #2 0x000002c5ae2693a8 in ecpg_do_prologue (lineno=59, compat=0, force_indicator=1, connection_name=0x0, questionmarks=false, statement_type=ECPGst_execute, query=0x2c333701418 "i", args=0x2c61a96f6d0, stmt_out=0x2c61a96f5e0) at execute.c:1776 #3 0x000002c5ae269a20 in ecpg_do (lineno=Variable "lineno" is not available. ) at execute.c:2001 #4 0x000002c5ae269b31 in ECPGdo (lineno=Variable "lineno" is not available. ) at execute.c:2037 #5 0x000002c333600b47 in fn (arg=Variable "arg" is not available. ) at prep.pgc:59 #6 0x000002c56a00b26e in _rthread_start (v=Variable "v" is not available. ) at /usr/src/lib/librthread/rthread.c:96 #7 0x000002c61ded577b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75 #8 0x0000000000000000 in ?? () The common denominator is always a call to setlocale(), and that generally is calling malloc or free or some other libc function that is unhappy. When output appears on stderr, it's usually free complaining about double-frees or some such. So my conclusion is that this version of setlocale() has some thread-safety issues. I was all set to go file a bug report when I noticed this in the POSIX spec for setlocale: The setlocale() function need not be thread-safe. as well as this: The global locale established using setlocale() shall only be used in threads for which no current locale has been set using uselocale() or whose current locale has been set to the global locale using uselocale(LC_GLOBAL_LOCALE). IOW, not only is setlocale() not necessarily thread-safe itself, but using it to change locales in a multithread program is seriously unsafe because of concurrent effects on other threads. Therefore, it's plain crazy for ecpg to be calling setlocale() inside threaded code. It looks to me like what ecpg is doing is trying to defend itself against non-C LC_NUMERIC settings, which is laudable, but this implementation of that is totally unsafe. Don't know what's the best way out of this. The simplest thing would be to just remove that code and document that you'd better run ecpg in LC_NUMERIC locale, but it'd be nice if we could do better. regards, tom lane
pgsql-hackers by date: