Thread: Mixing threaded and non-threaded
(I hope this is -hackers appropriate - feel free to point me elsewhere) I'm using 7.4.1 as the backend to several applications. Until recently, I've been developing solely single-threaded applications. I just rebuilt postgresql with --enable-thread-safety, to work with some multi-threaded code. When I rebuilt libpq to use threads, I started seeing a bunch of weird failures in many of the older applications. The change in libpq meant that libpthread was being dynamically linked into the non-thread-aware applications, leading to some mutex deadlocks in their signal handlers, hanging those applications. There doesn't seem to be any tidy way to build and use both threaded and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks aren't really viable for distributed code). Is there something I'm missing? (If it's relevant, the OS in question is RedHat Linux, but I'm maintaining the same suite of apps on several other architectures.) Cheers, Steve
Steve Atkins wrote: > (I hope this is -hackers appropriate - feel free to point me elsewhere) > > I'm using 7.4.1 as the backend to several applications. Until recently, > I've been developing solely single-threaded applications. > > I just rebuilt postgresql with --enable-thread-safety, to work with > some multi-threaded code. > > When I rebuilt libpq to use threads, I started seeing a bunch of weird > failures in many of the older applications. The change in libpq meant > that libpthread was being dynamically linked into the non-thread-aware > applications, leading to some mutex deadlocks in their signal > handlers, hanging those applications. > > There doesn't seem to be any tidy way to build and use both threaded > and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks > aren't really viable for distributed code). Is there something I'm > missing? No, there is not. We could compile two versions, and have you specify the threaded version only when you want it, but only some operating systems have that distinction, so then we would have two identical libraries on some platforms, and different ones on others, and that seemed pretty confusing. Of course, we can always revisit this. > (If it's relevant, the OS in question is RedHat Linux, but I'm > maintaining the same suite of apps on several other architectures.) This is interesting. I had not considered that libpq's calls to libpthread would cause problems. In fact, libpq shouldn't be doing anything special with pthread except for a few calls used in port/thread.c. However, the issue we always were worried about was that linking against libpthread would cause some unexpected thread calls in the application, and it looks like that is exactly what you are seeing. In fact, it sounds like it is the calls to allow synchronous signals to be delivered to the thread that generated them that might be the new change you are seeing. 
My guess is that creating applications against the non-thread libpq and then replacing it with a threaded libpq is your problem. I guess the question is whether you would like to have two libpq's and have to decide at link time if you wanted threading, or just have one libpq and make sure you recompile if you change the threading behavior of the library. We considered the latter to be clearer. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, Jan 23, 2004 at 10:03:30PM -0500, Bruce Momjian wrote: > Steve Atkins wrote: > > When I rebuilt libpq to use threads, I started seeing a bunch of weird > > failures in many of the older applications. The change in libpq meant > > that libpthread was being dynamically linked into the non-thread-aware > > applications, leading to some mutex deadlocks in their signal > > handlers, hanging those applications. > > > > There doesn't seem to be any tidy way to build and use both threaded > > and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks > > aren't really viable for distributed code). Is there something I'm > > missing? > > No, there is not. We could compile two versions, and have you specify > the threaded version only when you want it, but only some operating > systems have that distinction, so then we would have to identical > libraries on some platforms, and different ones on others, and that > seemed pretty confusing. Of course, we can always revisit this. > > > (If it's relevant, the OS in question is RedHat Linux, but I'm > > maintaining the same suite of apps on several other architectures.) > > This is interesting. I had not considered that libpq's calls to > libpthread would cause problems. In fact, libpq shouldn't be doing > anything special with pthread except for a few calls used in > port/thread.c. Yes, libpq's actual use of pthread seems pretty harmless. > However, the issue we always were worried about was that > linking against libpthread would cause some unexpected thread calls in > the application, and it looks like that is exactly what you are seeing. > In fact, it sounds like it is the calls to allow synchronous signals to > be delivered to the thread that generated them that might be the new > change you are seeing. Exactly that, yes. > My guess is that creating applications against the non-thread libpq and > then replacing it with a threaded libpq is your problem. Yes. 
It seems to make no difference whether the application is rebuilt or not. It's pulling libpthread into a non-thread-aware application that's the problem. The only fix that would allow the non-threaded application to work with a thread-safe libpq would be to rewrite it to be a threaded application with a single active thread. > I guess the > question is whether you would like to have two libpq's and have to > decide at link time if you wanted threading, or just have one libpq and > make sure you recompile if you change the threading behavior of the > library. We considered the later to be clearer. Recompiling doesn't necessarily help unless the application is also rewritten. Also, if there are dozens of non-threaded applications using libpq on a system (possibly installed via rpms or equivalent) then replacing the system libpq could break something else. For now I'm just building and distributing two different libpqs and choosing between them with rpath hacks (yes, renaming one of them might be easier, but I'm specifying rpath explicitly anyway for other reasons). That seems to be working just fine for me. If there are multiple applications on the system using PostgreSQL we really don't want to break some of them if libpq is rebuilt to support a new one. Probably worth a mention in the documentation at least. Cheers, Steve
Steve Atkins wrote: > > My guess is that creating applications against the non-thread libpq and > > then replacing it with a threaded libpq is your problem. > > Yes. It seems to make no difference whether the application is rebuilt > or not. It's pulling libpthread into a non-thread-aware application > that's the problem. > > The only fix that would allow the non-threaded application to work > with a thread-safe libpq would be to rewrite it to be a threaded > application with a single active thread. Woh, as far as I know, any application should run fine with -lpthread, threaded or not. What OS are you on? This is the first I have heard of this problem. > > I guess the > > question is whether you would like to have two libpq's and have to > > decide at link time if you wanted threading, or just have one libpq and > > make sure you recompile if you change the threading behavior of the > > library. We considered the later to be clearer. > > Recompiling doesn't neccesarily help unless the application is also > rewritten. Also, if there are dozens of non-threaded applications > using libpq on a system (possibly installed via rpms or equivalent) > then replacing the system libpq could break something else. Why? How would you rewrite it? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Tue, Jan 27, 2004 at 02:07:44PM -0500, Bruce Momjian wrote: > Steve Atkins wrote: > > > My guess is that creating applications against the non-thread libpq and > > > then replacing it with a threaded libpq is your problem. > > > > Yes. It seems to make no difference whether the application is rebuilt > > or not. It's pulling libpthread into a non-thread-aware application > > that's the problem. > > > > The only fix that would allow the non-threaded application to work > > with a thread-safe libpq would be to rewrite it to be a threaded > > application with a single active thread. > > > Woh, as far as I know, any application should run fine with -lpthread, > threaded or not. What OS are you on? This is the first I have heard of > this problem. Linux/i386, RedHat 7.something, gcc 2.96. Not my favorite configuration, but nothing particularly odd. > > > I guess the > > > question is whether you would like to have two libpq's and have to > > > decide at link time if you wanted threading, or just have one libpq and > > > make sure you recompile if you change the threading behavior of the > > > library. We considered the later to be clearer. > > > > Recompiling doesn't neccesarily help unless the application is also > > rewritten. Also, if there are dozens of non-threaded applications > > using libpq on a system (possibly installed via rpms or equivalent) > > then replacing the system libpq could break something else. > > Why? How would you rewrite it? No idea. I've not looked at exactly what's going on, yet. It's perfectly possible that the problem I'm seeing is actually a bug in the underlying code - but it's been used in heavy production use for two years without pthread, and deadlocked immediately when built with pthread, so it's the sort of bug that could be elsewhere. It's a very complex application, so I'd really need to reduce it to a test case to narrow it down. 
A hint, though, might be that it's a multiprocess application with a single master process that controls dozens of child processes. When the master shuts down it asks all the children to shut down, and then it deadlocks in the SIGCHLD handler. I'll burrow a bit deeper when I get some time. Cheers, Steve
On Jan 27, 2004, at 1:16 PM, Steve Atkins wrote: > A hint, though, might be that it's a multiprocess application with a > single master process that controls dozens of child processes. When the > master shuts down it asks all the children to shut down, and then it > deadlocks in the SIGCHILD handler. It's not safe to do anything interesting in a SIGCHLD handler, unless you have pretty severe restrictions on when the signal can arrive. Take a look at <http://www.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_04.html>. It contains a list of all the async signal-safe functions in SUSv3. It's a pretty short list. Notably absent are pthread_mutex_*() and malloc() (and anything that uses them). Scott Lamb
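[Editor's sketch] One common way to stay inside that short list is to have the SIGCHLD handler do nothing except set a flag, and to reap children from ordinary code. The following is a minimal sketch under that assumption, not a reconstruction of Steve's application. The names on_sigchld, reap_children and spawn_and_reap are hypothetical; only the POSIX calls (sigaction, sigprocmask, sigsuspend, waitpid, fork) are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t child_exited = 0;

/* The handler only records that SIGCHLD arrived: no malloc(),
 * no stdio, no mutexes, nothing outside the async-signal-safe list. */
static void on_sigchld(int signo)
{
    (void) signo;
    child_exited = 1;
}

/* Called outside signal context: reap every child that has exited
 * so far and return how many were reaped. */
static int reap_children(void)
{
    int reaped = 0;

    child_exited = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Hypothetical demo: fork n children that exit at once, then wait
 * until all of them have been reaped. Returns n, or -1 on failure. */
static int spawn_and_reap(int n)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;
    int total = 0;

    assert(n >= 0);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    /* Keep SIGCHLD blocked except while waiting in sigsuspend(),
     * so the flag cannot change between the test and the wait. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(0);               /* child: exit immediately */
    }

    while (total < n) {
        while (!child_exited)
            sigsuspend(&wait_mask); /* atomically unblock and wait */
        total += reap_children();
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return total;
}
```

Because all the real work happens in reap_children(), it makes no difference to the handler whether libpthread is linked in or not.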
Bruce Momjian wrote: >Woh, as far as I know, any application should run fine with -lpthread, >threaded or not. What OS are you on? This is the first I have heard of >this problem. > > Perhaps we should try to figure out how other packages handle multithreaded/singlethreaded libraries? I'm looking at openssl right now, and openssl never links against libpthread: The caller is responsible for registering the locking primitives. -- Manfred
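[Editor's sketch] The caller-registers-the-locks design Manfred describes can be sketched roughly as below. This is an illustration of the pattern only: "mylib" and every mylib_* and app_* name are hypothetical, and it is not the OpenSSL or libpq API. In a real build the library half would be compiled without any pthread dependency and only the application half would include <pthread.h>; they share one file here so the sketch stays self-contained.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* ---- library side: hypothetical "mylib", no pthread dependency ---- */

typedef void (*mylib_lock_fn)(void *ctx);

static mylib_lock_fn lock_cb = NULL;
static mylib_lock_fn unlock_cb = NULL;
static void *lock_ctx = NULL;

/* A threaded application calls this once, before any other thread
 * touches the library. A non-threaded application never calls it. */
static void mylib_set_locking(mylib_lock_fn lock, mylib_lock_fn unlock,
                              void *ctx)
{
    lock_cb = lock;
    unlock_cb = unlock;
    lock_ctx = ctx;
}

static long shared_counter = 0;

/* A library function that touches shared state. It locks only if
 * the application registered callbacks. */
static long mylib_next_id(void)
{
    long id;

    if (lock_cb)
        lock_cb(lock_ctx);
    id = ++shared_counter;
    if (unlock_cb)
        unlock_cb(lock_ctx);
    return id;
}

/* ---- application side: a threaded program supplies the locks ---- */

static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;

static void app_lock(void *ctx)
{
    int rc = pthread_mutex_lock((pthread_mutex_t *) ctx);
    assert(rc == 0);
    (void) rc;
}

static void app_unlock(void *ctx)
{
    int rc = pthread_mutex_unlock((pthread_mutex_t *) ctx);
    assert(rc == 0);
    (void) rc;
}

static void app_enable_threading(void)
{
    mylib_set_locking(app_lock, app_unlock, &app_mutex);
}
```

The trade-off is the obvious one: a non-threaded program never pulls in thread code through the library, but a threaded program that forgets to call the registration function gets no locking at all.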
Manfred Spraul wrote: > Bruce Momjian wrote: > > >Woh, as far as I know, any application should run fine with -lpthread, > >threaded or not. What OS are you on? This is the first I have heard of > >this problem. > > > > > Perhaps we should try to figure out how other packages handle > multithreaded/singlethreaded libraries? I'm looking at openssl right > now, and openssl never links against libpthread: The caller is > responsible for registering the locking primitives. We perhaps don't need -lpthread for creating libpq, but only for ecpg. However, now that we have used thread locking for SIGPIPE, we are now calling pthread from libpq, but only in 7.5. However, I still don't understand why the user is seeing a problem and what rewrite he thinks is necessary for his application because pthread is linked in. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Jan 30, 2004, at 3:18 AM, Bruce Momjian wrote: > Manfred Spraul wrote: >> Bruce Momjian wrote: >> >>> Woh, as far as I know, any application should run fine with >>> -lpthread, >>> threaded or not. What OS are you on? This is the first I have >>> heard of >>> this problem. >>> >>> >> Perhaps we should try to figure out how other packages handle >> multithreaded/singlethreaded libraries? I'm looking at openssl right >> now, and openssl never links against libpthread: The caller is >> responsible for registering the locking primitives. Some other libraries, such as boost, always link against -lpthread when it is present. I don't think OpenSSL's example is a good one to follow. It's way too easy to forget to do that, and then your application is broken. You'll have weird crashes that will be hard to figure out. I think OpenSSL was made such because pthreads was not so common back in the day; they probably wanted to support other threading APIs. That's unnecessary now. Another reason might be to avoid the expense of locks when they are unnecessary. But also, I think that is not as necessary as it once was, particularly with modern systems like Linux+NPTL having locks cost virtually nothing when there is no contention. > We perhaps don't need -lpthread for creating libpq, but only for ecpg. > However, now that we have used thread locking for SIGPIPE, we are now > calling pthread from libpq, but only 7.5. > > However, I still don't understand why the user is seeing a problem and > what rewrite he thinks is necessary for his application because pthread > is linked in. I'm 99% certain that any application will work with -lpthread on RedHat Linux. And 95% certain that's true on _any_ platform. There's no pthread_init() or anything; the distinction he was describing between a non-threaded application and a threaded application with only one thread doesn't exist as far as I know. And he mentioned that the deadlocks are occurring in a SIGCHLD handler. 
Since so few functions are async signal-safe (I doubt anything in libpq is), the code in question was broken before; the extra locking is just making it more obvious. Speaking of async signal-safe functions, pthread_getspecific() isn't specified to be (and thus PQinSend() and thus sigpipe_handler_ignore_send()). It's probably okay, but libpq is technically using undefined behavior according to SUSv3. Scott Lamb
Scott Lamb wrote: > Speaking of async signal-safe functions, pthread_getspecific() isn't > specified to be (and thus PQinSend() and thus > sigpipe_handler_ignore_send()). It's probably okay, but libpq is > technically using undefined behavior according to SUSv3. Yikes. I never suspected pthread_getspecific() would not be signal safe because it is already thread safe, but I see the point that it is called in the current thread. Any ideas how to fix this? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, Jan 30, 2004 at 11:10:49AM -0600, Scott Lamb wrote: > On Jan 30, 2004, at 3:18 AM, Bruce Momjian wrote: > >Manfred Spraul wrote: > >>Bruce Momjian wrote: > >> > >>>Woh, as far as I know, any application should run fine with > >>>-lpthread, > >>>threaded or not. What OS are you on? This is the first I have > >>>heard of > >>>this problem. > >>> > >>> > >>Perhaps we should try to figure out how other packages handle > >>multithreaded/singlethreaded libraries? I'm looking at openssl right > >>now, and openssl never links against libpthread: The caller is > >>responsible for registering the locking primitives. I don't think changing the linking approach is a good thing. But a mention in the documentation might be. > >We perhaps don't need -lpthread for creating libpq, but only for ecpg. > >However, now that we have used thread locking for SIGPIPE, we are now > >calling pthread from libpq, but only 7.5. > > > >However, I still don't understand why the user is seeing a problem and > >what rewrite he thinks is necessary for his application because pthread > >is linked in. I suspect the rewrite needed is to avoid doing Bad Things in the signal handler. > I'm 99% certain that any application will work with -lpthread on RedHat > Linux. And 95% certain that's true on _any_ platform. There's no > pthread_init() or anything; the distinction he was describing between a > non-threaded application and a threaded application with only one > thread doesn't exist as far as I know. That may be true for any correctly written application, but it's certainly not true for any application. The distinction is, at the very least, that some system calls are wrapped with mutexes. > And he mentioned that the deadlocks are occurring in a SIGCHLD handler. > Since so few functions are async signal-safe (I doubt anything in libpq > is), the code in question was broken before; the extra locking is just > making it more obvious. I tend to agree. 
However, while it may have been broken before, it worked flawlessly in multiple production environments on several different operating systems for several years when not linked with pthread. Cheers, Steve
Bruce Momjian wrote: > Scott Lamb wrote: > >>Speaking of async signal-safe functions, pthread_getspecific() isn't >>specified to be (and thus PQinSend() and thus >>sigpipe_handler_ignore_send()). It's probably okay, but libpq is >>technically using undefined behavior according to SUSv3. > > > Yikes. I never suspected pthread_getspecific() would not be signal safe > because it is already thread safe, but I see the point that it is called > in the current thread. Any ideas how to fix this? > A few ideas. When I ran into a similar situation in my own code, my approach was to just add a comment to make the assumption explicit. It's quite possible the standard is just overly conservative. Some specific platforms - <http://www.qnx.com/developer/docs/qnx_6.1_docs/neutrino/lib_ref/p/pthread_getspecific.html> - mark it as being async signal-safe. Searching for "pthread_getspecific signal" on Google Groups turns up a bunch of other people who have run into this same problem. One person notes that it's definitely not safe on LinuxThreads if you use sigaltstack(). If your platform has SA_SIGINFO, you could - in theory - use the ucontext argument to see if that thread is in a PostgreSQL operation. But I doubt that's portable. You could just do a pthread_sigmask() before and after the pthread_setspecific() to guarantee that no SIGPIPE will arrive on that thread in that time. I think it's pretty safe to assume that as long as you're not doing a pthread_[gs]etspecific() on that same pthread_key_t, it's safe. There's one thread function that is guaranteed to be async signal-safe - sem_post(). (Though apparently older LinuxThreads on x86 fails to meet this assumption.) I'm not quite sure what you could do with that, but apparently there's something or they wouldn't have gone to the effort of making it so. Scott
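[Editor's sketch] The pthread_sigmask() idea above might look roughly like this. It is a sketch under stated assumptions, not libpq's code: in_send_key, set_in_send and get_in_send are hypothetical names standing in for whatever per-thread flag the library keeps, and only the pthread and signal calls are real POSIX functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical per-thread "currently inside send()" marker. */
static pthread_key_t in_send_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int key_rc = -1;

static void make_key(void)
{
    key_rc = pthread_key_create(&in_send_key, NULL);
}

/* Update the thread-specific value with SIGPIPE blocked in the
 * calling thread, so a SIGPIPE handler cannot run (and look at the
 * same key) while pthread_setspecific() is in progress.
 * Returns 0 on success or an error number. */
static int set_in_send(void *value)
{
    sigset_t block, old;
    int rc;

    pthread_once(&key_once, make_key);
    if (key_rc != 0)
        return key_rc;

    sigemptyset(&block);
    sigaddset(&block, SIGPIPE);
    rc = pthread_sigmask(SIG_BLOCK, &block, &old);
    if (rc != 0)
        return rc;

    rc = pthread_setspecific(in_send_key, value);

    /* Restore the previous mask; a SIGPIPE that became pending in
     * the meantime is delivered once it is unblocked. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return rc;
}

/* Read the marker from ordinary (non-handler) code. */
static void *get_in_send(void)
{
    pthread_once(&key_once, make_key);
    return key_rc == 0 ? pthread_getspecific(in_send_key) : NULL;
}
```

This only protects the window around the setter. Whether reading the key from inside the handler is itself safe is the separate question the thread goes on to discuss.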
Scott Lamb wrote: > You could just do a pthread_sigmask() before and after the > pthread_setspecific() to guarantee that no SIGPIPE will arrive on that > thread in that time. I think it's pretty safe to assume that as long as > you're not doing a pthread_[gs]etspecific() on that same pthread_key_t, > it's safe. Actually, thinking about this a bit more, that might not even be necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous? (I.e., is the SIGPIPE guaranteed to arrive during the offending system call?) I was thinking not, but maybe yes. I can't seem to find a straight answer. A lot of documents seem to confuse thread-directed and synchronous, when they're not quite the same thing. SIGALRM-via-alarm() is thread-directed but obviously asynchronous.
Scott Lamb wrote: > You could just do a pthread_sigmask() before and after the > pthread_setspecific() to guarantee that no SIGPIPE will arrive on that > thread in that time. I think it's pretty safe to assume that as long as > you're not doing a pthread_[gs]etspecific() on that same pthread_key_t, > it's safe. I call pthread_setspecific() in the SIGPIPE handler. How does pthread_sigmask() help me at that point? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Scott Lamb wrote: > Scott Lamb wrote: > > You could just do a pthread_sigmask() before and after the > > pthread_setspecific() to guarantee that no SIGPIPE will arrive on that > > thread in that time. I think it's pretty safe to assume that as long as > > you're not doing a pthread_[gs]etspecific() on that same pthread_key_t, > > it's safe. > > Actually, thinking about this a bit more, that might not even be > necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous? > (I.e., is the SIGPIPE guaranteed to arrive during the offending system > call?) I was thinking not, but maybe yes. I can't seem to find a > straight answer. A lot of documents seem to confuse thread-directed and > synchronous, when they're not quite the same thing. SIGALRM-via-alarm() > is thread-directed but obviously asynchronous. SIGPIPE is a synchronous signal that is called during the read() in libpq. I am not sure what thread-directed is. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
OK, thanks. --------------------------------------------------------------------------- Scott Lamb wrote: > On Jan 30, 2004, at 4:53 PM, Bruce Momjian wrote: > >> Actually, thinking about this a bit more, that might not even be > >> necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous? > >> (I.e., is the SIGPIPE guaranteed to arrive during the offending system > >> call?) I was thinking not, but maybe yes. I can't seem to find a > >> straight answer. A lot of documents seem to confuse thread-directed > >> and > >> synchronous, when they're not quite the same thing. > >> SIGALRM-via-alarm() > >> is thread-directed but obviously asynchronous. > > > > SIGPIPE is a sychronous signal that is called during the read() in > > libpq. I am not sure what thread-directed is. > > Ahh, then the usage in libpq is safe; sorry for the false alarm. The > concerns about signal safety are really only for async signals, as the > behavior is undefined only when one async signal-unsafe function is > called from a signal interrupting another: > > "In the presence of signals, all functions defined by this volume of > IEEE Std 1003.1-2001 shall behave as defined when called from or > interrupted by a signal-catching function, with a single exception: > when a signal interrupts an unsafe function and the signal-catching > function calls an unsafe function, the behavior is undefined." > > thread-directed, by the way, simply means that the signal is directed > at a specific thread, not just some thread in the process that doesn't > have it masked. It's the difference between kill() and pthread_kill(). > AFAIK, all synchronous signals are thread-directed, but not all > thread-directed signals are synchronous. > > Here the signal is synchronous, so the signal is guaranteed to happen > at a safe point (during the read()), so there's no problem. 
> > Thanks, > Scott Lamb > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Jan 30, 2004, at 4:53 PM, Bruce Momjian wrote: >> Actually, thinking about this a bit more, that might not even be >> necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous? >> (I.e., is the SIGPIPE guaranteed to arrive during the offending system >> call?) I was thinking not, but maybe yes. I can't seem to find a >> straight answer. A lot of documents seem to confuse thread-directed >> and >> synchronous, when they're not quite the same thing. >> SIGALRM-via-alarm() >> is thread-directed but obviously asynchronous. > > SIGPIPE is a sychronous signal that is called during the read() in > libpq. I am not sure what thread-directed is. Ahh, then the usage in libpq is safe; sorry for the false alarm. The concerns about signal safety are really only for async signals, as the behavior is undefined only when one async signal-unsafe function is called from a signal interrupting another: "In the presence of signals, all functions defined by this volume of IEEE Std 1003.1-2001 shall behave as defined when called from or interrupted by a signal-catching function, with a single exception: when a signal interrupts an unsafe function and the signal-catching function calls an unsafe function, the behavior is undefined." thread-directed, by the way, simply means that the signal is directed at a specific thread, not just some thread in the process that doesn't have it masked. It's the difference between kill() and pthread_kill(). AFAIK, all synchronous signals are thread-directed, but not all thread-directed signals are synchronous. Here the signal is synchronous, so the signal is guaranteed to happen at a safe point (during the read()), so there's no problem. Thanks, Scott Lamb
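[Editor's sketch] The "synchronous" property can be checked with a small experiment: write to a pipe whose read end is closed and see whether the SIGPIPE handler has already run by the time the call returns EPIPE. The sketch below assumes a single-threaded process and uses write(), the call that actually raises SIGPIPE. The names on_sigpipe and sigpipe_seen_before_write_returns are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigpipe = 0;

/* Async-signal-safe handler: it only sets a flag. */
static void on_sigpipe(int signo)
{
    (void) signo;
    got_sigpipe = 1;
}

/* Returns 1 if the handler had already run when write() came back
 * with EPIPE, 0 if it had not, and -1 if the experiment could not
 * be set up or write() behaved unexpectedly. */
static int sigpipe_seen_before_write_returns(void)
{
    struct sigaction sa, old_sa;
    sigset_t unblock, old_mask;
    int fds[2];
    int result = -1;
    int saved_errno;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                  /* no reader left on the pipe */

    sa.sa_handler = on_sigpipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGPIPE, &sa, &old_sa) != 0) {
        close(fds[1]);
        return -1;
    }

    /* Make sure SIGPIPE is deliverable in case the caller blocked it. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGPIPE);
    sigprocmask(SIG_UNBLOCK, &unblock, &old_mask);

    got_sigpipe = 0;
    n = write(fds[1], "x", 1);      /* raises SIGPIPE, fails with EPIPE */
    saved_errno = errno;
    if (n == -1 && saved_errno == EPIPE)
        result = got_sigpipe ? 1 : 0;

    /* Put the signal mask and disposition back as they were. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGPIPE, &old_sa, NULL);
    close(fds[1]);
    return result;
}
```

A result of 1 is consistent with Scott's conclusion: the handler runs at a known point, during the failing system call, rather than at an arbitrary moment in the middle of some other function.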