Thread: Threading in BGWorkers (!)
Hi hackers,
On IRC RhodiumToad offered the following advice (after a standard there be dragons here disclaimer, as well as noting this may not be a complete list):
Threading (may) be safe if:
- all signals will be delivered on the main thread and nowhere else
- no postgres function will ever be called from anything that's not the main thread
- it's safe for postgres to call any system library function, even ones explicitly marked as not thread safe
- it's safe for postgres to call sigprocmask()
I can live with 1. and 2 - they are fairly easy as long as you know the rules.
3. needs to be converted to a list of possible calls - which can be done and checked, I suppose against the POSIX standards?
4. is not fine (I suppose this is a specific example of 3.), as I think Postgres would need to be using pthread_sigmask() instead - given looks like a one line change could pthread_sigmask be used when available?
Are there any other rules which need to be adhered to?
Any thoughts, comments, dire warnings, hand waving? On IRC the general thought was that any changes could be seen as encouraging threading which is a bad thing - I would argue if you're writing BGWorkers which have SPI access you've already got a pretty large area to screw things up in anyway (if you aren't following the standards / code comments).
James
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
James Sewell <james.sewell@jirotech.com> writes: > I was talking about PostgreSQL and threading on IRC the other day - which I > know is a frowned upon topic - and just wanted to frame the same questions > here and hopefully get a discussion going. I think the short answer about threading in bgworkers (or any other backend process) is "we don't support it; if you try it and it breaks, which it likely will, you get to keep both pieces". I'm not sure that there's any merit in making small dents in that policy. I suspect that at some point, somebody will try to move those goalposts a long way, but it will be a large and controversial patch. Why do you want threads in a bgworker anyway? You could spawn multiple bgworkers, or you could dispatch the threaded work to a non-Postgres-ish process as PL/Java does. The only advantage I can see of doing work in a process that's not at arm's-length is to have access to PG computational or IPC facilities, and none of that is likely to work safely in a threaded context. regards, tom lane
> dispatch the threaded work to a non-Postgres-ish process
I’m no expert here but all your solid points about threading with Postgres notwithstanding....
I think there’s some issues around interrupt handling and general syscalls that doesn’t otherwise play nice with “non-Postgres-ish” *threads* when Postgres is still the main thread.
This is all purely hypothetical, but it seems that Postgres’ use of sigprocmask can cause problems with threads that are otherwise 100% “disconnected” from Postgres.
How can we start a dialog about this kind of situation? Nobody here is trying to make Postgres thread-safe, maybe only thread-friendly.
I think Mr. Sewell, has a better handle around these topics. But he ain’t the only one interested.
eric
On Mon, Jun 22, 2020 at 9:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Sewell <james.sewell@jirotech.com> writes:
> I was talking about PostgreSQL and threading on IRC the other day - which I
> know is a frowned upon topic - and just wanted to frame the same questions
> here and hopefully get a discussion going.
I think the short answer about threading in bgworkers (or any other
backend process) is "we don't support it; if you try it and it breaks,
which it likely will, you get to keep both pieces". I'm not sure that
there's any merit in making small dents in that policy. I suspect that
at some point, somebody will try to move those goalposts a long way,
but it will be a large and controversial patch.
Why do you want threads in a bgworker anyway? You could spawn multiple
bgworkers, or you could dispatch the threaded work to a non-Postgres-ish
process as PL/Java does. The only advantage I can see of doing work in a
process that's not at arm's-length is to have access to PG computational
or IPC facilities, and none of that is likely to work safely in a threaded
context.
regards, tom lane
On Tue, 23 Jun 2020 at 13:38, Tom Lane <tgl@sss.pgh.pa.us> wrote:
James Sewell <james.sewell@jirotech.com> writes:
> I was talking about PostgreSQL and threading on IRC the other day - which I
> know is a frowned upon topic - and just wanted to frame the same questions
> here and hopefully get a discussion going.
I think the short answer about threading in bgworkers (or any other
backend process) is "we don't support it; if you try it and it breaks,
which it likely will, you get to keep both pieces".
I'm hoping that from this a set of rules rather than a blanket ban can be agreed upon.
I'm not sure that
there's any merit in making small dents in that policy. I suspect that
at some point, somebody will try to move those goalposts a long way,
but it will be a large and controversial patch.
Understood, and I do agree with keeping the policy simple - but it looks like (potentially) the only blocker is a one line change to swap out sigprocmask. From my perspective this is a very large win - I'll do some testing.
Why do you want threads in a bgworker anyway? You could spawn multiple
bgworkers, or you could dispatch the threaded work to a non-Postgres-ish
process as PL/Java does. The only advantage I can see of doing work in a
process that's not at arm's-length is to have access to PG computational
or IPC facilities, and none of that is likely to work safely in a threaded
context.
I'm writing the workers in Rust - it would be nice to be able to safely access Rust crates which make use of threads.
cheers,
James
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On 23.06.2020 06:38, Tom Lane wrote: > James Sewell <james.sewell@jirotech.com> writes: >> I was talking about PostgreSQL and threading on IRC the other day - which I >> know is a frowned upon topic - and just wanted to frame the same questions >> here and hopefully get a discussion going. > I think the short answer about threading in bgworkers (or any other > backend process) is "we don't support it; if you try it and it breaks, > which it likely will, you get to keep both pieces". I'm not sure that > there's any merit in making small dents in that policy. I suspect that > at some point, somebody will try to move those goalposts a long way, > but it will be a large and controversial patch. > > Why do you want threads in a bgworker anyway? You could spawn multiple > bgworkers, or you could dispatch the threaded work to a non-Postgres-ish > process as PL/Java does. The only advantage I can see of doing work in a > process that's not at arm's-length is to have access to PG computational > or IPC facilities, and none of that is likely to work safely in a threaded > context. Just an example of using threads in bgworker: right now I am working on FDW for RocksDB. RocksDB LSM implementation shows quite promising performance advantages comparing with classical Postgres B-Tree (almost no degrade of insert speed with increasing number of records). RocksDB is multithreded database. May be it is possible to port it to multiprocess model but it will be very non trivial task. And even if such fork of RocksDB will be created, somebody have to permanently back patch changes from main trunk. Alternative solution is to launch multithreded RocksDB worker and let backends redirect requests to this worker. Client-server architecture inside server:) Using multithreading in bgworker is possible if you do not use any Postgres runtime inside thread procedures or do it in exclusive critical section. It is not so convenient but possible. The most difficult thing from my point of view is error reporting.
Using multithreading in bgworker is possible if you do not use any
Postgres runtime inside thread procedures or do it in exclusive critical
section.
It is not so convenient but possible. The most difficult thing from my
point of view is error reporting.
Happy to be proved wrong, but I don't think this is correct.
PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and "The use of sigprocmask() is unspecified in a multithreaded process" [1]
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On Tue, 23 Jun 2020 at 17:15, James Sewell <james.sewell@jirotech.com> wrote:
Using multithreading in bgworker is possible if you do not use any
Postgres runtime inside thread procedures or do it in exclusive critical
section.
It is not so convenient but possible. The most difficult thing from my
point of view is error reporting.Happy to be proved wrong, but I don't think this is correct.PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and "The use of sigprocmask() is unspecified in a multithreaded process" [1]
Sorry link should be [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigprocmask.html
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On 23.06.2020 10:15, James Sewell wrote:
Using multithreading in bgworker is possible if you do not use any
Postgres runtime inside thread procedures or do it in exclusive critical
section.
It is not so convenient but possible. The most difficult thing from my
point of view is error reporting.Happy to be proved wrong, but I don't think this is correct.
PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and "The use of sigprocmask() is unspecified in a multithreaded process" [1]
Sorry, may be I missed something.
But in my bgworker I am not using Postgres runtime at all (except initial bgworker startup code).
So I am not using latches (which are based on signals), snapshots,...
In my case bgworker has no connection to Postgres at all.
Yes, it can still receives signals from Postmaster (SIGTERM, SIGHUP). But their handler are trivial and do not need to mask any signals.
So may be in general case combination of signals and threads may cause some problems,
but it doesn't mean that you can't create multithreaded bgworker.
On Tue, 23 Jun 2020 at 17:26, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
On 23.06.2020 10:15, James Sewell wrote:Using multithreading in bgworker is possible if you do not use any
Postgres runtime inside thread procedures or do it in exclusive critical
section.
It is not so convenient but possible. The most difficult thing from my
point of view is error reporting.Happy to be proved wrong, but I don't think this is correct.PostgreSQL can call sigprocmask() in your BGWorker whenever it wants, and "The use of sigprocmask() is unspecified in a multithreaded process" [1]
Sorry, may be I missed something.
But in my bgworker I am not using Postgres runtime at all (except initial bgworker startup code).
So I am not using latches (which are based on signals), snapshots,...
In my case bgworker has no connection to Postgres at all.
Yes, it can still receives signals from Postmaster (SIGTERM, SIGHUP). But their handler are trivial and do not need to mask any signals.
So may be in general case combination of signals and threads may cause some problems,
but it doesn't mean that you can't create multithreaded bgworker.
Ah yes - sorry *I* missed something.
A multi threaded BGWorker which accesses shared memory and database via SPI.
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
On 06/22/20 23:38, Tom Lane wrote: > bgworkers, or you could dispatch the threaded work to a non-Postgres-ish > process as PL/Java does. The only advantage I can see of doing work in a > process that's not at arm's-length is to have access to PG computational > or IPC facilities, and none of that is likely to work safely in a threaded > context. You might be thinking of Dave Cramer's PL/JVM, which runs a JVM in another process and does IPC to it. PL/Java, by contrast, runs the JVM in-process and keeps both pieces. It only lets one thread downcall into PostgreSQL. On 06/23/20 00:46, Eric Ridge wrote: > How can we start a dialog about this kind of situation? Nobody here is > trying to make Postgres thread-safe, maybe only thread-friendly. There are just a couple of things I've been wanting to suggest, based on PL/Java experience. 1) It would be nice to be able to ereport from an arbitrary thread. There is already support in core to forward messages from parallel workers: the worker signals the lead process after adding a message to a shm_mq referenced from its ParallelWorkerInfo struct. The signal handler asynchronously sets ParallelMessagePending, which ProcessInterrupts will check at some convenient point and ereport the message. It seems like it would be no sweat for another thread in the same process to add something to an mq (could be the same structure as shm_mq but would not need to really be in shared memory) and do a volatile write of ParallelMessagePending. The rest is already there. Only missing ingredient would be a way for an extension to allocate something like a ParallelWorkerInfo struct good for the life of the backend (the current parallel worker infrastructure makes them all go away at the completion of a parallel query). 2) It would be nice to be able to request service. If J Random thread in PL/Java generates a bit of work requiring some PostgreSQL API, at present that bit of work has to queue up until the next time a call into PL/Java is occasioned by a query, which might be never. It would be nice to be able to also asynchronously set some flag like ExtensionServiceRequested, which could be checked as part of CHECK_FOR_INTERRUPTS or even at more limited times, such as idle. An extension could populate an ExtensionServiceInfo struct with a service entry point an flag indicating that extension has work pending. Those are the two thread-friendlier ideas I have been thinking of. Regards, -Chap
Hi, On 2020-06-23 09:19:36 -0400, Chapman Flack wrote: > 1) It would be nice to be able to ereport from an arbitrary thread. There > is already support in core to forward messages from parallel workers: > the worker signals the lead process after adding a message to a shm_mq > referenced from its ParallelWorkerInfo struct. The signal handler > asynchronously sets ParallelMessagePending, which ProcessInterrupts > will check at some convenient point and ereport the message. > > It seems like it would be no sweat for another thread in the same > process to add something to an mq (could be the same structure as > shm_mq but would not need to really be in shared memory) and do a > volatile write of ParallelMessagePending. The rest is already there. > Only missing ingredient would be a way for an extension to allocate > something like a ParallelWorkerInfo struct good for the life of the > backend (the current parallel worker infrastructure makes them all > go away at the completion of a parallel query). I think that's way harder than what you make it sound here. The locking for shm_mq doesn't really work inside a process. In contrast to the single threaded case something like a volatile write to ParallelMessagePending doesn't guarantee much, because there's no guaranteed memory ordering between threads. And more. I'm very doubtful this is a good direction to go in. Kinda maybe somewhat partially converting tiny parts of the backend code into threadsafe code will leave us with some baroque code. Greetings, Andres Freund
On 06/23/20 21:44, Andres Freund wrote: > I think that's way harder than what you make it sound here. The locking > for shm_mq doesn't really work inside a process. In contrast to the > single threaded case something like a volatile write to > ParallelMessagePending doesn't guarantee much, because there's no > guaranteed memory ordering between threads. And more. It occurred to me after I sent the message this morning that my suggestion (2) could subsume (1). And requires nothing more than a single volatile write of a boolean, and getting called back at a convenient time on the single main thread. So perhaps I shouldn't have suggested (1) at all - just muddies the waters. Regards, -Chap
Hi, On 2020-06-23 21:50:26 -0400, Chapman Flack wrote: > On 06/23/20 21:44, Andres Freund wrote: > > > I think that's way harder than what you make it sound here. The locking > > for shm_mq doesn't really work inside a process. In contrast to the > > single threaded case something like a volatile write to > > ParallelMessagePending doesn't guarantee much, because there's no > > guaranteed memory ordering between threads. And more. > > It occurred to me after I sent the message this morning that my suggestion > (2) could subsume (1). And requires nothing more than a single volatile > write of a boolean, and getting called back at a convenient time on the > single main thread. A single volatile write wouldn't guarantee you much in the presence of multiple threads. You could very well end up with a concurrent CHECK_FOR_INTERRUPTS() in the main thread unsetting InterruptPending, but not yet seeing / processing ParallelMessagePending. Nor would it wake up the main process if it's currently waiting on a latch. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-06-23 09:19:36 -0400, Chapman Flack wrote: >> 1) It would be nice to be able to ereport from an arbitrary thread. > I think that's way harder than what you make it sound here. Indeed. Just for starters: 1. elog.c is in itself not-thread-safe, because it uses a static data structure to track the message being built. 2. It uses palloc, another large pile of not-thread-safe infrastructure. 3. What exactly would the semantics of elog(ERROR) be? You can't make it something other than "abort the transaction" without mind-boggling levels of breakage. But how are you going to enforce a transaction abort across multiple threads? What if one of the other threads reports an independent error concurrently, or worse tries to COMMIT concurrently? So that's already two fundamental, and non-trivial, subsystems that have to be made fully thread-safe before you can get anything off the ground; plus basic architectural issues to be settled. I imagine that somebody will take a run at this at some point, but the idea that it's an easy problem to bite off seems nonsensical. I'm not sure whether the other idea >> It would be nice to be able to also asynchronously set some flag >> like ExtensionServiceRequested, which could be checked as part of >> CHECK_FOR_INTERRUPTS or even at more limited times, such as idle. is much easier. In the barest terms, we already have things like that (such as NOTIFY interrupts), so it doesn't sound hard at first. The problem is to figure out whether action X that you wish to do is safe to do at CHECK_FOR_INTERRUPTS call site Y. The answer is certainly not always "yes", but how would we build an infrastructure for deciding? (NOTIFY largely punts on this, by decreeing that it won't do anything till we reach an idle state. That's surely not adequate for a lot of likely actions X.) regards, tom lane
On 06/23/20 22:13, Tom Lane wrote: > 1. elog.c is in itself not-thread-safe, because it uses a static data > structure to track the message being built. > > 2. It uses palloc, another large pile of not-thread-safe infrastructure. ... I'm sure now that I shouldn't have mentioned (1) - muddied waters. The idea in my head had been to make what the PG code sees as close to the parallel- message case as possible: "here is an error structure that your elog code did not build, in a region of memory your palloc code did not manage." But leave that aside, because a way to request a service callback would clearly allow the regular elog and the regular palloc to do their regular thing on the regular thread, and be the more general and desirable idea anyway. > I'm not sure whether the other idea > >>> It would be nice to be able to also asynchronously set some flag >>> like ExtensionServiceRequested, which could be checked as part of >>> CHECK_FOR_INTERRUPTS or even at more limited times, such as idle. > > is much easier. In the barest terms, we already have things like that > (such as NOTIFY interrupts), so it doesn't sound hard at first. The > problem is to figure out whether action X that you wish to do is safe > to do at CHECK_FOR_INTERRUPTS call site Y. The answer is certainly not > always "yes", but how would we build an infrastructure for deciding? > (NOTIFY largely punts on this, by decreeing that it won't do anything > till we reach an idle state. That's surely not adequate for a lot > of likely actions X.) I think it could be adequate for a lot of them. (I even said "more limited times, such as idle" up there.) In PL/Java's case, there clearly aren't people running code now that functionally depends on this ability, because it wouldn't work. Even if the JVM uses multiple threads to accomplish something, if it is something the Java function result depends on, it obviously has to happen before the function returns, while PL/Java has the main thread and can just serialize the work onto it. The likeliest cases where something might want to happen after the function has returned are resource releases, which can sometimes be discovered by the garbage collector a little after the fact, and if the Java resource that's being collected is the dual of some palloc'd or reference-counted PostgreSQL object, it would be nice to not have to enqueue that cleanup for the next time some query calls into PL/Java. Even an "only in an idle state" rule would be an improvement over "who knows when and maybe never". Regards, -Chap
Chapman Flack <chap@anastigmatix.net> writes: > On 06/23/20 22:13, Tom Lane wrote: >> I'm not sure whether the other idea >>> It would be nice to be able to also asynchronously set some flag >>> like ExtensionServiceRequested, which could be checked as part of >>> CHECK_FOR_INTERRUPTS or even at more limited times, such as idle. >> is much easier. In the barest terms, we already have things like that >> (such as NOTIFY interrupts), so it doesn't sound hard at first. The >> problem is to figure out whether action X that you wish to do is safe >> to do at CHECK_FOR_INTERRUPTS call site Y. The answer is certainly not >> always "yes", but how would we build an infrastructure for deciding? > I think it could be adequate for a lot of them. I dunno. It's not even adequate for the use-case of reporting an error, because waiting till after the current transaction commits is surely not what should happen in that case. It happens to be okay for NOTIFY, because that's reporting an outside event that did occur regardless of the local transaction's success ... but really, how many use-cases does that argument apply to? I'm not trying to be completely negative here, but I think these issues are a lot harder than they might seem at first glance. regards, tom lane
On 06/23/20 23:08, Tom Lane wrote: > I dunno. It's not even adequate for the use-case of reporting an > error, because waiting till after the current transaction commits > is surely not what should happen in that case. In the case of the kind of exuberantly-threaded language runtime of which Java's an example, most of the threads running at any given time are doing somewhat obscure things that the language runtime knows about but might not be directly relevant to whether your current transaction commits. (The garbage collector thread was my earlier example because it routinely discovers reclaimable things, which can have implications for resources in PG but usually not for whether a commit should succeed.) If you're going to write a function and explicitly use threads in your computation, those are threads you're going to manage, catch exceptions in, and forward those back to the main thread to be ereported at the appropriate time. In other cases where some thread you're but dimly aware of has encountered some problem, generally what happens now is a default message and stacktrace get directly written to the backend's stderr and you don't otherwise find out anything happened. If something doesn't work later because that thread got wedged, then you piece together the story. If the logging_collector is running then at least the stuff written to stderr ends up in the log anyway, though without any log prefix info added. If the collector isn't running, then the messages went somewhere else, maybe the systemd journal, maybe the floor. Regards, -Chap
Hackers,
The patch replaces sigprocmask with pthread_sigmask. They have identical APIs ("[pthread_sigmask] shall be equivalent to sigprocmask(), without the restriction that the call be made in a single-threaded process"[1])
The rationale here is that as far as I can tell this is the *only* blocker to using multithreaded code in a BGWorker which can't be avoided by adhering to strict code rules (eg: no PG calls from non-main threads, no interaction with signals from non-main threads).
Before this went in the rules would need to be agreed upon and documented - but hopefully it's at least a way forward / a way to progress this discussion.
Cheers,
James
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Attachment
On Thu, Jul 2, 2020 at 6:39 PM James Sewell <james.sewell@jirotech.com> wrote: > The patch replaces sigprocmask with pthread_sigmask. They have identical APIs ("[pthread_sigmask] shall be equivalent tosigprocmask(), without the restriction that the call be made in a single-threaded process"[1]) -#define PG_SETMASK(mask) sigprocmask(SIG_SETMASK, mask, NULL) +#define PG_SETMASK(mask) pthread_sigmask(SIG_SETMASK, mask, NULL) So you're assuming that <signal.h> declares pthread_sigmask(). I was trying to understand what POSIX's "The functionality described is optional" means; could there be <signal.h> headers without the declaration? I mean, I know the practical answer: we support all the remaining Unixes, you can count them on two hands, and they all have pthreads, so it doesn't matter, and like Dr Stonebraker said, the plan is "converting POSTGRES to use lightweight processes available in the operating systems we are using. These include PRESTO for the Sequent Symmetry and threads in Version 4 of Sun/OS." so we'll actually *require* that stuff eventually anyway. One practical problem with this change is that some systems have a stub definition of pthread_sigmask() that does nothing, when you don't link against the threading library. Realistically, most *useful* builds of PostgreSQL bring it in indirectly (via SSL, LDAP, blah blah), but it so happens that a bare bones build and make check on this here FreeBSD box hangs in CHECK DATABASE waiting for the checkpointer to signal it. I can fix that by putting -lthr into LDFLAGS, so we'd probably have to figure out how to do that on our supported systems. > The rationale here is that as far as I can tell this is the *only* blocker to using multithreaded code in a BGWorker whichcan't be avoided by adhering to strict code rules (eg: no PG calls from non-main threads, no interaction with signalsfrom non-main threads). I guess you'd have to mask at least all the signals we care about before every call to pthread_create() and trust that the threads never unmask them. I guess you could interpose a checker function to abort if something tries to break the programming rule.
Thomas Munro <thomas.munro@gmail.com> writes: > On Thu, Jul 2, 2020 at 6:39 PM James Sewell <james.sewell@jirotech.com> wrote: >> The patch replaces sigprocmask with pthread_sigmask. They have identical APIs ("[pthread_sigmask] shall be equivalentto sigprocmask(), without the restriction that the call be made in a single-threaded process"[1]) > -#define PG_SETMASK(mask) sigprocmask(SIG_SETMASK, mask, NULL) > +#define PG_SETMASK(mask) pthread_sigmask(SIG_SETMASK, mask, NULL) > So you're assuming that <signal.h> declares pthread_sigmask(). If we were going to accept this patch, I'd say it should be conditional on a configure test for pthread_sigmask being present. We could allow that to require an additional library, or not. >> The rationale here is that as far as I can tell this is the *only* blocker to using multithreaded code in a BGWorker whichcan't be avoided by adhering to strict code rules (eg: no PG calls from non-main threads, no interaction with signalsfrom non-main threads). TBH, though, I do not buy this argument for a millisecond. I don't think that anything is going to come out of multithreading a bgworker but blood and tears. Perhaps someday we'll make a major push to make the backend code (somewhat(?)) thread safe ... but I'm not on board with making one-line-at-a-time changes in hopes of getting partway there. We need some kind of concrete plan for what is a usable amount of functionality and what has to be done to get it. regards, tom lane
We need some kind of concrete plan for what is a
usable amount of functionality and what has to be done to get it.
This is exactly the type of discussion I'm after.
I'll start.
A usable amount of functionality would be the ability to start threads from:
- a background worker
These cases should be bound by *at least* the following rules:
- no signal handling from threads
- no calls into PostgreSQL functions from threads
The patch I supplied is one of the requirements to get there - I would love help to discover the others.
The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
Hi, On 2020-07-29 13:41:02 +1200, Thomas Munro wrote: > One practical problem with this change is that some systems have a > stub definition of pthread_sigmask() that does nothing, when you don't > link against the threading library. Realistically, most *useful* > builds of PostgreSQL bring it in indirectly (via SSL, LDAP, blah > blah), but it so happens that a bare bones build and make check on > this here FreeBSD box hangs in CHECK DATABASE waiting for the > checkpointer to signal it. I can fix that by putting -lthr into > LDFLAGS, so we'd probably have to figure out how to do that on our > supported systems. Couldn't this be driven by --disable-thread-safety? Greetings, Andres Freund
Hi, On 2020-07-28 21:52:20 -0400, Tom Lane wrote: > >> The rationale here is that as far as I can tell this is the *only* blocker to using multithreaded code in a BGWorkerwhich can't be avoided by adhering to strict code rules (eg: no PG calls from non-main threads, no interaction withsignals from non-main threads). > > TBH, though, I do not buy this argument for a millisecond. I don't > think that anything is going to come out of multithreading a bgworker > but blood and tears. Perhaps someday we'll make a major push to > make the backend code (somewhat(?)) thread safe ... but I'm not on > board with making one-line-at-a-time changes in hopes of getting > partway there. We need some kind of concrete plan for what is a > usable amount of functionality and what has to be done to get it. Why not? Our answer to threading inside functions has been, for quite a while, that it's kinda ok if the threads never call into postgres and can never escape the lifetime of a function. But that's not actually really safe due to the signal handler issue. Whether it's a normal backend or a bgworker doesn't really play a role here, no? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2020-07-28 21:52:20 -0400, Tom Lane wrote: >> TBH, though, I do not buy this argument for a millisecond. I don't >> think that anything is going to come out of multithreading a bgworker >> but blood and tears. Perhaps someday we'll make a major push to >> make the backend code (somewhat(?)) thread safe ... but I'm not on >> board with making one-line-at-a-time changes in hopes of getting >> partway there. We need some kind of concrete plan for what is a >> usable amount of functionality and what has to be done to get it. > Why not? Our answer to threading inside functions has been, for quite a > while, that it's kinda ok if the threads never call into postgres and > can never escape the lifetime of a function. But that's not actually > really safe due to the signal handler issue. In other words, it's *not* safe and never has been. I see no good reason to believe that the signal handler issue is the only one. Even if it is, not being able to call any postgres infrastructure is a pretty huge handicap. So I stand by the position that we need an actual plan here, not just chipping away at one-liner things that might or might not improve matters noticeably. regards, tom lane
On Thu, Jul 30, 2020 at 2:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Why not? Our answer to threading inside functions has been, for quite a > > while, that it's kinda ok if the threads never call into postgres and > > can never escape the lifetime of a function. But that's not actually > > really safe due to the signal handler issue. > > In other words, it's *not* safe and never has been. I see no good reason > to believe that the signal handler issue is the only one. Even if it is, > not being able to call any postgres infrastructure is a pretty huge > handicap. I find this line of argument really unfair. It's true that there might be issues other than the signal handler one, but so what? That is not a principled argument against fixing the signal handler part of the problem, which is the only *known* problem with the use case Andres describes. It is also true that it would be more useful to enable additional use cases rather than just this one, but that is not a principled argument against enabling this one. My only present concern about the proposal actually in front of us -- that is to say, use pthread_sigmask() rather than sigprocmask() -- is Thomas's observation that on his system doing so breaks the world. That seems to be quite a serious problem. If we are deciding whether to use one function or another some purpose and they are equally good for the core code but one is better for people who want to play around with extensions, we may as well use the one that's better for that purpose. We need not give such experimentation our unqualified endorsement; we need only decide against obstructing it unnecessarily. But when such a substitution risks breaking things that work today, the calculus gets a lot more complicated. Unless we can find a way to avoid that risk, I don't think this is a good trade-off. But more broadly I think it is well past time that we look into making the backend more thread-friendly. The fact that "it's *not* safe and never has been" has not prevented people from doing it. We don't end up with people going "oh, well sigprocmask uh oh so I better give up now." What we end up with is people just going right ahead and doing it, probably without even thinking about sigprocmask, and ending up with low-probability failure conditions, which seems likely to hurt PostgreSQL's reputation for reliability with no compensating benefit. Or alternatively they hack core, which sucks, or they switch to some non-PG database, which also sucks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I see no good reason to believe that the signal handler issue is the only one. Even if it is,
not being able to call any postgres infrastructure is a pretty huge
handicap.
(changed emails to get rid of the nasty employer notice...)
It's at least a workable handicap that I'm happy to deal with.
Let's help them to avoid unsafe code now, not wait until they show up on this list with a critical failure and tap at the big sign that says "NO THREADING".
- james