Re: One process per session lack of sharing - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: One process per session lack of sharing |
Date | |
Msg-id | CA+TgmoZVqXATOGEKFdnG_sMugx_iT8_B4L0OKJZeowHburMkiQ@mail.gmail.com |
In response to | Re: One process per session lack of sharing (Craig Ringer <craig@2ndquadrant.com>) |
Responses |
Re: One process per session lack of sharing
Re: One process per session lack of sharing |
List | pgsql-hackers |
On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I don't think anyone's considering moving from multi-processing to
> multi-threading in PostgreSQL. I really, really like the protection that the
> shared-nothing-by-default process model gives us, among other things.

We get some very important protection by having the postmaster in a separate address space from the user processes, but separating the other backends from each other has no value. If one of the backends dies, we take provisions to make sure they all die, which is little or no different from what would happen if we had the postmaster as one process and all of the other backends as threads within a second process. As far as I can see, running each and every backend in a separate process has downsides but no upsides. It slows down the system and makes it difficult to share data between processes without much in the way of benefits.

> I'm personally not absolutely opposed to threading, but you'll find it hard
> to convince anyone it's worth the huge work required to ensure that
> everything in PostgreSQL is done thread-safely, adapt all our logic to
> handle thread IDs where we use process IDs, etc. It'd be a massive amount of
> work for no practical gain for most users, and a huge reliability loss in
> the short to medium term as we ironed out all the bugs.

It would actually be pretty simple to allow PostgreSQL to be compiled to use either processes or threads, provided that you don't mind using something like GCC's __thread keyword. When compiling with threads, slap __thread on every global variable we have (using some kind of macro trick, no doubt), spawn threads instead of processes wherever you like, and I think you're more or less done. There could be some problems with third-party libraries we use, but I bet there probably wouldn't be all that many problems.

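As a rough illustration of the "macro trick" mentioned above, the following sketch shows how a single annotation could switch a global between process-wide and per-thread storage. The names `USE_THREADS`, `PG_GLOBAL`, `query_depth`, and the helper functions are hypothetical and do not come from the PostgreSQL source tree; the sketch assumes GCC or Clang with POSIX threads.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical build switch and macro name; neither exists in PostgreSQL. */
#define USE_THREADS 1

#if USE_THREADS
#define PG_GLOBAL __thread      /* GCC/Clang extension: one copy per thread */
#else
#define PG_GLOBAL               /* process model: an ordinary global */
#endif

/* Stand-in for one of the many backend-global variables. */
static PG_GLOBAL int query_depth = 0;

/* Returns the calling thread's own copy of the variable. */
int
current_query_depth(void)
{
    return query_depth;
}

/*
 * Body of a simulated backend: increment this thread's copy *arg times,
 * then write the final value back through the pointer.
 */
static void *
backend_main(void *arg)
{
    int *slot = arg;
    int  n = *slot;

    for (int i = 0; i < n; i++)
        query_depth++;
    *slot = query_depth;
    return NULL;
}

/*
 * Run two simulated backends as threads. On entry *first and *second hold
 * iteration counts; on return they hold each thread's final query_depth.
 * Returns 0 on success, -1 if a thread could not be created or joined.
 */
int
run_two_backends(int *first, int *second)
{
    pthread_t t1, t2;
    int       rc1, rc2;

    if (pthread_create(&t1, NULL, backend_main, first) != 0)
        return -1;
    if (pthread_create(&t2, NULL, backend_main, second) != 0)
    {
        pthread_join(t1, NULL);
        return -1;
    }
    rc1 = pthread_join(t1, NULL);
    rc2 = pthread_join(t2, NULL);
    return (rc1 == 0 && rc2 == 0) ? 0 : -1;
}
```

With `USE_THREADS` set to 1, each thread sees only its own increments and the caller's copy stays untouched. With the macro expanding to nothing, both threads would race on a single shared variable, which is the thread-safety problem the annotation is meant to avoid.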
Of course, there's not necessarily a whole lot of benefit to such a minimal transformation, but you could certainly do useful things on top of it. For example, the parallel query code could arrange to pass pointers to existing data structures in some cases instead of copying those data structures as we do currently. Spinning up a new thread and giving it pointers to some of the old thread's data structures is probably a lot faster than spinning up a new process and serializing and deserializing those data structures, so you wouldn't necessarily have to do all that much work before the "thread model" compile started to have noticeable advantages over the "process model" compile. You could pass tuples around directly rather than by copying them, too. A lot of things that we might want to do in this area would expose us to the risk of server-lifespan memory leaks, and we'd need to spend time and energy figuring out how to minimize those risks, but enough other people have written complex, long-running multithreaded programs that I think it is probably possible to do so without unduly compromising reliability.

> Where I agreed with you, and where I think Robert sounded like he was
> agreeing, was that our current design where we have one executor per user
> sessions and can't suspend/resume sessions is problematic.

The problems are very closely related. The problem with suspending and resuming sessions is that you need to keep all of the session's global variable contents (except for any caches that are safe to rebuild) until the session is resumed; and we have no way of discovering all of the global variables a process is using and no general mechanism that can be used to serialize and deserialize them. The problem with using threads is that code which uses global variables will not be thread-safe unless all of those variables are thread-local.

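To make the link between the two problems concrete, here is a minimal sketch of what "controlled" session state might look like: per-session values gathered into one explicit struct rather than scattered globals, so that a session can be saved and restored as a unit. This is an illustration only, not something the message proposes. The `SessionState` type, its fields, and both functions are hypothetical, and the byte copy works only because the struct is assumed to be flat (no pointers) and restored by the same binary.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical container for state that would otherwise live in separate
 * global variables. Flat on purpose: no pointers, so a byte copy is a
 * valid (same-binary, same-machine) serialization.
 */
typedef struct SessionState
{
    int  xact_isolation;
    int  work_mem_kb;
    char search_path[64];
} SessionState;

/*
 * Save a session into buf. Returns the number of bytes written,
 * or 0 if the buffer is too small.
 */
size_t
session_suspend(const SessionState *s, void *buf, size_t buflen)
{
    if (buflen < sizeof(*s))
        return 0;
    memcpy(buf, s, sizeof(*s));
    return sizeof(*s);
}

/*
 * Restore a session from buf. Returns 0 on success, or -1 if the
 * saved image does not have the expected size.
 */
int
session_resume(SessionState *s, const void *buf, size_t len)
{
    if (len != sizeof(*s))
        return -1;
    memcpy(s, buf, sizeof(*s));
    return 0;
}
```

Once state is reachable through one known structure, the same pointer could be handed to another thread or written out for a later resume. Real session state contains pointers and caches, so an actual mechanism would need per-field serialization rather than a plain `memcpy`.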
Getting our hands around the uncontrolled use of global variables - doubtless at the risk of breaking third-party code - seems crucial.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company