Re: One process per session lack of sharing - Mailing list pgsql-hackers

From Robert Haas
Subject Re: One process per session lack of sharing
Date
Msg-id CA+TgmoZVqXATOGEKFdnG_sMugx_iT8_B4L0OKJZeowHburMkiQ@mail.gmail.com
Whole thread Raw
In response to Re: One process per session lack of sharing  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: One process per session lack of sharing
Re: One process per session lack of sharing
List pgsql-hackers
On Fri, Jul 15, 2016 at 4:28 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I don't think anyone's considering moving from multi-processing to
> multi-threading in PostgreSQL. I really, really like the protection that the
> shared-nothing-by-default process model gives us, among other things.

We get some very important protection by having the postmaster in a
separate address space from the user processes, but separating the
other backends from each other has no value.  If one of the backends
dies, we take provisions to make sure they all die, which is little or
no different from what would happen if we had the postmaster as one
process and all of the other backends as threads within a second
process.  As far as I can see, running each and every backend in a
separate process has downsides but no upsides.  It slows down the
system and makes it difficult to share data between processes without
much in the way of benefits.

> I'm personally not absolutely opposed to threading, but you'll find it hard
> to convince anyone it's worth the huge work required to ensure that
> everything in PostgreSQL is done thread-safely, adapt all our logic to
> handle thread IDs where we use process IDs, etc. It'd be a massive amount of
> work for no practical gain for most users, and a huge reliability loss in
> the short to medium term as we ironed out all the bugs.

It would actually be pretty simple to allow PostgreSQL to be compiled
to use either processes or threads, provided that you don't mind using
something like GCC's __thread keyword.  When compiling with threads,
slap __thread on every global variable we have (using some kind of
macro trick, no doubt), spawn threads instead of processes wherever
you like, and I think you're more or less done.  There could be some
problems with third-party libraries we use, but I bet there probably
wouldn't be all that many problems.  Of course, there's not
necessarily a whole lot of benefit to such a minimal transformation,
but you could certainly do useful things on top of it.  For example,
the parallel query code could arrange to pass pointers to existing
data structures in some cases instead of copying those data structures
as we do currently.  Spinning up a new thread and giving it pointers
to some of the old thread's data structures is probably a lot faster
than spinning up a new process and serializing and deserializing those
data structures, so you wouldn't necessarily have to do all that much
work before the "thread model" compile started to have noticeable
advantages over the "process model" compile.  You could pass tuples
around directly rather than by copying them, too.  A lot of things
that we might want to do in this area would expose us to the risk of
server-lifespan memory leaks, and we'd need to spend time and energy
figuring out how to minimize those risks, but enough other people have
written complex, long-running multithreaded programs that I think it
is probably possible to do so without unduly compromising reliability.

> Where I agreed with you, and where I think Robert sounded like he was
> agreeing, was that our current design where we have one executor per user
> sessions and can't suspend/resume sessions is problematic.

The problems are very closely related.  The problem with suspending
and resuming sessions is that you need to keep all of the session's
global variable contents (except for any caches that are safe to
rebuild) until the session is resumed; and we have no way of
discovering all of the global variables a process is using and no
general mechanism that can be used to serialize and deserialize them.
The problem with using threads is that code which uses global
variables will not be thread-safe unless all of those variables are
thread-local.  Getting our hands around the uncontrolled use of global
variables - doubtless at the risk of breaking third-party code - seems
crucial.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: A Modest Upgrade Proposal
Next
From: Jim Nasby
Date:
Subject: Re: A Modest Upgrade Proposal