Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Let's make PostgreSQL multi-threaded
Date
Msg-id 20230607220919.pilzcbqnp2rwfbl4@awork3.anarazel.de
In response to Re: Let's make PostgreSQL multi-threaded  (Greg Stark <stark@mit.edu>)
Responses Re: Let's make PostgreSQL multi-threaded
Re: Let's make PostgreSQL multi-threaded
List pgsql-hackers
Hi,

On 2023-06-06 16:14:41 -0400, Greg Stark wrote:
> I think of processes and threads as fundamentally the same things,
> just a slightly different API -- namely that in one memory is by
> default unshared and needs to be explicitly shared and in the other
> it's default shared and needs to be explicitly unshared.

In theory that's true, in practice it's entirely wrong.

For one, the amount of complexity you need to deal with to share state across
processes, post fork, is *substantial*.  You can share file descriptors across
processes, but it's extremely platform dependent, requires cooperation between
both processes etc.  You can share memory allocations made after the processes
forked, but you're typically not going to be able to guarantee they're at the
same pointer values. Etc.
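
To make the file descriptor case concrete, here is a minimal sketch, assuming
a Unix system and Python 3.9 or later (for socket.send_fds / socket.recv_fds,
which wrap SCM_RIGHTS ancillary messages). The function name pass_fd_to_child
and the overall shape are made up for illustration; this is not how
PostgreSQL does anything. The point is only that a file opened after fork()
does not appear in the other process by itself: both sides need a
pre-arranged channel and have to actively take part.

```python
import os
import socket


def pass_fd_to_child(path):
    """Open `path` in the parent *after* fork() and hand the fd to the child.

    Returns the first bytes of the file as read by the child.
    Hypothetical helper, for illustration only.
    """
    # The channel has to exist before fork(); without it there is no way
    # to get the descriptor across later.
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    result_r, result_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: the file was not open at fork time, so nothing was inherited.
        # It has to cooperate by receiving the descriptor explicitly.
        parent_sock.close()
        os.close(result_r)
        status = 1
        try:
            _msg, fds, _flags, _addr = socket.recv_fds(child_sock, 16, 1)
            with os.fdopen(fds[0], "rb") as f:
                os.write(result_w, f.read(64))
            status = 0
        finally:
            os._exit(status)

    # Parent: open the file only now, then ship the descriptor over the socket.
    child_sock.close()
    os.close(result_w)
    fd = os.open(path, os.O_RDONLY)
    socket.send_fds(parent_sock, [b"fd"], [fd])
    os.close(fd)  # the in-flight copy keeps the file open for the child
    parent_sock.close()

    with os.fdopen(result_r, "rb") as r:
        data = r.read()
    os.waitpid(pid, 0)
    return data
```

With threads, the equivalent is just using the integer fd from another
thread. On Windows the mechanism is different again (handle duplication),
which is the platform dependence mentioned above.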

But more importantly, there are crucial performance differences between threads
and processes. Having the same memory mapping between threads allows the
hardware to share the TLB (on x86 via process context identifiers), which
isn't realistically possible with different processes.


> However all else is not equal. The discussion in the hallway turned to
> whether we could just use pthread primitives like mutexes and
> condition variables instead of our own locks -- and the point was
> raised that those libraries assume these objects will be in threads of
> one process not shared across completely different processes.

Independent of threads vs processes, I am -many on using pthread mutexes and
condition variables. From experiments, that *loses* performance, and we lose
a lot of control and increase cross-platform behavioural differences.  I also
don't see any benefit in going in that direction.


> And that's probably not the only library we're stuck reimplementing
> because of this. So the question is are these things worth taking the
> risk of having data structures shared implicitly and having unclear
> ownership rules?
> 
> I was going to say supporting both modes relieves that fear since it
> would force that extra discipline and allow testing under the more
> restrictive rule. However I don't think that will actually work. As
> long as we support both modes we lose all the advantages of threads.

I don't think that has to be true. We could e.g. eventually decide that we
don't support parallel query without threading support - which would allow us
to get rid of a very significant amount of code and runtime overhead.

Greetings,

Andres Freund


