Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Let's make PostgreSQL multi-threaded
Date
Msg-id 36f61a71-3bbb-b7b0-0d99-db5e69715af7@garret.ru
In response to Re: Let's make PostgreSQL multi-threaded  (James Addison <jay@jp-hosting.net>)
Responses Re: Let's make PostgreSQL multi-threaded
Re: Let's make PostgreSQL multi-threaded
List pgsql-hackers


On 15.06.2023 1:23 AM, James Addison wrote:
On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik <knizhnik@garret.ru> wrote:


On 12.06.2023 3:23 PM, Pavel Borisov wrote:
Is the following true or not?

1. If we switch processes to threads but leave the amount of session
local variables unchanged, there would be hardly any performance gain.
2. If we move some backend's local variables into shared memory then
the performance gain would be very near to what we get with threads
having equal amount of session-local variables.

In other words, the overall goal in principle is to gain from less
memory copying wherever it doesn't add the burden of locks for
concurrent variables access?

Regards,
Pavel Borisov,
Supabase


IMHO neither statement is true.
Switching to threads will reduce context-switch overhead (because
all threads share the same memory space and so preserve the TLB).
How big will this advantage be? In my prototype I got ~10%. But maybe
it is possible to find workloads where it is larger.

Hi Konstantin - do you have code/links that you can share for the
prototype and benchmarks used to gather those results?


Sorry, I have already shared the link:
https://github.com/postgrespro/postgresql.pthreads/

As you can see, the last commit was 6 years ago, when I stopped work on this project.
Why? I have already tried to explain it:
- The benefits of switching to threads were not that large. Maybe I just failed to find the proper workload, but it was a more or less expected result,
because most of the code was not changed: it uses the same sync primitives, the same local catalog/relation caches, ...
To take full advantage of the multithreading model it is necessary to rewrite many components, especially those related to interprocess communication.
But maintaining such a fork of Postgres and synchronizing it with mainstream requires too much effort, and I was not able to do it myself.

There are three different but related directions for improving current Postgres:
1. Replacing processes with threads
2. Built-in connection pooler
3. Lightweight backends (shared catalog/relation/prepared statement caches)

The motivations for such changes are also similar:
1. Increase Postgres scalability
2. Reduce memory consumption
3. Make Postgres a better fit for cloud and serverless requirements

I am not sure now which one should be addressed first, or whether they can be done together.

Replacing static variables with thread-local ones is the first and maybe the easiest step.
It requires more or less mechanical changes. The more challenging part is replacing private per-backend data structures
with shared ones (caches, file descriptors, ...).
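
To illustrate how mechanical the thread-local step is, here is a minimal sketch. It is not taken from the prototype; it assumes POSIX threads and the C11 `_Thread_local` storage class (GCC and Clang also accept `__thread`). The names `session_query_count`, `SessionArgs`, `session_main`, `run_sessions` and `current_thread_query_count` are hypothetical and exist only for this example.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_SESSIONS 16     /* arbitrary limit for this sketch */

/*
 * Process-per-backend style: a plain file-scope static is private to a
 * backend only because every backend is a separate process:
 *
 *     static int session_query_count = 0;
 *
 * Thread-per-backend style: that declaration would be shared by all
 * sessions, so it is marked thread-local instead.
 */
static _Thread_local int session_query_count = 0;   /* hypothetical name */

typedef struct SessionArgs
{
    int queries;    /* input: number of "queries" this session runs */
    int observed;   /* output: counter value seen by this session */
} SessionArgs;

static void *
session_main(void *arg)
{
    SessionArgs *args = arg;

    for (int i = 0; i < args->queries; i++)
        session_query_count++;  /* no lock: each thread has its own copy */
    args->observed = session_query_count;
    return NULL;
}

/* Counter value as seen by the calling thread. */
int
current_thread_query_count(void)
{
    return session_query_count;
}

/*
 * Start nsessions threads, let each count its own queries, and store the
 * value each one observed in results[]. Returns 0 on success, -1 on error.
 */
int
run_sessions(int nsessions, int queries, int *results)
{
    pthread_t   threads[MAX_SESSIONS];
    SessionArgs args[MAX_SESSIONS];
    int         started = 0;
    int         status = 0;

    if (nsessions < 1 || nsessions > MAX_SESSIONS || results == NULL)
        return -1;

    for (; started < nsessions; started++)
    {
        args[started].queries = queries;
        args[started].observed = -1;
        if (pthread_create(&threads[started], NULL,
                           session_main, &args[started]) != 0)
        {
            status = -1;
            break;
        }
    }

    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
    {
        pthread_join(threads[i], NULL);
        results[i] = args[i].observed;
    }
    return status;
}
```

With four sessions of 1000 queries each, every session should observe exactly 1000 and the calling thread's own copy stays at 0, even though the increments are unsynchronized. The sketch covers only the mechanical part; it says nothing about the harder step of turning private caches or file descriptors into shared structures, which would need real synchronization.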
