Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers

From James Addison
Subject Re: Let's make PostgreSQL multi-threaded
Msg-id CALDQ5NwotYMZtXA2z6EkBQ72jVBrfyrn-+a9xU8=w54VBQPOhg@mail.gmail.com
In response to Re: Let's make PostgreSQL multi-threaded  (Konstantin Knizhnik <knizhnik@garret.ru>)
List pgsql-hackers
On Thu, 15 Jun 2023 at 08:12, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
>
>
> On 15.06.2023 1:23 AM, James Addison wrote:
>
> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
>
> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
>
> Is the following true or not?
>
> 1. If we switch processes to threads but leave the amount of session
> local variables unchanged, there would be hardly any performance gain.
> 2. If we move some backend's local variables into shared memory then
> the performance gain would be very near to what we get with threads
> having equal amount of session-local variables.
>
> In other words, the overall goal in principle is to gain from less
> memory copying wherever it doesn't add the burden of locks for
> concurrent variables access?
>
> Regards,
> Pavel Borisov,
> Supabase
>
>
> IMHO neither statement is true.
> Switching to threads will reduce context-switch overhead (because
> all threads share the same memory space and so preserve the TLB).
> How big will this advantage be? In my prototype I got ~10%. But maybe
> it is possible to find workloads where it is larger.
>
> Hi Konstantin - do you have code/links that you can share for the
> prototype and benchmarks used to gather those results?
>
>
>
> Sorry, I have already shared the link:
> https://github.com/postgrespro/postgresql.pthreads/

Nope, my mistake for not locating the existing link - thank you.

Is there a reason that parser-related files (flex/bison) are added as
part of the changeset?  (I'm trying to narrow it down to only the
changes necessary for the functionality.  So far it looks fairly
minimal, which is good.  The adjustments to progname are another
thing that looks a bit unusual and maybe unnecessary for the feature.)

> As you can see, the last commit was 6 years ago, when I stopped work on this project.
> Why?  I already tried to explain it:
> - The benefits from switching to threads were not so large. Maybe I just failed to find a proper workload, but it was a
> more or less expected result, because most of the code was not changed: it uses the same sync primitives, the same
> local catalog/relation caches, ...
> To take full advantage of the multithreading model it is necessary to rewrite many components, especially those
> related to interprocess communication.
> But maintaining such a fork of Postgres and synchronizing it with mainstream requires too much effort, and I was not
> able to do it myself.

I get the feeling that there are probably certain query types or
patterns where a significant, order-of-magnitude speedup is possible
with threads - but yep, I haven't seen those described in detail yet
on the mailing list (but as hinted by my not noticing the github link
previously, maybe I'm not following the list closely enough).

What workloads did you try with your version of the project?

> There are three different but related directions of improving current Postgres:
> 1. Replacing processes with threads
> 2. Builtin connection pooler
> 3. Lightweight backends (shared catalog/relation/prepared statements caches)
>
> The motivations for such changes are also similar:
> 1. Increase Postgres scalability
> 2. Reduce memory consumption
> 3. Make Postgres a better fit for cloud and serverless requirements
>
> I am not sure now which one should be addressed first, or whether they can be done together.
>
> Replacing static variables with thread-local ones is the first and maybe the easiest step.
> It requires more or less mechanical changes. The more challenging thing is replacing private per-backend data structures
> with shared ones (caches, file descriptors, ...)
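
For readers following along, the "mechanical" first step quoted above can
be sketched roughly as below.  This is only an illustration under stated
assumptions, not code from the postgresql.pthreads repository: the
variable query_depth and the functions session_main and
sessions_are_isolated are hypothetical names, and it assumes a C11
compiler plus POSIX threads.

```c
#include <pthread.h>

/*
 * Process model: a plain static is implicitly private to each backend,
 * because every backend is a separate process:
 *
 *     static int query_depth = 0;
 *
 * Thread model: the same variable has to be marked thread-local so that
 * each session thread keeps its own copy.  query_depth is a hypothetical
 * name used only for this sketch.
 */
static _Thread_local int query_depth = 0;

/* Hypothetical per-session entry point: bumps its own copy of the counter. */
static void *session_main(void *arg)
{
    int *slot = arg;            /* in: iteration count, out: final depth */
    int iterations = *slot;

    for (int i = 0; i < iterations; i++)
        query_depth++;          /* touches only this thread's copy */

    *slot = query_depth;
    return NULL;
}

/*
 * Runs two sessions concurrently.  Returns 1 if each thread saw only its
 * own increments and the calling thread's copy was left untouched.
 */
int sessions_are_isolated(void)
{
    pthread_t t1, t2;
    int r1 = 1000, r2 = 2000;

    if (pthread_create(&t1, NULL, session_main, &r1) != 0)
        return 0;
    if (pthread_create(&t2, NULL, session_main, &r2) != 0) {
        pthread_join(t1, NULL);
        return 0;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return r1 == 1000 && r2 == 2000 && query_depth == 0;
}
```

No locking is needed here because nothing is shared.  The harder step
described in the quote, replacing private per-backend structures with
shared ones, is where synchronization costs would come in, and this
sketch does not attempt to cover it.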

Thank you.  Personally I think that motivation two (reducing memory
consumption) -- as long as it can be done without detrimentally
affecting functionality or correctness, and without making the code
harder to develop/understand -- could provide benefits for all three
of the motivating cases (and, in fact, for non-cloud/serverful use
cases too).

This is making me wonder about other performance/scalability areas
that might not have been considered due to focus on the details of the
existing codebase, but I'll save that for another thread and will try
to learn more first.


