Re: Let's make PostgreSQL multi-threaded - Mailing list pgsql-hackers
From | James Addison |
---|---|
Subject | Re: Let's make PostgreSQL multi-threaded |
Date | |
Msg-id | CALDQ5NwotYMZtXA2z6EkBQ72jVBrfyrn-+a9xU8=w54VBQPOhg@mail.gmail.com |
In response to | Re: Let's make PostgreSQL multi-threaded (Konstantin Knizhnik <knizhnik@garret.ru>) |
Responses | Re: Let's make PostgreSQL multi-threaded; Re: Let's make PostgreSQL multi-threaded |
List | pgsql-hackers |
On Thu, 15 Jun 2023 at 08:12, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
>
> On 15.06.2023 1:23 AM, James Addison wrote:
> > On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> > > On 12.06.2023 3:23 PM, Pavel Borisov wrote:
> > > > Is the following true or not?
> > > >
> > > > 1. If we switch processes to threads but leave the amount of session-local variables unchanged, there would be hardly any performance gain.
> > > > 2. If we move some backend's local variables into shared memory, then the performance gain would be very near to what we get with threads having an equal amount of session-local variables.
> > > >
> > > > In other words, the overall goal in principle is to gain from less memory copying wherever it doesn't add the burden of locks for concurrent variable access?
> > > >
> > > > Regards,
> > > > Pavel Borisov,
> > > > Supabase
> > >
> > > IMHO both statements are not true.
> > > Switching to threads will cause less context-switch overhead (because all threads share the same memory space and so preserve the TLB).
> > > How big will this advantage be? In my prototype I got ~10%. But maybe it is possible to find workloads where it is larger.
> >
> > Hi Konstantin - do you have code/links that you can share for the prototype and benchmarks used to gather those results?
>
> Sorry, I have already shared the link:
> https://github.com/postgrespro/postgresql.pthreads/

Nope, my mistake for not locating the existing link - thank you.

Is there a reason that parser-related files (flex/bison) are added as part of the changeset? (I'm trying to narrow it down to only the changes necessary for the functionality. So far it looks mostly fairly minimal, which is good. The adjustments to progname are another thing that look a bit unusual/maybe unnecessary for the feature.)

> As you can see, the last commit was 6 years ago, when I stopped work on this project.
> Why? I already tried to explain it:
> - benefits from switching to threads were not so large. Maybe I just failed to find the proper workload, but it was a more or less expected result,
>   because most of the code was not changed - it uses the same sync primitives, the same local catalog/relation caches, ...
> To take full advantage of the multithreading model it is necessary to rewrite many components, especially those related to interprocess communication.
> But maintaining such a fork of Postgres and synchronizing it with mainstream requires too much effort, and I was not able to do it myself.

I get the feeling that there are probably certain query types or patterns where a significant, order-of-magnitude speedup is possible with threads - but yep, I haven't seen those described in detail yet on the mailing list (but as hinted by my not noticing the GitHub link previously, maybe I'm not following the list closely enough).

What workloads did you try with your version of the project?

> There are three different but related directions of improving current Postgres:
> 1. Replacing processes with threads
> 2. Built-in connection pooler
> 3. Lightweight backends (shared catalog/relation/prepared statements caches)
>
> The motivations for such changes are also similar:
> 1. Increase Postgres scalability
> 2. Reduce memory consumption
> 3. Make Postgres a better fit for cloud and serverless requirements
>
> I am not sure now which one should be addressed first, or whether they can be done together.
>
> Replacing static variables with thread-local ones is the first and maybe the easiest step.
> It requires more or less mechanical changes. A more challenging thing is replacing private per-backend data structures
> with shared ones (caches, file descriptors, ...).

Thank you. Personally I think that motivation two (reducing memory consumption) -- as long as it can be done without detrimentally affecting functionality or correctness, and without making the code harder to develop/understand -- could provide benefits for all three of the motivating cases (and, in fact, for non-cloud/serverful use cases too).

This is making me wonder about other performance/scalability areas that might not have been considered due to a focus on the details of the existing codebase, but I'll save that for another thread and will try to learn more first.
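As a rough illustration of the "more or less mechanical" first step mentioned above (replacing static variables with thread-local ones), here is a minimal sketch in C. It assumes a C11 compiler that supports `_Thread_local` and POSIX threads. The variable `session_counter` and the functions `do_session_work`, `session_main` and `run_two_sessions` are hypothetical names made up for this example; this is not code from the postgresql.pthreads prototype linked above, which may use a different mechanism.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Before (process model): a file-level static is private to each backend
 * process, because every backend has its own address space.
 *
 *     static int session_counter = 0;
 *
 * In a threaded server the same declaration would be shared by every
 * session. The mechanical fix: give each thread its own copy.
 * (Hypothetical variable, for illustration only.)
 */
static _Thread_local int session_counter = 0;

/* Hypothetical per-session work: bump the session-local counter n times. */
static void do_session_work(int n)
{
    for (int i = 0; i < n; i++)
        session_counter++;
}

/* Thread entry point: arg points to this session's slot in the caller's array. */
static void *session_main(void *arg)
{
    int *slot = (int *) arg;

    do_session_work(*slot);
    /* Each thread writes back the value of its own copy of session_counter. */
    *slot = session_counter;
    return NULL;
}

/*
 * Runs two "sessions" as threads doing 3 and 5 increments respectively,
 * and stores what each one observed in out[0] and out[1].
 * Returns 0 on success, -1 if a thread could not be created.
 */
int run_two_sessions(int out[2])
{
    pthread_t threads[2];
    int created = 0;

    out[0] = 3;
    out[1] = 5;
    for (; created < 2; created++)
        if (pthread_create(&threads[created], NULL, session_main, &out[created]) != 0)
            break;

    /* Join every thread that was actually started. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    return created == 2 ? 0 : -1;
}
```

Calling `run_two_sessions(out)` should leave `out[0] == 3` and `out[1] == 5`, because each thread starts from its own zero-initialized copy of `session_counter`. With a plain `static int`, both threads would increment one shared counter without synchronization (a data race), which is the kind of hidden sharing the conversion is meant to remove before backends can run as threads.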