Re: Parallel heap vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada
---|---
Subject | Re: Parallel heap vacuum
Date |
Msg-id | CAD21AoAHoU_dj0AJcTVRY=zL1jOcnkcihh9Y_Xuw7+EhBT9a5Q@mail.gmail.com
In response to | Re: Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
List | pgsql-hackers
On Mon, Mar 3, 2025 at 3:24 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Mar 3, 2025 at 1:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Feb 25, 2025 at 4:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Feb 25, 2025 at 2:44 PM Melanie Plageman
> > > <melanieplageman@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 25, 2025 at 5:14 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > Given that we have only about one month until the feature freeze, I
> > > > > think it's realistic to introduce only one of these parallelisms in
> > > > > PG18, and we should first implement whichever one is more beneficial
> > > > > and helpful for users. Since we found that parallel phase III is not
> > > > > very efficient in many cases, I'm thinking that for PG18 development
> > > > > we might want to switch focus to parallel phase I, and then go for
> > > > > phase III if we have time.
> > > >
> > > > Okay, well let me know how I can be helpful. Should I be reviewing a
> > > > version that is already posted?
> > >
> > > Thank you so much. I'm going to submit the latest patches for
> > > parallelizing phase I in a few days. I would appreciate it if you
> > > could review that version.
> >
> > I've attached the updated patches that make phase I (heap scanning)
> > parallel. I'll share the benchmark results soon.
>
> I've attached the benchmark test results.
>
> Overall, with the parallel heap scan (phase I), vacuum sped up
> considerably. On the other hand, looking at each phase, I can see
> performance regressions in some cases.
>
> First, there is a regression on a table with one index due to the
> overhead of the shared TidStore. Currently, we disable parallel index
> vacuuming if the table has only one index, since the leader process
> always takes one index. With this patch, we enable the parallel heap
> scan even when parallel index vacuuming is disabled, which ends up
> using the shared TidStore. In the benchmark test, while the regression
> due to that overhead is about 25%, the speedup from the parallel heap
> scan is 50% or more, so the performance numbers are good overall. I
> think we can improve the shared TidStore in the future.
>
> Another performance regression visible in the results is that the heap
> vacuum phase (phase III) got slower with the patch. That is weird to
> me, since the patch doesn't touch the code of the heap vacuum phase.
> I'm still investigating the cause.
>

Discussing with Amit offlist, I've run another benchmark test where no
data is loaded in shared buffers. In the previous test, I loaded all
table blocks before running vacuum, so it was the best case; the
attached test results show the worst case. Overall, while the numbers
are not very stable, phase I sped up somewhat, though it is not as
scalable as expected, which is not surprising. Please note that the
test results show that phase III also sped up, but this is because
parallel vacuum uses more ring buffers than single-process vacuum. So
we need to compare only the phase I times when assessing the benefit
of the parallelism.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
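As a minimal, hypothetical sketch of the warm-cache ("best case") setup described above, one could prewarm the table and then run a parallel vacuum. The table name, data sizes, and worker count below are assumptions, not the actual benchmark scripts. Note that in released PostgreSQL the PARALLEL option controls only parallel index vacuuming; the patch discussed here extends parallelism to phase I (heap scanning).

```sql
-- Hypothetical setup: names and sizes are illustrative, not the real benchmark.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

CREATE TABLE test (id int, val text);
INSERT INTO test
    SELECT i, repeat('x', 100) FROM generate_series(1, 10000000) i;
CREATE INDEX test_idx ON test (id);
DELETE FROM test WHERE id % 10 = 0;   -- produce dead tuples for vacuum

-- Best case: load all table blocks into shared buffers first.
SELECT pg_prewarm('test');

-- Parallel vacuum with up to 4 workers; VERBOSE reports details of the run.
VACUUM (PARALLEL 4, VERBOSE) test;

-- For the ring-buffer effect noted above, the ring buffer size can be set
-- explicitly (PostgreSQL 16+), one way to make phase III timings more
-- comparable between serial and parallel runs:
VACUUM (PARALLEL 4, BUFFER_USAGE_LIMIT '2MB', VERBOSE) test;
```

Progress through the phases can be watched from another session via the pg_stat_progress_vacuum view, whose phase column distinguishes "scanning heap" (phase I), "vacuuming indexes" (phase II), and "vacuuming heap" (phase III).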