On Tue, Mar 11, 2025 at 6:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 10, 2025 at 11:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sun, Mar 9, 2025 at 11:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > > However, in the heap vacuum phase, the leader process needed
> > > > to process all blocks, resulting in soft page faults while creating
> > > > Page Table Entries (PTEs). Without the patch, the backend process had
> > > > already created PTEs during the heap scan, thus preventing these
> > > > faults from occurring during the heap vacuum phase.
> > > >
> > >
> > > This part is again not clear to me because I am assuming all the data
> > > exists in shared buffers before the vacuum, so why the page faults
> > > will occur in the first place.
> >
> > IIUC PTEs are process-local data. So even if physical pages are loaded
> > to PostgreSQL's shared buffers (and the page cache), soft page faults (or
> > minor page faults)[1] can occur if these pages are not yet mapped in
> > its page table.
> >
>
> Okay, I got your point. BTW, I noticed that even for the case where
> all the data is in shared_buffers, the performance improvement for
> workers greater than two does decrease marginally. Am I reading the
> data correctly? If so, what is the theory, and do we have
> recommendations for a parallel degree?

Are you referring to the total vacuum execution time? Looking at the
execution time of phase 1 alone, scalability seems good. For example,
with 2 workers (i.e., 3 processes in total, including the leader) we
got about a 3x speedup, and with 4 workers about a 5x speedup. As for
the other phases, phase 3 got slower, probably because of the PTE
issue discussed above, but I haven't investigated why phase 2 also
got slightly slower with more than 2 workers.

In the current patch, the parallel degree for phase 1 is chosen based
on the table size, using almost the same calculation as the degree for
a parallel seq scan. But thinking further, we might want to account
for the number of all-visible and all-frozen pages here, so that we
avoid launching many workers for big tables that are mostly frozen.
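
To illustrate the idea, here is a rough standalone sketch, not code
from the patch. It assumes the degree grows the way I understand the
parallel seq scan heuristic to work (one more worker each time the
page count triples past min_parallel_table_scan_size, 8MB by default),
and it simply subtracts the pages vacuum could skip before applying
that rule. The function name, the skippable_pages parameter, and the
subtraction itself are all hypothetical.

```c
#include <limits.h>

/* Default min_parallel_table_scan_size: 8MB, i.e. 1024 blocks of 8kB. */
#define MIN_PARALLEL_SCAN_PAGES 1024L

/*
 * Hypothetical helper (not in the patch): pick a parallel degree for
 * phase 1 from the number of pages that actually need scanning.
 *
 * rel_pages       - total pages in the table
 * skippable_pages - assumed count of all-visible/all-frozen pages that
 *                   vacuum can skip
 * max_workers     - upper bound on the number of workers
 */
static int
sketch_parallel_degree(long rel_pages, long skippable_pages, int max_workers)
{
    long pages = rel_pages - skippable_pages;
    long threshold = MIN_PARALLEL_SCAN_PAGES;
    int  workers = 1;

    /* Too little work to justify launching any worker. */
    if (pages < threshold)
        return 0;

    /* Add one worker each time the page count triples. */
    while (pages >= threshold * 3)
    {
        workers++;
        threshold *= 3;
        if (threshold > LONG_MAX / 3)
            break;              /* avoid overflow */
    }

    return (workers < max_workers) ? workers : max_workers;
}
```

With these assumptions, a 1,000,000-page table with nothing skippable
gets 7 workers (given a cap of 8), while the same table with 990,000
skippable pages gets 3, and with 999,500 skippable pages gets none.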

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com