Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id CAD21AoAwsaohDMissr_fR_YHp3fzq+eJ4n2=G2Bxji8aMqPAxA@mail.gmail.com
Whole thread Raw
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
List pgsql-hackers
Sorry for the late reply.

On Tue, Mar 15, 2022 at 1:20 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> >    I'm still unsure the current design of 0001 patch is better than other
> >    approaches we’ve discussed. Even users who don't use parallel vacuum
> >    are forced to allocate shared memory for index vacuum progress, with
> >    GetMaxBackends() entries from the beginning. Also, it’s likely to
> >    extend the progress tracking feature for other parallel operations in
> >    the future but I think the current design is not extensible. If we
> >    want to do that, we will end up creating similar things for each of
> >    them or re-creating index vacuum progress tracking feature while
> >    creating a common infra. It might not be a problem as of now but I'm
> >    concerned that introducing a feature that is not extensible and forces
> >    users to allocate additional shmem might be a blocker in the future.
> >    Looking at the precedent example, When we introduce the progress
> >    tracking feature, we implemented it in an extensible way. On the other
> >    hand, others in this thread seem to agree with this approach, so I'd
> >    like to leave it to committers.
>
> Thanks for the review!
>
> I think you make strong arguments as to why we need to take a different approach now than later.
>
> Flaws with current patch set:
>
> 1. GetMaxBackends() is a really heavy-handed overallocation of a shared memory serving a very specific purpose.
> 2. Going with the approach of a vacuum specific hash breaks the design of progress which is meant to be extensible.
> 3. Even if we go with this current approach as an interim solution, it will be a real pain in the future.
>
> With that said, v7 introduces the new infrastructure. 0001 includes the new infrastructure and 0002 takes advantage
ofthis. 
>
> This approach is the following:
>
> 1. Introduces a new API called pgstat_progress_update_param_parallel along with some others support functions. This
newinfrastructure is in backend_progress.c 
>
> 2. There is still a shared memory involved, but the size is capped to " max_worker_processes" which is the max to how
manyparallel workers can be doing work at any given time. The shared memory hash includes a st_progress_param array
justlike the Backend Status array. 

I think that there is a corner case where a parallel operation could
not perform due to the lack of a free shared hash entry, because there
is a window between a parallel worker exiting and the leader
deallocating the hash table entry.

BTW have we discussed another idea I mentioned before that we have the
leader process periodically check the number of completed indexes and
advertise it in its progress information? I'm not sure which one is
better but this idea would require only changes of vacuum code and
probably simpler than the current idea.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Broken make dependency somewhere near win32ver.o?
Next
From: Thomas Munro
Date:
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?