Re: Add index scan progress to pg_stat_progress_vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add index scan progress to pg_stat_progress_vacuum
Date
Msg-id CAD21AoBW6SMJ96CNoMeu+f_BR4jmatPcfVA016FdD2hkLDsaTA@mail.gmail.com
In response to Re: Add index scan progress to pg_stat_progress_vacuum  ("Imseih (AWS), Sami" <simseih@amazon.com>)
Responses Re: Add index scan progress to pg_stat_progress_vacuum
List pgsql-hackers
On Wed, Mar 9, 2022 at 12:41 AM Imseih (AWS), Sami <simseih@amazon.com> wrote:
>
> +/*
> + * vacuum_worker_init --- initialize this module's shared memory hash
> + * to track the progress of a vacuum worker
> + */
> +void
> +vacuum_worker_init(void)
> +{
> +       HASHCTL     info;
> +       long        max_table_size = GetMaxBackends();
> +
> +       VacuumWorkerProgressHash = NULL;
> +
> +       info.keysize = sizeof(pid_t);
> +       info.entrysize = sizeof(VacProgressEntry);
> +
> +       VacuumWorkerProgressHash = ShmemInitHash("Vacuum Progress Hash",
> +                                                max_table_size,
> +                                                max_table_size,
> +                                                &info,
> +                                                HASH_ELEM | HASH_BLOBS);
> +}
>
> It seems to me that creating a shmem hash with max_table_size entries
> for parallel vacuum process tracking is too much. IIRC an old patch
> had parallel vacuum workers advertise its progress and changed the
> pg_stat_progress_vacuum view so that it aggregates the results
> including workers' stats. I think it’s better than the current one.
> Why did you change that?
>
> Regards,
>
> I was trying to avoid using shared memory to track completed indexes, but
> aggregating stats does not work with parallel vacuums. This is because a
> parallel worker will exit before the vacuum completes, causing the
> aggregated total to be wrong.
>
> For example
>
> Leader_pid advertises it completed 2 indexes
> Parallel worker advertises it completed 2 indexes
>
> When aggregating we see 4 indexes completed.
>
> After the parallel worker exits, the aggregation will show only 2 indexes completed.

Indeed.

It might have already been discussed, but other than using a new shmem
hash for parallel vacuum, I wonder if we can allow workers to update
the leader’s progress information. It would break the assumption that
a backend status entry is modified only by its own backend, though. But
it might also help with progress reporting for other parallel
operations. This essentially does the same thing as the current patch
but doesn't require a new shmem hash.
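
To illustrate, here is a minimal sketch of what such a cross-backend
update could look like. This is an assumption for discussion, not part
of the patch: the helper name and how the worker obtains the leader's
PgBackendStatus entry are made up, and with several workers writing
concurrently some additional locking would be needed on top of the
changecount protocol shown here.

/*
 * Hypothetical sketch only: a parallel vacuum worker bumps a progress
 * counter in the leader's backend status entry.  The write mimics what
 * the owning backend does in pgstat_progress_update_param(), i.e. it is
 * bracketed by the changecount macros so readers can detect a torn read.
 */
static void
worker_incr_leader_progress(PgBackendStatus *leader_beentry, int index)
{
    PGSTAT_BEGIN_WRITE_ACTIVITY(leader_beentry);

    /* Bump one slot of the leader's progress counter array. */
    leader_beentry->st_progress_param[index] += 1;

    PGSTAT_END_WRITE_ACTIVITY(leader_beentry);
}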

Another idea I came up with is that the parallel vacuum leader checks
PVIndStats.status and updates its own progress information with the
number of indexes processed so far. The leader can do that before and
after each index vacuuming. And possibly we can add a callback to the
main loop of the index AM's bulkdelete and vacuumcleanup so that the
leader can periodically keep it up to date.
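
To make that concrete, a rough sketch of what the leader could run
before/after each index, or from such a callback. The names used here
(the ParallelVacuumState fields, PARALLEL_INDVAC_STATUS_COMPLETED, and
PROGRESS_VACUUM_INDEXES_PROCESSED) are assumptions for illustration
only:

/*
 * Hypothetical sketch: the leader scans the shared per-index status
 * array and advertises how many indexes have been processed so far via
 * its own progress entry.
 */
static void
parallel_vacuum_update_index_progress(ParallelVacuumState *pvs)
{
    int         ndone = 0;

    for (int i = 0; i < pvs->nindexes; i++)
    {
        /* Count indexes whose bulkdelete/cleanup has completed. */
        if (pvs->indstats[i].status == PARALLEL_INDVAC_STATUS_COMPLETED)
            ndone++;
    }

    /* Report the count through the leader's own backend status entry. */
    pgstat_progress_update_param(PROGRESS_VACUUM_INDEXES_PROCESSED, ndone);
}

Since only the leader writes its own entry, this keeps the existing
single-writer assumption intact.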

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


