On Fri, Nov 5, 2021 at 6:25 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Nov 4, 2021 at 12:42 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Since "The leader process alone processes all parallel-safe indexes in
> > the case where no workers are launched" (no change there), I wonder:
> > how does the leader *know* that it's the leader (and so can always
> > process any indexes) inside its call to
> > lazy_parallel_process_indexes()? Or, does the leader actually process
> > all indexes inside lazy_serial_process_indexes() in this edge case?
> > (The edge case where a parallel VACUUM has no workers at all, because
> > they couldn't be launched by the core parallel infrastructure.)
>
> I think that I might see a related problem. But I'm not sure, so I'll just ask:
>
> > + /* Set data required for parallel index vacuum or cleanup */
> > + prepare_parallel_index_processing(vacrel, vacuum);
> > +
> > + /* Reset the parallel index processing counter */
> > + pg_atomic_write_u32(&(lps->lvshared->idx), 0);
> > +
> > /* Setup the shared cost-based vacuum delay and launch workers */
> > if (nworkers > 0)
> > {
> > + /* Reinitialize the parallel context to relaunch parallel workers */
> > if (vacrel->num_index_scans > 0)
> > - {
> > - /* Reset the parallel index processing counter */
> > - pg_atomic_write_u32(&(lps->lvshared->idx), 0);
> > -
> > - /* Reinitialize the parallel context to relaunch parallel workers */
> > ReinitializeParallelDSM(lps->pcxt);
> > - }
>
> Is it okay that we don't call ReinitializeParallelDSM() just because
> "nworkers == 0" this time around? I notice that there is a wait for
> "nworkers_launched" workers to finish parallel processing, at the top
> of ReinitializeParallelDSM(). I can see why the
> "vacrel->num_index_scans > 0" test is okay, but I can't see why the
> "nworkers == 0" test is okay.
>
> I just want to be sure that we're not somehow relying on seeing state
> in shared memory (in the LVSharedIndStats array) in all cases, but
> finding that it is not actually there in certain rare edge cases.
> Maybe this didn't matter before, because the leader didn't expect to
> find this information in shared memory in any case. But that is
> changed by your patch, of course, so it's something to be concerned
> about.
If we launch workers (i.g., nworkers > 0), we wait for these workers
to finish after processing all indexes (see we call
WaitForParallelWorkersToFinish() after lazy_parallel_process_indexes).
So it's guaranteed that all workers finished at the end
ofparallel_lazy_vacuum_or_cleanup_all_indexes(). So even in the
second call to this function, we don't need to wait for
"nworkers_launched" workers who previously were running to finish.
Does it make sense?
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/