Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Block level parallel vacuum
Date
Msg-id CAFiTN-v2eG89rp7zmUD+=gaRWqV9Jn48KNR1tkAUzJE=7RWx+Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Block level parallel vacuum  (Dilip Kumar <dilipbalaut@gmail.com>)
List pgsql-hackers
On Fri, Jan 17, 2020 at 9:36 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jan 16, 2020 at 5:34 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jan 16, 2020 at 4:46 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Right. Most indexes (all?) of tables that are used in the regression
> > > tests are smaller than min_parallel_index_scan_size. And we set
> > > min_parallel_index_scan_size to 0 in vacuum.sql but VACUUM would not
> > > be speeded-up much because of the relation size. Since we instead
> > > populate new table for parallel vacuum testing the regression test for
> > > vacuum would take a longer time.
> > >
> >
> > Fair enough and I think it is good in a way that it won't change the
> > coverage of existing vacuum code.  I have fixed all the issues
> > reported by Mahendra and have fixed a few other cosmetic things in the
> > attached patch.
> >
> I have few small comments.
>
> 1.
> logical streaming for large in-progress transactions+
> + /* Can't perform vacuum in parallel */
> + if (parallel_workers <= 0)
> + {
> + pfree(can_parallel_vacuum);
> + return lps;
> + }
>
> why are we checking parallel_workers <= 0, Function
> compute_parallel_vacuum_workers only returns 0 or greater than 0
> so isn't it better to just check if (parallel_workers == 0) ?
>
> 2.
> +/*
> + * Macro to check if we are in a parallel vacuum.  If true, we are in the
> + * parallel mode and the DSM segment is initialized.
> + */
> +#define ParallelVacuumIsActive(lps) (((LVParallelState *) (lps)) != NULL)
>
> (LVParallelState *) (lps) -> this typecast is not required, just (lps)
> != NULL should be enough.
>
> 3.
>
> + shared->offset = MAXALIGN(add_size(SizeOfLVShared, BITMAPLEN(nindexes)));
> + prepare_index_statistics(shared, can_parallel_vacuum, nindexes);
> + pg_atomic_init_u32(&(shared->idx), 0);
> + pg_atomic_init_u32(&(shared->cost_balance), 0);
> + pg_atomic_init_u32(&(shared->active_nworkers), 0);
>
> I think it will look cleaner if we can initialize in the order they
> are declared in structure.
>
> 4.
> + VacuumSharedCostBalance = &(lps->lvshared->cost_balance);
> + VacuumActiveNWorkers = &(lps->lvshared->active_nworkers);
> +
> + /*
> + * Set up shared cost balance and the number of active workers for
> + * vacuum delay.
> + */
> + pg_atomic_write_u32(VacuumSharedCostBalance, VacuumCostBalance);
> + pg_atomic_write_u32(VacuumActiveNWorkers, 0);
> +
> + /*
> + * The number of workers can vary between bulkdelete and cleanup
> + * phase.
> + */
> + ReinitializeParallelWorkers(lps->pcxt, nworkers);
> +
> + LaunchParallelWorkers(lps->pcxt);
> +
> + if (lps->pcxt->nworkers_launched > 0)
> + {
> + /*
> + * Reset the local cost values for leader backend as we have
> + * already accumulated the remaining balance of heap.
> + */
> + VacuumCostBalance = 0;
> + VacuumCostBalanceLocal = 0;
> + }
> + else
> + {
> + /*
> + * Disable shared cost balance if we are not able to launch
> + * workers.
> + */
> + VacuumSharedCostBalance = NULL;
> + VacuumActiveNWorkers = NULL;
> + }
> +
>
> I don't like the idea of first initializing the
> VacuumSharedCostBalance with lps->lvshared->cost_balance and then
> uninitialize if nworkers_launched is 0.
> I am not sure why do we need to initialize VacuumSharedCostBalance
> here? just to perform pg_atomic_write_u32(VacuumSharedCostBalance,
> VacuumCostBalance);?
> I think we can initialize it only if nworkers_launched > 0 then we can
> get rid of the else branch completely.

I missed one of my comment

+ /* Carry the shared balance value to heap scan */
+ if (VacuumSharedCostBalance)
+ VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
+
+ if (nworkers > 0)
+ {
+ /* Disable shared cost balance */
+ VacuumSharedCostBalance = NULL;
+ VacuumActiveNWorkers = NULL;
+ }

Doesn't make sense to keep them as two conditions, we can combine them as below

/* If shared costing is enable, carry the shared balance value to heap
scan and disable the shared costing */
 if (VacuumSharedCostBalance)
{
     VacuumCostBalance = pg_atomic_read_u32(VacuumSharedCostBalance);
     VacuumSharedCostBalance = NULL;
     VacuumActiveNWorkers = NULL;
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Block level parallel vacuum
Next
From: Michael Paquier
Date:
Subject: Re: Add pg_file_sync() to adminpack