Thread: pgsql: Allow vacuum command to process indexes in parallel.

pgsql: Allow vacuum command to process indexes in parallel.

From
Amit Kapila
Date:
Allow vacuum command to process indexes in parallel.

This feature allows the vacuum to leverage multiple CPUs in order to
process indexes.  This enables us to perform index vacuuming and index
cleanup with background workers.  This adds a PARALLEL option to VACUUM
command where the user can specify the number of workers that can be used
to perform the command which is limited by the number of indexes on a
table.  Specifying zero as a number of workers will disable parallelism.
This option can't be used with the FULL option.

Each index is processed by at most one vacuum process.  Therefore parallel
vacuum can be used when the table has at least two indexes.

The parallel degree is either specified by the user or determined based on
the number of indexes that the table has, and further limited by
max_parallel_maintenance_workers.  The index can participate in parallel
vacuum iff it's size is greater than min_parallel_index_scan_size.

Author: Masahiko Sawada and Amit Kapila
Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
Mahendra Singh and Sergei Kornilov
Tested-by: Mahendra Singh and Prabhat Sahu
Discussion:
https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/40d964ec997f64227bc0ff5e058dc4a5770a70a9

Modified Files
--------------
doc/src/sgml/config.sgml              |   18 +-
doc/src/sgml/ref/vacuum.sgml          |   61 +-
src/backend/access/heap/vacuumlazy.c  | 1256 ++++++++++++++++++++++++++++++---
src/backend/access/transam/parallel.c |   26 +-
src/backend/commands/vacuum.c         |  135 +++-
src/backend/postmaster/autovacuum.c   |    2 +
src/bin/psql/tab-complete.c           |    2 +-
src/include/access/heapam.h           |    3 +
src/include/access/parallel.h         |    4 +-
src/include/commands/vacuum.h         |   12 +
src/test/regress/expected/vacuum.out  |   34 +
src/test/regress/sql/vacuum.sql       |   31 +
src/tools/pgindent/typedefs.list      |    4 +
13 files changed, 1452 insertions(+), 136 deletions(-)


Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Andres Freund
Date:
Hi,

On 2020-01-20 02:33:34 +0000, Amit Kapila wrote:
> Allow vacuum command to process indexes in parallel.
> 
> This feature allows the vacuum to leverage multiple CPUs in order to
> process indexes.  This enables us to perform index vacuuming and index
> cleanup with background workers.  This adds a PARALLEL option to VACUUM
> command where the user can specify the number of workers that can be used
> to perform the command which is limited by the number of indexes on a
> table.  Specifying zero as a number of workers will disable parallelism.
> This option can't be used with the FULL option.
> 
> Each index is processed by at most one vacuum process.  Therefore parallel
> vacuum can be used when the table has at least two indexes.
> 
> The parallel degree is either specified by the user or determined based on
> the number of indexes that the table has, and further limited by
> max_parallel_maintenance_workers.  The index can participate in parallel
> vacuum iff it's size is greater than min_parallel_index_scan_size.
> 
> Author: Masahiko Sawada and Amit Kapila
> Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
> Mahendra Singh and Sergei Kornilov
> Tested-by: Mahendra Singh and Prabhat Sahu
> Discussion:
> https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
> https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com

Coverity is complaining that:
> ** CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/vacuum.c: 2078 in compute_parallel_delay()
> 
> 
> ________________________________________________________________________________________________________
> *** CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/vacuum.c: 2078 in compute_parallel_delay()
> 2072         shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance, VacuumCostBalance);
> 2073     
> 2074         /* Compute the total local balance for the current worker */
> 2075         VacuumCostBalanceLocal += VacuumCostBalance;
> 2076     
> 2077         if ((shared_balance >= VacuumCostLimit) &&
> >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer quotient to
type"double". Any remainder, or fractional part of the quotient, is ignored.
 
> 2078             (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> 2079         {
> 2080             /* Compute sleep time based on the local cost balance */
> 2081             msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> 2082             pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> 2083             VacuumCostBalanceLocal = 0;

Which seems like a fair enough complaint?

Greetings,

Andres Freund



Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Amit Kapila
Date:
On Mon, Mar 30, 2020 at 4:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2020-01-20 02:33:34 +0000, Amit Kapila wrote:
> > Allow vacuum command to process indexes in parallel.
> >
> > This feature allows the vacuum to leverage multiple CPUs in order to
> > process indexes.  This enables us to perform index vacuuming and index
> > cleanup with background workers.  This adds a PARALLEL option to VACUUM
> > command where the user can specify the number of workers that can be used
> > to perform the command which is limited by the number of indexes on a
> > table.  Specifying zero as a number of workers will disable parallelism.
> > This option can't be used with the FULL option.
> >
> > Each index is processed by at most one vacuum process.  Therefore parallel
> > vacuum can be used when the table has at least two indexes.
> >
> > The parallel degree is either specified by the user or determined based on
> > the number of indexes that the table has, and further limited by
> > max_parallel_maintenance_workers.  The index can participate in parallel
> > vacuum iff it's size is greater than min_parallel_index_scan_size.
> >
> > Author: Masahiko Sawada and Amit Kapila
> > Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
> > Mahendra Singh and Sergei Kornilov
> > Tested-by: Mahendra Singh and Prabhat Sahu
> > Discussion:
> > https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
> > https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
>
> Coverity is complaining that:
> > ** CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/vacuum.c: 2078 in compute_parallel_delay()
> >
> >
> > ________________________________________________________________________________________________________
> > *** CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/vacuum.c: 2078 in compute_parallel_delay()
> > 2072          shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance, VacuumCostBalance);
> > 2073
> > 2074          /* Compute the total local balance for the current worker */
> > 2075          VacuumCostBalanceLocal += VacuumCostBalance;
> > 2076
> > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer quotient to
type"double". Any remainder, or fractional part of the quotient, is ignored.
 
> > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > 2079          {
> > 2080                  /* Compute sleep time based on the local cost balance */
> > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > 2083                  VacuumCostBalanceLocal = 0;
>
> Which seems like a fair enough complaint?
>

I'll look into it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Mahendra Singh Thalor
Date:
On Mon, 30 Mar 2020 at 07:39, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 4:18 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2020-01-20 02:33:34 +0000, Amit Kapila wrote:
> > > Allow vacuum command to process indexes in parallel.
> > >
> > > This feature allows the vacuum to leverage multiple CPUs in order to
> > > process indexes.  This enables us to perform index vacuuming and index
> > > cleanup with background workers.  This adds a PARALLEL option to VACUUM
> > > command where the user can specify the number of workers that can be used
> > > to perform the command which is limited by the number of indexes on a
> > > table.  Specifying zero as a number of workers will disable parallelism.
> > > This option can't be used with the FULL option.
> > >
> > > Each index is processed by at most one vacuum process.  Therefore parallel
> > > vacuum can be used when the table has at least two indexes.
> > >
> > > The parallel degree is either specified by the user or determined based on
> > > the number of indexes that the table has, and further limited by
> > > max_parallel_maintenance_workers.  The index can participate in parallel
> > > vacuum iff it's size is greater than min_parallel_index_scan_size.
> > >
> > > Author: Masahiko Sawada and Amit Kapila
> > > Reviewed-by: Dilip Kumar, Amit Kapila, Robert Haas, Tomas Vondra,
> > > Mahendra Singh and Sergei Kornilov
> > > Tested-by: Mahendra Singh and Prabhat Sahu
> > > Discussion:
> > > https://postgr.es/m/CAD21AoDTPMgzSkV4E3SFo1CH_x50bf5PqZFQf4jmqjk-C03BWg@mail.gmail.com
> > > https://postgr.es/m/CAA4eK1J-VoR9gzS5E75pcD-OH0mEyCdp8RihcwKrcuw7J-Q0+w@mail.gmail.com
> >
> > Coverity is complaining that:
> > > ** CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/vacuum.c: 2078 in compute_parallel_delay()
> > >
> > >
> > > ________________________________________________________________________________________________________
> > > *** CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/vacuum.c: 2078 in compute_parallel_delay()
> > > 2072          shared_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance, VacuumCostBalance);
> > > 2073
> > > 2074          /* Compute the total local balance for the current worker */
> > > 2075          VacuumCostBalanceLocal += VacuumCostBalance;
> > > 2076
> > > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer quotient
totype "double". Any remainder, or fractional part of the quotient, is ignored.
 
> > > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > > 2079          {
> > > 2080                  /* Compute sleep time based on the local cost balance */
> > > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > > 2083                  VacuumCostBalanceLocal = 0;
> >
> > Which seems like a fair enough complaint?
> >
>
> I'll look into it.
>

Hi,
Attaching patch to fix this but I don't have coverity setup so I
haven't verified fix.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Amit Kapila
Date:
On Mon, Mar 30, 2020 at 4:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> > 2076
> > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer quotient to
type"double". Any remainder, or fractional part of the quotient, is ignored.
 
> > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > 2079          {
> > 2080                  /* Compute sleep time based on the local cost balance */
> > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > 2083                  VacuumCostBalanceLocal = 0;
>
> Which seems like a fair enough complaint?
>

Yeah, how can we set up and test a fix for this?  Where can I see these results?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Mahendra Singh Thalor
Date:
On Mon, 30 Mar 2020 at 09:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 4:18 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > > 2076
> > > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer quotient
totype "double". Any remainder, or fractional part of the quotient, is ignored.
 
> > > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > > 2079          {
> > > 2080                  /* Compute sleep time based on the local cost balance */
> > > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > > 2083                  VacuumCostBalanceLocal = 0;
> >
> > Which seems like a fair enough complaint?
> >
>
> Yeah, how can we set up and test a fix for this?  Where can I see these results?

I am able to make coverity setup.  I am verifying fix and will post my
results in coming days.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Andres Freund
Date:
Hi,

On March 31, 2020 12:43:47 AM PDT, Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>On Mon, 30 Mar 2020 at 09:44, Amit Kapila <amit.kapila16@gmail.com>
>wrote:
>>
>> On Mon, Mar 30, 2020 at 4:18 AM Andres Freund <andres@anarazel.de>
>wrote:
>> >
>> > > 2076
>> > > 2077          if ((shared_balance >= VacuumCostLimit) &&
>> > > >>>     CID ...:  Incorrect expression
>(UNINTENDED_INTEGER_DIVISION)
>> > > >>>     Dividing integer expressions "VacuumCostLimit" and
>"nworkers", and then converting the integer quotient to type "double".
>Any remainder, or fractional part of the quotient, is ignored.
>> > > 2078                  (VacuumCostBalanceLocal > 0.5 *
>(VacuumCostLimit / nworkers)))
>> > > 2079          {
>> > > 2080                  /* Compute sleep time based on the local
>cost balance */
>> > > 2081                  msec = VacuumCostDelay *
>VacuumCostBalanceLocal / VacuumCostLimit;
>> > > 2082
>pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
>VacuumCostBalanceLocal);
>> > > 2083                  VacuumCostBalanceLocal = 0;
>> >
>> > Which seems like a fair enough complaint?
>> >
>>
>> Yeah, how can we set up and test a fix for this?  Where can I see
>these results?
>
>I am able to make coverity setup.  I am verifying fix and will post my
>results in coming days.

That doesn't seem necessary - we should commit a fix. We'll know in a few days for sure. But it's not hard to just
theoreticallylook at the issue in this case? 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Amit Kapila
Date:
On Mon, Mar 30, 2020 at 8:30 AM Mahendra Singh Thalor
<mahi6run@gmail.com> wrote:
>
> > > > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > > > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer quotient
totype "double". Any remainder, or fractional part of the quotient, is ignored.
 
> > > > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > > > 2079          {
> > > > 2080                  /* Compute sleep time based on the local cost balance */
> > > > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > > > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > > > 2083                  VacuumCostBalanceLocal = 0;
> > >
> > > Which seems like a fair enough complaint?
> > >
> >
> > I'll look into it.
> >
>
> Hi,
> Attaching patch to fix this but I don't have coverity setup so I
> haven't verified fix.
>

- (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
+ (VacuumCostBalanceLocal > (int) (0.5 * ((double) VacuumCostLimit /
nworkers))))
  {

I think typecasting to double should be enough to fix this coverity error.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Mahendra Singh Thalor
Date:
On Tue, 31 Mar 2020 at 17:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 8:30 AM Mahendra Singh Thalor
> <mahi6run@gmail.com> wrote:
> >
> > > > > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > > > > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > > > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer
quotientto type "double". Any remainder, or fractional part of the quotient, is ignored.
 
> > > > > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > > > > 2079          {
> > > > > 2080                  /* Compute sleep time based on the local cost balance */
> > > > > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > > > > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > > > > 2083                  VacuumCostBalanceLocal = 0;
> > > >
> > > > Which seems like a fair enough complaint?
> > > >
> > >
> > > I'll look into it.
> > >
> >
> > Hi,
> > Attaching patch to fix this but I don't have coverity setup so I
> > haven't verified fix.
> >
>
> - (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> + (VacuumCostBalanceLocal > (int) (0.5 * ((double) VacuumCostLimit /
> nworkers))))
>   {
>
> I think typecasting to double should be enough to fix this coverity error.

I also think same so attaching updated patch.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: pgsql: Allow vacuum command to process indexes in parallel.

From
Amit Kapila
Date:
On Wed, Apr 1, 2020 at 8:39 AM Mahendra Singh Thalor <mahi6run@gmail.com> wrote:
>
> On Tue, 31 Mar 2020 at 17:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 8:30 AM Mahendra Singh Thalor
> > <mahi6run@gmail.com> wrote:
> > >
> > > > > > 2077          if ((shared_balance >= VacuumCostLimit) &&
> > > > > > >>>     CID ...:  Incorrect expression  (UNINTENDED_INTEGER_DIVISION)
> > > > > > >>>     Dividing integer expressions "VacuumCostLimit" and "nworkers", and then converting the integer
quotientto type "double". Any remainder, or fractional part of the quotient, is ignored.
 
> > > > > > 2078                  (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > > > > > 2079          {
> > > > > > 2080                  /* Compute sleep time based on the local cost balance */
> > > > > > 2081                  msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
> > > > > > 2082                  pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, VacuumCostBalanceLocal);
> > > > > > 2083                  VacuumCostBalanceLocal = 0;
> > > > >
> > > > > Which seems like a fair enough complaint?
> > > > >
> > > >
> > > > I'll look into it.
> > > >
> > >
> > > Hi,
> > > Attaching patch to fix this but I don't have coverity setup so I
> > > haven't verified fix.
> > >
> >
> > - (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworkers)))
> > + (VacuumCostBalanceLocal > (int) (0.5 * ((double) VacuumCostLimit /
> > nworkers))))
> >   {
> >
> > I think typecasting to double should be enough to fix this coverity error.
>
> I also think same so attaching updated patch.
>

Thanks, I have pushed this patch yesterday.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com