Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Dilip kumar
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Msg-id 4205E661176A124FAF891E0A6BA9135266380268@szxeml509-mbs.china.huawei.com
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

On 24 November 2014 11:29, Amit Kapila wrote:

 

>I think some of the comments given upthread are still not handled:

>a.  About cancel handling

 

Your actual comment was:

 

>One other related point is that I think the cancel handling mechanism
>is still not completely right: the code handles it when there are not
>enough free slots, but otherwise it won't handle the cancel request.
>Basically I am referring to the below part of code:

 

run_parallel_vacuum()
{
..
for (cell = tables->head; cell; cell = cell->next)
{
..
    /* Blocks while waiting for a free slot; a cancel request is
     * handled in here ... */
    free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
                            completedb);

    /* ... but when a slot is already free, the next query is sent
     * without ever checking for a pending cancel */
    PQsendQuery(slotconn, sql.data);

    resetPQExpBuffer(&sql);
}

 

 

1. The connection blocks only while waiting for a free slot (in GetIdleSlot) or while fetching results (in GetQueryResult), so the cancel request must be handled in both of those places.

2. Now for the case you mentioned, where there are enough free slots and the above for loop keeps running: if the user presses Ctrl+C, the loop will not break. I have handled this by checking inAbort inside the for loop before sending a new command. We cannot do this through the cancel request alone, because a connection only realizes its query was cancelled when it receives some data; until the connection receives data it will not see the failure. So I check inAbort directly.
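
For illustration, the fixed loop looks roughly like this (a sketch only, using the names from the snippet above; it is not the exact patch code):

for (cell = tables->head; cell; cell = cell->next)
{
    /* Stop queuing new commands once a cancel has been requested;
     * inAbort is set by the SIGINT handler (see below) */
    if (inAbort)
        break;

    /* Blocks until a connection is free; a cancel arriving while we
     * wait is handled inside GetIdleSlot() */
    free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
                            completedb);
    if (free_slot == NULL)
        break;

    PQsendQuery(slotconn, sql.data);
    resetPQExpBuffer(&sql);
}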



 

>b.  Setting of inAbort flag for the case where PQcancel is successful

 

Your actual comment was:

 

>Yeah, the user has tried to terminate; however, the utility will emit
>the message "Could not send cancel request" in such a case and still
>silently tries to cancel and disconnect all connections.

 

You are right. I have fixed the code; now, if the cancel request fails, we do not set the inAbort flag.
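
For reference, the cancel handler now looks roughly like this (a sketch only, modelled on the SIGINT handling already used by the src/bin/scripts utilities; cancelConn is assumed to be set up earlier with PQgetCancel()):

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include "libpq-fe.h"

static PGcancel *volatile cancelConn = NULL;
static volatile sig_atomic_t inAbort = false;

static void
handle_sigint(int signum)
{
    int     save_errno = errno;
    char    errbuf[256];

    /* Send QueryCancel if we are processing a database query */
    if (cancelConn != NULL)
    {
        if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
        {
            /*
             * The cancel request went out successfully, so remember
             * that we are aborting; the dispatch loop checks this
             * flag before sending any new command.
             */
            inAbort = true;
            fprintf(stderr, "Cancel request sent\n");
        }
        else
            fprintf(stderr, "Could not send cancel request: %s", errbuf);
    }

    errno = save_errno;
}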

 

>c.  Performance data of --analyze-in-stages switch

 

 

 

Performance Data

------------------------------

CPU: 8 cores
RAM: 16GB
checkpoint_segments = 256

Before each test, run test.sql (attached).

 

Unpatched:

dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005  --analyze-in-stages  -d postgres

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

 

real    0m0.843s

user    0m0.000s

sys     0m0.000s

 

Patched:

dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005  --analyze-in-stages  -j 2 -d postgres

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

 

real    0m0.593s

user    0m0.004s

sys     0m0.004s

 

dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005  --analyze-in-stages  -j 4 -d postgres

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

 

real    0m0.421s

user    0m0.004s

sys     0m0.004s

 

With 2 connections we get about a 30% improvement (0.843s -> 0.593s), and with 4 connections about 50% (0.843s -> 0.421s).

 

>d.  Have one pass over the comments in the patch.  I can still see
>some wrongly formatted multiline comments.  Refer below:
>+  /* Otherwise, we got a stage from vacuum_all_databases(), so run
>+   * only that one. */

 

Checked them all and fixed.

 

While testing, I found one more behavior that differs from the base code.

 

Base Code:

dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005 -t "t1" -t "t2" -t "t3" -t "t4" --analyze-in-stages  -d postgres

 

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

 

real    0m0.605s

user    0m0.004s

sys     0m0.000s

 

I think it should instead be:

SET default_statistics_target=1; then analyze all the listed tables
SET default_statistics_target=10; then analyze all the listed tables, and so on.
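
That is, the stage loop should be the outer one and the table loop the inner one. A rough sketch of the intended order (stage_commands and stage_messages are the arrays vacuumdb.c already uses for --analyze-in-stages; vacuum_one_table is a hypothetical helper standing in for the per-table ANALYZE call):

int         stage;
SimpleStringListCell *cell;

for (stage = 0; stage < 3; stage++)
{
    /* Announce and set up this stage once per database connection */
    printf("%s\n", stage_messages[stage]);
    executeCommand(conn, stage_commands[stage], progname, echo);

    /* Then analyze every listed table at this statistics target */
    for (cell = tables->head; cell; cell = cell->next)
        vacuum_one_table(conn, cell->val);  /* hypothetical helper */
}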

 

With the patch:

dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb  -p 9005 -t "t1" -t "t2" -t "t3" -t "t4" --analyze-in-stages -j 2 -d postgres

Generating minimal optimizer statistics (1 target)

Generating medium optimizer statistics (10 targets)

Generating default (full) optimizer statistics

 

real    0m0.395s

user    0m0.000s

sys     0m0.004s

 

Here we set each target once and then process all the tables.

 

Please provide your opinion.

 

 

Regards,

Dilip Kumar

 

 

 

