Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Msg-id CAA4eK1+Nag7=8=y5zpxZ05h9kU4-e9R9v6Krj5V4caLsy3Fzsg@mail.gmail.com
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Dilip kumar <dilip.kumar@huawei.com>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Michael Paquier <michael.paquier@gmail.com>)
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Dilip kumar <dilip.kumar@huawei.com>)
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Dilip kumar <dilip.kumar@huawei.com>)
List pgsql-hackers

On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
>
> On 24 November 2014 11:29, Amit Kapila Wrote,
>

I have verified that all previous comments are addressed, and
the new version is much better than the previous one.

>
> here we are setting each target once and doing for all the tables..
>

Hmm, theoretically I think the new behaviour could lead to more I/O in
certain cases compared to the existing behaviour.  The reason is that
in the new behaviour, while a particular table is being analyzed at
different targets, other tables are analyzed in between, so the pages of
that table in shared buffers or the OS cache may need to be reloaded for
each new target.  Currently, all stages of Analyze for a particular table
are done in one go, which means each stage of Analyze can benefit from
the table's pages loaded by the previous stage.  If you agree, then we
should try to avoid this change in behaviour.
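
To make the access-pattern argument concrete, here is a small model. It assumes ANALYZE steps run one at a time over ntables tables at nstages statistics targets (--analyze-in-stages currently uses three). order_table_major(), order_stage_major() and table_switches() are hypothetical helpers, not code from the patch. A "table switch" is a point where the next step touches a different table than the previous one, i.e. where the earlier table's pages may be evicted before that table is read again. The model counts opportunities for extra I/O. It does not measure I/O.

```c
#include <assert.h>

/*
 * Hypothetical model, not code from the patch: ANALYZE steps for
 * ntables tables at nstages statistics targets, run one at a time.
 * Each step is encoded as table * nstages + stage.
 */

/* Existing order: every stage of one table, then the next table. */
static void
order_table_major(int ntables, int nstages, int *steps)
{
    int k = 0;

    for (int t = 0; t < ntables; t++)
        for (int s = 0; s < nstages; s++)
            steps[k++] = t * nstages + s;
}

/* Proposed order: one stage for every table, then the next stage. */
static void
order_stage_major(int ntables, int nstages, int *steps)
{
    int k = 0;

    for (int s = 0; s < nstages; s++)
        for (int t = 0; t < ntables; t++)
            steps[k++] = t * nstages + s;
}

/*
 * Count consecutive steps that touch different tables: each one is a
 * point where the earlier table's pages may be evicted from shared
 * buffers or the OS cache before that table is read again.
 */
static int
table_switches(const int *steps, int nsteps, int nstages)
{
    int switches = 0;

    assert(nstages > 0);
    for (int k = 1; k < nsteps; k++)
        if (steps[k] / nstages != steps[k - 1] / nstages)
            switches++;
    return switches;
}
```

With 4 tables and 3 stages, the table-major order switches tables only 3 times. The stage-major order switches on every one of its 11 transitions.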
  
>
> Please provide your opinion.

I have a few questions regarding the function GetIdleSlot():

+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+             const char *progname, bool completedb)
+ {
+ ..
+     /*
+      * Some of the slot are free, Process the results for slots whichever
+      * are free
+      */
+     do
+     {
+         SetCancelConn(pSlot[0].connection);
+         i = select_loop(maxFd, &slotset);
+         ResetCancelConn();
+         if (i < 0)
+         {
+             /*
+              * This can only happen if user has sent the cancel request using
+              * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+              */
+             GetQueryResult(pSlot[0].connection, dbname, progname,
+                            completedb);
+             return NO_SLOT;
+         }
+         Assert(i != 0);
+
+         for (i = 0; i < max_slot; i++)
+         {
+             if (!FD_ISSET(pSlot[i].sock, &slotset))
+                 continue;
+             PQconsumeInput(pSlot[i].connection);
+             if (PQisBusy(pSlot[i].connection))
+                 continue;
+
+             pSlot[i].isFree = true;
+             if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+                                 completedb))
+                 return NO_SLOT;
+             if (firstFree < 0)
+                 firstFree = i;
+
+         }
+     } while (firstFree < 0);
+ }

I wanted to understand what exactly the above loop is doing.

a.
First of all, the comment on top of it says "Some of the slot
are free, ...". If some slot is free, then why do you want
to process the results? (Do you mean to say that *none* of
the slots is free?)

b.
IIUC, you call select_loop(maxFd, &slotset) to check whether a
socket descriptor is readable. If so, why is the same maxFd always
checked in the do..while loop? Don't you want to check different
socket descriptors? I am not sure whether I am missing something
here.

c.
After checking the socket descriptor for maxFd, why do you want
to run the below for loop over all slots?
for (i = 0; i < max_slot; i++)


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
