Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id CAB7nPqQMzLNXWDY7R312ma0Ar-0pruofi453=0bFH5E5DOLRRw@mail.gmail.com
Whole thread Raw
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
List pgsql-hackers
On Sun, Jan 4, 2015 at 10:57 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
>> +      <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
>> +      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
>> +      <listitem>
>> +       <para>
>> +        Number of concurrent connections to perform the operation.
>> +        This option will enable the vacuum operation to run on asynchronous
>> +        connections, at a time one table will be operated on one connection.
>> +        So at one time as many tables will be vacuumed parallely as number of
>> +        jobs.  If number of jobs given are more than number of tables then
>> +        number of jobs will be set to number of tables.
>
> "asynchronous connections" isn't a very well defined term. Also, the
> second part of that sentence doesn't seem to be gramattically correct.
>
>> +       </para>
>> +       <para>
>> +        <application>vacuumdb</application> will open
>> +        <replaceable class="parameter"> njobs</replaceable> connections to the
>> +        database, so make sure your <xref linkend="guc-max-connections">
>> +        setting is high enough to accommodate all connections.
>> +       </para>
>
> Isn't it njobs+1?
>
>> @@ -141,6 +199,7 @@ main(int argc, char *argv[])
>>               }
>>       }
>>
>> +     optind++;
>
> Hm, where's that coming from?
>
>> +     PQsetnonblocking(connSlot[0].connection, 1);
>> +
>> +     for (i = 1; i < concurrentCons; i++)
>> +     {
>> +             connSlot[i].connection = connectDatabase(dbname, host, port, username,
>> +                                                               prompt_password, progname, false);
>> +
>> +             PQsetnonblocking(connSlot[i].connection, 1);
>> +             connSlot[i].isFree = true;
>> +             connSlot[i].sock = PQsocket(connSlot[i].connection);
>> +     }
>
> Are you sure about this global PQsetnonblocking()? This means that you
> might not be able to send queries... And you don't seem to be waiting
> for sockets waiting for writes in the select loop - which means you
> might end up being stuck waiting for reads when you haven't submitted
> the query.
>
> I think you might need a more complex select() loop. On nonfree
> connections also wait for writes if PQflush() returns != 0.
>
>
>> +/*
>> + * GetIdleSlot
>> + * Process the slot list, if any free slot is available then return
>> + * the slotid else perform the select on all the socket's and wait
>> + * until atleast one slot becomes available.
>> + */
>> +static int
>> +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
>> +                     const char *progname, bool completedb)
>> +{
>> +     int             i;
>> +     fd_set  slotset;
>
>
> Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.
>
>> +     int     firstFree = -1;
>> +     pgsocket maxFd;
>> +
>> +     for (i = 0; i < max_slot; i++)
>> +             if (pSlot[i].isFree)
>> +                     return i;
>
>> +     FD_ZERO(&slotset);
>> +
>> +     maxFd = pSlot[0].sock;
>> +
>> +     for (i = 0; i < max_slot; i++)
>> +     {
>> +             FD_SET(pSlot[i].sock, &slotset);
>> +             if (pSlot[i].sock > maxFd)
>> +                     maxFd = pSlot[i].sock;
>> +     }
>
> So we're waiting for idle connections?
>
> I think you'll have to have to use two fdsets here, and set the write
> set based on PQflush() != 0.
>
>> +/*
>> + * A select loop that repeats calling select until a descriptor in the read
>> + * set becomes readable. On Windows we have to check for the termination event
>> + * from time to time, on Unix we can just block forever.
>> + */
>
> Should a) mention why we have to check regularly on windows b) that on
> linux we don't have to because we send a cancel event from the signal
> handler.
>
>> +static int
>> +select_loop(int maxFd, fd_set *workerset)
>> +{
>> +     int                     i;
>> +     fd_set          saveSet = *workerset;
>>
>> +#ifdef WIN32
>> +     /* should always be the master */
>
> Hm?
>
>
> I have to say, this is a fairly large patch for such a minor feature...
Andres, this patch needs more effort from the author, right? So
marking it as returned with feedback.
-- 
Michael



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Async execution of postgres_fdw.
Next
From: Michael Paquier
Date:
Subject: Re: HINTing on UPDATE foo SET foo.bar = ..;