Re: Parallel Append implementation - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Parallel Append implementation
Date
Msg-id CAJ3gD9cERtAjRGwwdurSndr+071FsbDo3SN91V8vaoxTxs5wAw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: Parallel Append implementation  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 24 March 2017 at 00:38, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 23 March 2017 at 16:26, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> On 23 March 2017 at 05:55, Robert Haas <robertmhaas@gmail.com> wrote:
>>>
>>> So in your example we do this:
>>>
>>> C[0] += 20;
>>> C[1] += 16;
>>> C[2] += 10;
>>> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next
>>> path to C[2] */
>>> C[2] += 8;
>>> /* after the previous line, C[1] is now the smallest, so add to that
>>> entry next */
>>> C[1] += 3;
>>> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */
>>> C[2] += 1;
>>> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */
>>>
>>> Now we just take the highest entry that appears in any array, which in
>>> this case is C[0], as the total cost.
>>
>> Wow. The way your final result exactly tallies with my algorithm
>> result is very interesting. This looks like some maths or computer
>> science theory that I am not aware.
>>
>> I am currently coding the algorithm using your method.
>

> While I was coding this, I was considering if Path->rows also should
> be calculated similar to total cost for non-partial subpath and total
> cost for partial subpaths. I think for rows, we can just take
> total_rows divided by workers for non-partial paths, and this
> approximation should suffice. It looks odd that it be treated with the
> same algorithm we chose for total cost for non-partial paths.

Attached is the patch v12, where Path->rows calculation of non-partial
paths is kept separate from the way total cost is done for non-partial
costs. rows for non-partial paths is calculated as total_rows divided
by workers as approximation. And then rows for partial paths are just
added one by one.

>
> Meanwhile, attached is a WIP patch v10. The only change in this patch
> w.r.t. the last patch (v9) is that this one has a new function defined
> append_nonpartial_cost(). Just sending this to show how the algorithm
> looks like; haven't yet called it.

Now append_nonpartial_cost() is used, and it is tested.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: exposing wait events for non-backends (was: Trackingwait event for latches)
Next
From: Aleksander Alekseev
Date:
Subject: Re: Re: Declarative partitioning optimization for largeamount of partitions