Thread: allow partial union-all and improve parallel subquery costing

allow partial union-all and improve parallel subquery costing

From
Luc Vlaming
Date:
Hi,

While developing some improvements for TPC-DS queries I found out that with 
UNION ALL partial paths are not emitted. Whilst fixing that I also came across
the subquery costing which does not seem to consider parallelism when doing 
the costing.

I added a simplified testcase in pg-regress to show this goes wrong, and 
attached also a before and after explain output of tpc-ds SF100 query 5
based on version 12.4.

I hope I followed all etiquette and these kind of improvements are welcome.

Kind regards,
Luc
Swarm64
Attachment

Re: allow partial union-all and improve parallel subquery costing

From
Luc Vlaming
Date:
Hi,

It seems I ran the wrong make checks to verify everything is correct (make check instead
of make installcheck-world) and this uncovered another regress test change. I also noticed
the statistics are sometimes giving different row count results so I increased the row
statistics target to make sure the regress output is stable. Updated patch attached which
now successfully runs installcheck-world for v13 and master.

Kind regards,
Luc

________________________________________
From: Luc Vlaming <luc@swarm64.com>
Sent: Tuesday, October 13, 2020 10:57 AM
To: pgsql-hackers
Subject: allow partial union-all and improve parallel subquery costing

Hi,

While developing some improvements for TPC-DS queries I found out that with
UNION ALL partial paths are not emitted. Whilst fixing that I also came across
the subquery costing which does not seem to consider parallelism when doing
the costing.

I added a simplified testcase in pg-regress to show this goes wrong, and
attached also a before and after explain output of tpc-ds SF100 query 5
based on version 12.4.

I hope I followed all etiquette and these kind of improvements are welcome.

Kind regards,
Luc
Swarm64

Attachment

Re: allow partial union-all and improve parallel subquery costing

From
Luc Vlaming
Date:
On 14.10.20 09:38, Luc Vlaming wrote:
> Hi,
> 
> It seems I ran the wrong make checks to verify everything is correct (make check instead
> of make installcheck-world) and this uncovered another regress test change. I also noticed
> the statistics are sometimes giving different row count results so I increased the row
> statistics target to make sure the regress output is stable. Updated patch attached which
> now successfully runs installcheck-world for v13 and master.
> 
> Kind regards,
> Luc
> 
> ________________________________________
> From: Luc Vlaming <luc@swarm64.com>
> Sent: Tuesday, October 13, 2020 10:57 AM
> To: pgsql-hackers
> Subject: allow partial union-all and improve parallel subquery costing
> 
> Hi,
> 
> While developing some improvements for TPC-DS queries I found out that with
> UNION ALL partial paths are not emitted. Whilst fixing that I also came across
> the subquery costing which does not seem to consider parallelism when doing
> the costing.
> 
> I added a simplified testcase in pg-regress to show this goes wrong, and
> attached also a before and after explain output of tpc-ds SF100 query 5
> based on version 12.4.
> 
> I hope I followed all etiquette and these kind of improvements are welcome.
> 
> Kind regards,
> Luc
> Swarm64
> 

Hi,

Created a commitfest entry assuming this is the right thing to do so 
that someone can potentially pick it up during the commitfest.

Kind regards,
Luc
Swarm64



Re: allow partial union-all and improve parallel subquery costing

From
Luc Vlaming
Date:
On 23-10-2020 07:51, Luc Vlaming wrote:
> On 14.10.20 09:38, Luc Vlaming wrote:
>> Hi,
>>
>> It seems I ran the wrong make checks to verify everything is correct 
>> (make check instead
>> of make installcheck-world) and this uncovered another regress test 
>> change. I also noticed
>> the statistics are sometimes giving different row count results so I 
>> increased the row
>> statistics target to make sure the regress output is stable. Updated 
>> patch attached which
>> now successfully runs installcheck-world for v13 and master.
>>
>> Kind regards,
>> Luc
>>
>> ________________________________________
>> From: Luc Vlaming <luc@swarm64.com>
>> Sent: Tuesday, October 13, 2020 10:57 AM
>> To: pgsql-hackers
>> Subject: allow partial union-all and improve parallel subquery costing
>>
>> Hi,
>>
>> While developing some improvements for TPC-DS queries I found out that 
>> with
>> UNION ALL partial paths are not emitted. Whilst fixing that I also 
>> came across
>> the subquery costing which does not seem to consider parallelism when 
>> doing
>> the costing.
>>
>> I added a simplified testcase in pg-regress to show this goes wrong, and
>> attached also a before and after explain output of tpc-ds SF100 query 5
>> based on version 12.4.
>>
>> I hope I followed all etiquette and these kind of improvements are 
>> welcome.
>>
>> Kind regards,
>> Luc
>> Swarm64
>>
> 
> Hi,
> 
> Created a commitfest entry assuming this is the right thing to do so 
> that someone can potentially pick it up during the commitfest.
> 
> Kind regards,
> Luc
> Swarm64

Hi,

Providing an updated patch based on latest master.

Cheers,
Luc

Attachment

Re: allow partial union-all and improve parallel subquery costing

From
David Steele
Date:
On 3/15/21 9:09 AM, David Steele wrote:
> 
> On 12/30/20 8:54 AM, Luc Vlaming wrote:
>>>
>>> Created a commitfest entry assuming this is the right thing to do so 
>>> that someone can potentially pick it up during the commitfest.
>>
>> Providing an updated patch based on latest master.
> 
> Looks like you need another rebase: 
> http://cfbot.cputube.org/patch_32_2787.log. Marked as Waiting for Author.
> 
> You may also want to give a more detailed description of what you have 
> done here and why it improves execution plans. This may help draw some 
> reviewers.

Since no new patch has been provided, marking this Returned with Feedback.

Please resubmit to the next CF when you have a new patch.

Regards,
-- 
-David
david@pgmasters.net



Re: allow partial union-all and improve parallel subquery costing

From
Luc Vlaming
Date:
Hi David,

On 15-03-2021 14:09, David Steele wrote:
> Hi Luc,
> 
> On 12/30/20 8:54 AM, Luc Vlaming wrote:
>>>
>>> Created a commitfest entry assuming this is the right thing to do so 
>>> that someone can potentially pick it up during the commitfest.
>>
>> Providing an updated patch based on latest master.
> 
> Looks like you need another rebase: 
> http://cfbot.cputube.org/patch_32_2787.log. Marked as Waiting for Author.
> 
> You may also want to give a more detailed description of what you have 
> done here and why it improves execution plans. This may help draw some 
> reviewers.
> 
> Regards,

Here's an improved and rebased patch. Hope the description helps some 
people. I will resubmit it to the next commitfest.

Regards,
Luc

Attachment

Re: allow partial union-all and improve parallel subquery costing

From
Ronan Dunklau
Date:
Le lundi 12 avril 2021, 14:01:36 CEST Luc Vlaming a écrit :
> Here's an improved and rebased patch. Hope the description helps some
> people. I will resubmit it to the next commitfest.
>

Hello Luc,

I've taken a look at this patch, and while I don't fully understand its
implications here are a couple remarks.

I think you should add a test demonstrating the use of the new partial append
path you add, for example using your base query:

explain (costs off)
select sum(two) from
(
select *, 1::int  from tenk1 a
union all
select *, 1::bigint  from tenk1 b
) t
;

I'm not sure I understand why the subquery scan rows estimate has not been
accounted like you propose before, because the way it's done as of now
basically doubles the estimate for the subqueryscan, since we account for it
already being divided by it's number of workers, as mentioned in cost_append:

/*
 * Apply parallel divisor to subpaths.  Scale the number of rows
 * for each partial subpath based on the ratio of the parallel
 * divisor originally used for the subpath to the one we adopted.
 * Also add the cost of partial paths to the total cost, but
 * ignore non-partial paths for now.
 */

Do we have other nodes for which we make this assumption ?

Also, adding a partial path comprised only of underlying partial paths might
not be enough: maybe we should add one partial path even in the case of mixed
partial / nonpartial paths like it's done in add_paths_to_append_rel ?

Regards,

--
Ronan Dunklau





Re: allow partial union-all and improve parallel subquery costing

From
Daniel Gustafsson
Date:
With the thread stalled and requests for a test (documentation really?) not
responded to I'm marking this patch Returned with Feedback.

--
Daniel Gustafsson        https://vmware.com/