Thread: ERROR: ORDER/GROUP BY expression not found in targetlist
Hi, What is going on here? postgres=# create table logs as select generate_series(1, 1000000)::text as data; SELECT 1000000 postgres=# insert into logs select * from logs; INSERT 0 1000000 postgres=# insert into logs select * from logs; INSERT 0 2000000 postgres=# insert into logs select * from logs; INSERT 0 4000000 postgres=# insert into logs select * from logs; INSERT 0 8000000 postgres=# insert into logs select * from logs; INSERT 0 16000000 postgres=# analyze logs; ANALYZE postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# explain select length(data) from logs group by length(data); ┌────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├────────────────────────────────────────────────────────────────────────────┤ │ Group (cost=5843157.07..6005642.13 rows=993989 width=4) │ │ Group Key: (length(data)) │ │ -> Sort (cost=5843157.07..5923157.11 rows=32000018 width=4) │ │ Sort Key: (length(data)) │ │ -> Seq Scan on logs (cost=0.00..541593.22 rows=32000018 width=4) │ └────────────────────────────────────────────────────────────────────────────┘ (5 rows) postgres=# set max_parallel_workers_per_gather = 2; SET postgres=# explain select length(data) from logs group by length(data); ERROR: ORDER/GROUP BY expression not found in targetlist -- Thomas Munro http://www.enterprisedb.com
Hi, I tried to run tpc-h queries, but some queries failed by the error on last week. >Subject: Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist>Date: Thu, 09 Jun 2016 12:08:01 +0900 Today, I try it again by changing max_parallel_workers_per_gather parameter. The result of Q1 is bellow. Is this bug in the Open items on wiki? ------------- postgres=# set max_parallel_workers_per_gather = 0; SET postgres=# \i queries/1.explain.sql QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=43474.03..43474.03 rows=1 width=236) (actual time=1039.583..1039.583 rows=1 loops=1) -> Sort (cost=43474.03..43474.04rows=6 width=236) (actual time=1039.583..1039.583 rows=1 loops=1) Sort Key: l_returnflag,l_linestatus Sort Method: top-N heapsort Memory: 25kB -> HashAggregate (cost=43473.83..43474.00rows=6 width=236) (actual time=1039.529..1039.534 rows=4 loops=1) Group Key: l_returnflag,l_linestatus -> Seq Scan on lineitem (cost=0.00..19668.15 rows=595142 width=25) (actual time=0.048..125.332rows=595224 loops=1) Filter: (l_shipdate <= '1998-09-22 00:00:00'::timestamp withouttime zone) Rows Removed by Filter: 5348 Planning time: 0.180 ms Execution time: 1039.758 ms (11 rows) postgres=# set max_parallel_workers_per_gather = default; SET postgres=# \i queries/1.explain.sql ERROR: ORDER/GROUP BY expression not found in targetlist ------------- Regards, Tatsuro Yamada NTT OSS Center On 2016/06/13 12:39, Thomas Munro wrote: > Hi, > > What is going on here? > > postgres=# create table logs as select generate_series(1, > 1000000)::text as data; > SELECT 1000000 > postgres=# insert into logs select * from logs; > INSERT 0 1000000 > postgres=# insert into logs select * from logs; > INSERT 0 2000000 > postgres=# insert into logs select * from logs; > INSERT 0 4000000 > postgres=# insert into logs select * from logs; > INSERT 0 8000000 > postgres=# insert into logs select * from logs; > INSERT 0 16000000 > postgres=# analyze logs; > ANALYZE > postgres=# set max_parallel_workers_per_gather = 0; > SET > postgres=# explain select length(data) from logs group by length(data); > ┌────────────────────────────────────────────────────────────────────────────┐ > │ QUERY PLAN │ > ├────────────────────────────────────────────────────────────────────────────┤ > │ Group (cost=5843157.07..6005642.13 rows=993989 width=4) │ > │ Group Key: (length(data)) │ > │ -> Sort (cost=5843157.07..5923157.11 rows=32000018 width=4) │ > │ Sort Key: (length(data)) │ > │ -> Seq Scan on logs (cost=0.00..541593.22 rows=32000018 width=4) │ > └────────────────────────────────────────────────────────────────────────────┘ > (5 rows) > > postgres=# set max_parallel_workers_per_gather = 2; > SET > postgres=# explain select length(data) from logs group by length(data); > ERROR: ORDER/GROUP BY expression not found in targetlist >
On Mon, Jun 13, 2016 at 4:16 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > I tried to run tpc-h queries, but some queries failed by the error on last > week. > >>Subject: Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist >>Date: Thu, 09 Jun 2016 12:08:01 +0900 Right, I saw that thread which involved the same error message: https://www.postgresql.org/message-id/flat/20160526021235.w4nq7k3gnheg7vit%40alap3.anarazel.de#20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de ... but that seems to be a different problem than the one you and I have seen, it involved repeated references to columns in the tlist. It was fixed with this commit: commit aeb9ae6457865c8949641d71a9523374d843a418 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Thu May 26 14:52:24 2016 -0400 Disable physical tlist if any Var would need multiple sortgroupref labels. > Today, I try it again by changing max_parallel_workers_per_gather parameter. > The result of Q1 is bellow. Is this bug in the Open items on wiki? I don't see it on the Open Issues list. -- Thomas Munro http://www.enterprisedb.com
On 13 June 2016 at 15:39, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > What is going on here? ... > > postgres=# set max_parallel_workers_per_gather = 2; > SET > postgres=# explain select length(data) from logs group by length(data); > ERROR: ORDER/GROUP BY expression not found in targetlist Seems like this was caused by 04ae11f62e643e07c411c4935ea6af46cb112aa9 I missed the discussion on this commit, so I'll go look for that now. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi, >>> Subject: Re: [HACKERS] ORDER/GROUP BY expression not found in targetlist >>> Date: Thu, 09 Jun 2016 12:08:01 +0900 > > Right, I saw that thread which involved the same error message: > > https://www.postgresql.org/message-id/flat/20160526021235.w4nq7k3gnheg7vit%40alap3.anarazel.de#20160526021235.w4nq7k3gnheg7vit@alap3.anarazel.de > > ... but that seems to be a different problem than the one you and I > have seen, it involved repeated references to columns in the tlist. > It was fixed with this commit: > > commit aeb9ae6457865c8949641d71a9523374d843a418 > Author: Tom Lane <tgl@sss.pgh.pa.us> > Date: Thu May 26 14:52:24 2016 -0400 > > Disable physical tlist if any Var would need multiple sortgroupref labels. I use this version:f721e94 to run tpc-h on last week. This patch is commited at Jun 8. If it fixed, I didn't get the error. >PG96beta1> commit: f721e94b5f360391fc3ffe183bf697a0441e9184 ----- commit f721e94b5f360391fc3ffe183bf697a0441e9184 Author: Robert Haas <rhaas@postgresql.org> Date: Wed Jun 8 08:37:06 2016 -0400 Fix typo. Amit Langote ----- I got mistake to write an e-mail to -hackers on last week. :-< I should have written this. The bug has not fixed by Tom Lane's patch: commit aeb9ae6. Because I got the same error using tpc-h. >> Today, I try it again by changing max_parallel_workers_per_gather parameter. >> The result of Q1 is bellow. Is this bug in the Open items on wiki? > > I don't see it on the Open Issues list. I checked the list, but the bug is not listed. https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items Regards, Tatsuro Yamada NTT OSS Center
On Mon, Jun 13, 2016 at 2:42 PM, Tatsuro Yamada <yamada.tatsuro@lab.ntt.co.jp> wrote: > I got mistake to write an e-mail to -hackers on last week. :-< > I should have written this. > > The bug has not fixed by Tom Lane's patch: commit aeb9ae6. > Because I got the same error using tpc-h. This looks like a different regression.. >>> Today, I try it again by changing max_parallel_workers_per_gather >>> parameter. >>> The result of Q1 is bellow. Is this bug in the Open items on wiki? >> >> I don't see it on the Open Issues list. > > I checked the list, but the bug is not listed. > https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items And the winner is: 04ae11f62e643e07c411c4935ea6af46cb112aa9 is the first bad commit commit 04ae11f62e643e07c411c4935ea6af46cb112aa9 Author: Robert Haas <rhaas@postgresql.org> Date: Fri Jun 3 14:27:33 2016 -0400 I am adding that to the list of open items. -- Michael
On Mon, Jun 13, 2016 at 11:05 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 13 June 2016 at 15:39, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> > What is going on here?
>
> ...
>
> >
> > postgres=# set max_parallel_workers_per_gather = 2;
> > SET
> > postgres=# explain select length(data) from logs group by length(data);
> > ERROR: ORDER/GROUP BY expression not found in targetlist
>
> Seems like this was caused by 04ae11f62e643e07c411c4935ea6af46cb112aa9
>
>
> On 13 June 2016 at 15:39, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> > What is going on here?
>
> ...
>
> >
> > postgres=# set max_parallel_workers_per_gather = 2;
> > SET
> > postgres=# explain select length(data) from logs group by length(data);
> > ERROR: ORDER/GROUP BY expression not found in targetlist
>
> Seems like this was caused by 04ae11f62e643e07c411c4935ea6af46cb112aa9
>
In create_grouping_paths(), we are building partial_grouping_path and same is used for gather path and other grouping paths (for partial paths). However, we don't use it for partial path list and sort path due to which path target for Sort path is different from what we have expected. Is there a problem in applying partial_grouping_path for partial pathlist? Attached patch just does that and I don't see error with patch.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
On 2016/06/13 15:52, Michael Paquier wrote: > On Mon, Jun 13, 2016 at 2:42 PM, Tatsuro Yamada > <yamada.tatsuro@lab.ntt.co.jp> wrote: >> I got mistake to write an e-mail to -hackers on last week. :-< >> I should have written this. >> >> The bug has not fixed by Tom Lane's patch: commit aeb9ae6. >> Because I got the same error using tpc-h. > > This looks like a different regression.. I understand it now, thanks. :-) >> I checked the list, but the bug is not listed. >> https://wiki.postgresql.org/wiki/PostgreSQL_9.6_Open_Items > > And the winner is: > > 04ae11f62e643e07c411c4935ea6af46cb112aa9 is the first bad commit > commit 04ae11f62e643e07c411c4935ea6af46cb112aa9 > Author: Robert Haas <rhaas@postgresql.org> > Date: Fri Jun 3 14:27:33 2016 -0400 > > I am adding that to the list of open items. Oh... I'll try to run tpc-h if I got a new patch which fixes the bug. :) Thanks, Tatsuro Yamada NTT OSS Center
Hi, I applied your patch and run tpc-h. Then I got new errors on Q4,12,17 and 19. ERROR: Aggref found in non-Agg plan node. See bellow, ---------------------------------- postgres=# \i queries/4.explain.sql ERROR: Aggref found in non-Agg plan node STATEMENT: explain analyze select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01'+ interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority LIMIT 1; ---------------------------------- Regards, Tatsuro Yamada NTT OSS Center On 2016/06/13 16:18, Amit Kapila wrote: > On Mon, Jun 13, 2016 at 11:05 AM, David Rowley <david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>> wrote: > > > > On 13 June 2016 at 15:39, Thomas Munro <thomas.munro@enterprisedb.com <mailto:thomas.munro@enterprisedb.com>> wrote: > > > What is going on here? > > > > ... > > > > > > > > postgres=# set max_parallel_workers_per_gather = 2; > > > SET > > > postgres=# explain select length(data) from logs group by length(data); > > > ERROR: ORDER/GROUP BY expression not found in targetlist > > > > Seems like this was caused by 04ae11f62e643e07c411c4935ea6af46cb112aa9 > > > > In create_grouping_paths(), we are building partial_grouping_path and same is used for gather path and other grouping paths(for partial paths). However, we don't use it for partial path list and sort path due to which path target for Sortpath is different from what we have expected. Is there a problem in applying partial_grouping_path for partial pathlist? Attached patch just does that and I don't see error with patch. > > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/> > > >
On Mon, Jun 13, 2016 at 3:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > In create_grouping_paths(), we are building partial_grouping_path and same > is used for gather path and other grouping paths (for partial paths). > However, we don't use it for partial path list and sort path due to which > path target for Sort path is different from what we have expected. Is there > a problem in applying partial_grouping_path for partial pathlist? > Attached patch just does that and I don't see error with patch. It doesn't seem like a good idea to destructive modify input_rel->partial_pathlist. Applying the projection to each path before using it would probably be better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jun 13, 2016 at 3:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> In create_grouping_paths(), we are building partial_grouping_path and same >> is used for gather path and other grouping paths (for partial paths). >> However, we don't use it for partial path list and sort path due to which >> path target for Sort path is different from what we have expected. Is there >> a problem in applying partial_grouping_path for partial pathlist? >> Attached patch just does that and I don't see error with patch. > It doesn't seem like a good idea to destructive modify > input_rel->partial_pathlist. Applying the projection to each path > before using it would probably be better. I think the real question here is why the code removed by 04ae11f62 was wrong. It was unsafe to use apply_projection_to_path, certainly, but using create_projection_path directly would have avoided the stated problem. And it's very unclear that this new patch doesn't bring back that bug in a different place. I am not very happy that neither 04ae11f62 nor this patch include any regression test case proving that a problem existed and has been fixed. regards, tom lane
Amit Kapila <amit.kapila@enterprisedb.com> writes: > On Mon, Jun 13, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think the real question here is why the code removed by 04ae11f62 >> was wrong. It was unsafe to use apply_projection_to_path, certainly, >> but using create_projection_path directly would have avoided the >> stated problem. And it's very unclear that this new patch doesn't >> bring back that bug in a different place. > This new patch still doesn't seem to be right, but it won't bring back the > original problem because apply_projection_to_path will be only done if > grouped_rel is parallel_safe which means it doesn't have any > parallel-unsafe or parallel-restricted clause in quals or target list. The problem cited in 04ae11f62's commit message is that apply_projection_to_path would overwrite the paths' pathtargets in-place, causing problems if they'd been used for other purposes elsewhere. I do not share your confidence that using apply_projection_to_path within create_grouping_paths is free of such a hazard. regards, tom lane
Amit Kapila <amit.kapila@enterprisedb.com> writes: > It is slightly tricky to write a reproducible parallel-query test, but > point taken and I think we should try to have a test unless such a test is > really time consuming. BTW, decent regression tests could be written without the need to create enormous tables if the minimum rel size in create_plain_partial_paths() could be configured to something less than 1000 blocks. I think it's fairly crazy that that arbitrary constant is hard-wired anyway. Should we make it a GUC? regards, tom lane
I wrote: > Amit Kapila <amit.kapila@enterprisedb.com> writes: >> It is slightly tricky to write a reproducible parallel-query test, but >> point taken and I think we should try to have a test unless such a test is >> really time consuming. > BTW, decent regression tests could be written without the need to create > enormous tables if the minimum rel size in create_plain_partial_paths() > could be configured to something less than 1000 blocks. I think it's > fairly crazy that that arbitrary constant is hard-wired anyway. Should > we make it a GUC? Just as an experiment to see what would happen, I did - int parallel_threshold = 1000; + int parallel_threshold = 1; and ran the regression tests. I got a core dump in the window.sql test: Program terminated with signal 11, Segmentation fault. #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018, input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0, rollup_groupclauses=0x0) at planner.c:4307 4307 Index sgref = final_target->sortgrouprefs[i]; (gdb) bt #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018, input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0, rollup_groupclauses=0x0) at planner.c:4307 #1 create_grouping_paths (root=0x1795018, input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0, rollup_groupclauses=0x0) at planner.c:3420 #2 0x0000000000667405 in grouping_planner (root=0x1795018, inheritance_update=0 '\000', tuple_fraction=0) at planner.c:1794 #3 0x0000000000668c80 in subquery_planner (glob=<value optimized out>, parse=0x1703580, parent_root=<value optimizedout>, hasRecursion=<value optimized out>, tuple_fraction=0) at planner.c:769 #4 0x0000000000668ea5 in standard_planner (parse=0x1703580, cursorOptions=256, boundParams=0x0) at planner.c:308 #5 0x00000000006691b6 in planner (parse=<value optimized out>, cursorOptions=<value optimized out>, boundParams=<valueoptimized out>) at planner.c:178 #6 0x00000000006fb069 in pg_plan_query (querytree=0x1703580, cursorOptions=256, boundParams=0x0) at postgres.c:798 (gdb) p debug_query_string $1 = 0x1702078 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;" which I think may be another manifestation of the failure-to-apply-proper- pathtarget issue we're looking at in this thread. Or maybe it's just an unjustified assumption in make_partialgroup_input_target that the input path must always have some sortgrouprefs assigned. Before getting to that point, there was also an unexplainable plan change: *** /home/postgres/pgsql/src/test/regress/expected/aggregates.out Thu Apr 7 21:13:14 2016 --- /home/postgres/pgsql/src/test/regress/results/aggregates.out Mon Jun 13 11:54:01 2016 *************** *** 577,590 **** explain (costs off) select max(unique1) from tenk1 where unique1 > 42000; ! QUERY PLAN ! --------------------------------------------------------------------------- ! Result ! InitPlan 1 (returns $0) ! -> Limit ! -> Index Only Scan Backward using tenk1_unique1 on tenk1 ! Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42000)) ! (5 rows) select max(unique1) from tenk1 where unique1 > 42000; max --- 577,588 ---- explain (costs off) select max(unique1) from tenk1 where unique1 > 42000; ! QUERY PLAN ! ---------------------------------------------------- ! Aggregate ! -> Index Only Scan using tenk1_unique1 on tenk1 ! Index Cond: (unique1 > 42000) ! (3 rows) select max(unique1) from tenk1 where unique1 > 42000; max I would not be surprised at a change to a parallel-query plan, but there's no parallelism here, so what happened? This looks like a bug to me. (Also, doing this query without COSTS OFF shows that the newly selected plan actually has a greater estimated cost than the expected plan, which makes it definitely a bug.) At this point I'm pretty firmly convinced that we should have a way to run the regression tests with parallel scans considered for even very small tables. If someone doesn't want that way to be a GUC, you'd better propose another solution. regards, tom lane
On Mon, Jun 13, 2016 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila@enterprisedb.com> writes: >> It is slightly tricky to write a reproducible parallel-query test, but >> point taken and I think we should try to have a test unless such a test is >> really time consuming. > > BTW, decent regression tests could be written without the need to create > enormous tables if the minimum rel size in create_plain_partial_paths() > could be configured to something less than 1000 blocks. I think it's > fairly crazy that that arbitrary constant is hard-wired anyway. Should > we make it a GUC? That was proposed before, and I didn't do it mostly because I couldn't think of a name for it that didn't sound unbelievably corny. Also, the whole way that algorithm works is kind of a hack and probably needs to be overhauled entirely in some future release. I'm worried about having the words "backward compatibility" thrown in my face when it's time to improve this logic. But aside from those two issues I'm OK with exposing a knob. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jun 13, 2016 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, decent regression tests could be written without the need to create >> enormous tables if the minimum rel size in create_plain_partial_paths() >> could be configured to something less than 1000 blocks. I think it's >> fairly crazy that that arbitrary constant is hard-wired anyway. Should >> we make it a GUC? > That was proposed before, and I didn't do it mostly because I couldn't > think of a name for it that didn't sound unbelievably corny. min_parallel_relation_size, or min_parallelizable_relation_size, or something like that? > Also, > the whole way that algorithm works is kind of a hack and probably > needs to be overhauled entirely in some future release. I'm worried > about having the words "backward compatibility" thrown in my face when > it's time to improve this logic. But aside from those two issues I'm > OK with exposing a knob. I agree it's a hack, and I don't want to expose anything about the number-of-workers scaling behavior, for precisely that reason. But a threshold on the size of a table to consider parallel scans for at all doesn't seem unreasonable. regards, tom lane
On Mon, Jun 13, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jun 13, 2016 at 3:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> In create_grouping_paths(), we are building partial_grouping_path and same
> >> is used for gather path and other grouping paths (for partial paths).
> >> However, we don't use it for partial path list and sort path due to which
> >> path target for Sort path is different from what we have expected. Is there
> >> a problem in applying partial_grouping_path for partial pathlist?
> >> Attached patch just does that and I don't see error with patch.
>
> > It doesn't seem like a good idea to destructive modify
> > input_rel->partial_pathlist. Applying the projection to each path
> > before using it would probably be better.
>
> I think the real question here is why the code removed by 04ae11f62
> was wrong. It was unsafe to use apply_projection_to_path, certainly,
> but using create_projection_path directly would have avoided the
> stated problem. And it's very unclear that this new patch doesn't
> bring back that bug in a different place.
>
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jun 13, 2016 at 3:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> In create_grouping_paths(), we are building partial_grouping_path and same
> >> is used for gather path and other grouping paths (for partial paths).
> >> However, we don't use it for partial path list and sort path due to which
> >> path target for Sort path is different from what we have expected. Is there
> >> a problem in applying partial_grouping_path for partial pathlist?
> >> Attached patch just does that and I don't see error with patch.
>
> > It doesn't seem like a good idea to destructive modify
> > input_rel->partial_pathlist. Applying the projection to each path
> > before using it would probably be better.
>
> I think the real question here is why the code removed by 04ae11f62
> was wrong. It was unsafe to use apply_projection_to_path, certainly,
> but using create_projection_path directly would have avoided the
> stated problem. And it's very unclear that this new patch doesn't
> bring back that bug in a different place.
>
This new patch still doesn't seem to be right, but it won't bring back the original problem because apply_projection_to_path will be only done if grouped_rel is parallel_safe which means it doesn't have any parallel-unsafe or parallel-restricted clause in quals or target list.
> I am not very happy that neither 04ae11f62 nor this patch include
> any regression test case proving that a problem existed and has
> been fixed.
>
It is slightly tricky to write a reproducible parallel-query test, but point taken and I think we should try to have a test unless such a test is really time consuming.
On Mon, Jun 13, 2016 at 7:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 13, 2016 at 3:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > In create_grouping_paths(), we are building partial_grouping_path and same
> > is used for gather path and other grouping paths (for partial paths).
> > However, we don't use it for partial path list and sort path due to which
> > path target for Sort path is different from what we have expected. Is there
> > a problem in applying partial_grouping_path for partial pathlist?
> > Attached patch just does that and I don't see error with patch.
>
> It doesn't seem like a good idea to destructive modify
> input_rel->partial_pathlist. Applying the projection to each path
> before using it would probably be better.
>
>
> On Mon, Jun 13, 2016 at 3:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > In create_grouping_paths(), we are building partial_grouping_path and same
> > is used for gather path and other grouping paths (for partial paths).
> > However, we don't use it for partial path list and sort path due to which
> > path target for Sort path is different from what we have expected. Is there
> > a problem in applying partial_grouping_path for partial pathlist?
> > Attached patch just does that and I don't see error with patch.
>
> It doesn't seem like a good idea to destructive modify
> input_rel->partial_pathlist. Applying the projection to each path
> before using it would probably be better.
>
Do you mean to have it when we generate a complete GroupAgg Path atop of the cheapest partial path?
On Mon, Jun 13, 2016 at 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jun 13, 2016 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> BTW, decent regression tests could be written without the need to create >>> enormous tables if the minimum rel size in create_plain_partial_paths() >>> could be configured to something less than 1000 blocks. I think it's >>> fairly crazy that that arbitrary constant is hard-wired anyway. Should >>> we make it a GUC? > >> That was proposed before, and I didn't do it mostly because I couldn't >> think of a name for it that didn't sound unbelievably corny. > > min_parallel_relation_size, or min_parallelizable_relation_size, or > something like that? Sure. >> Also, >> the whole way that algorithm works is kind of a hack and probably >> needs to be overhauled entirely in some future release. I'm worried >> about having the words "backward compatibility" thrown in my face when >> it's time to improve this logic. But aside from those two issues I'm >> OK with exposing a knob. > > I agree it's a hack, and I don't want to expose anything about the > number-of-workers scaling behavior, for precisely that reason. But a > threshold on the size of a table to consider parallel scans for at all > doesn't seem unreasonable. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I wrote: > ... there was also an unexplainable plan change: > *** /home/postgres/pgsql/src/test/regress/expected/aggregates.out Thu Apr 7 21:13:14 2016 > --- /home/postgres/pgsql/src/test/regress/results/aggregates.out Mon Jun 13 11:54:01 2016 > *************** > *** 577,590 **** > explain (costs off) > select max(unique1) from tenk1 where unique1 > 42000; > ! QUERY PLAN > ! --------------------------------------------------------------------------- > ! Result > ! InitPlan 1 (returns $0) > ! -> Limit > ! -> Index Only Scan Backward using tenk1_unique1 on tenk1 > ! Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42000)) > ! (5 rows) > select max(unique1) from tenk1 where unique1 > 42000; > max > --- 577,588 ---- > explain (costs off) > select max(unique1) from tenk1 where unique1 > 42000; > ! QUERY PLAN > ! ---------------------------------------------------- > ! Aggregate > ! -> Index Only Scan using tenk1_unique1 on tenk1 > ! Index Cond: (unique1 > 42000) > ! (3 rows) > select max(unique1) from tenk1 where unique1 > 42000; > max > I would not be surprised at a change to a parallel-query plan, but there's > no parallelism here, so what happened? This looks like a bug to me. > (Also, doing this query without COSTS OFF shows that the newly selected > plan actually has a greater estimated cost than the expected plan, which > makes it definitely a bug.) I looked into this and found that the costs are considered fuzzily the same, and then add_path prefers the slightly-worse path on the grounds that it is marked parallel_safe while the MinMaxAgg path is not. It seems to me that there is some fuzzy thinking going on there. On exactly what grounds is a path to be preferred merely because it is parallel safe, and not actually parallelized? Or perhaps the question to ask is whether a MinMaxAgg path can be marked parallel-safe. regards, tom lane
On Mon, Jun 13, 2016 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I would not be surprised at a change to a parallel-query plan, but there's >> no parallelism here, so what happened? This looks like a bug to me. >> (Also, doing this query without COSTS OFF shows that the newly selected >> plan actually has a greater estimated cost than the expected plan, which >> makes it definitely a bug.) > > I looked into this and found that the costs are considered fuzzily the > same, and then add_path prefers the slightly-worse path on the grounds > that it is marked parallel_safe while the MinMaxAgg path is not. It seems > to me that there is some fuzzy thinking going on there. On exactly what > grounds is a path to be preferred merely because it is parallel safe, and > not actually parallelized? A parallel-safe plan can be joined to a partial path at a higher level of the plan tree to form a new partial path. A non-parallel-safe path cannot. Therefore, if two paths are equally good, the one which is parallel-safe is more useful (just as a sorted path which is slightly more expensive than an unsorted path is worth keeping around because it is ordered). In practice, we don't yet have the ability for parallel-safe paths from subqueries to affect planning at higher query levels, but that's because the pathification stuff landed too late in the cycle for me to fully react to it. > Or perhaps the question to ask is whether a > MinMaxAgg path can be marked parallel-safe. That is a good question. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > In practice, we don't yet have the ability for > parallel-safe paths from subqueries to affect planning at higher query > levels, but that's because the pathification stuff landed too late in > the cycle for me to fully react to it. I wonder if that's not just from confusion between subplans and subqueries. I don't believe any of the claims made in the comment adjusted in the patch below (other than "Subplans currently aren't passed to workers", which is true but irrelevant). Nested Gather nodes is something that you would need, and already have, a defense for anyway. regards, tom lane diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index cc8ba61..8ab049c 100644 *** a/src/backend/optimizer/path/allpaths.c --- b/src/backend/optimizer/path/allpaths.c *************** set_rel_consider_parallel(PlannerInfo *r *** 560,574 **** case RTE_SUBQUERY: /* ! * Subplans currently aren't passed to workers. Even if they ! * were, the subplan might be using parallelism internally, and we ! * can't support nested Gather nodes at present. Finally, we ! * don't have a good way of knowing whether the subplan involves ! * any parallel-restricted operations. It would be nice to relax ! * this restriction some day, but it's going to take a fair amount ! * of work. */ ! return; case RTE_JOIN: /* Shouldn't happen; we're only considering baserels here. */ --- 560,574 ---- case RTE_SUBQUERY: /* ! * If the subquery doesn't have anything parallel-restricted, we ! * can consider parallel scans. Note that this does not mean that ! * all (or even any) of the paths produced for the subquery will ! * actually be parallel-safe; but that's true for paths produced ! * for regular tables, too. */ ! if (has_parallel_hazard((Node *) rte->subquery, false)) ! return; ! break; case RTE_JOIN: /* Shouldn't happen; we're only considering baserels here. */
On Mon, Jun 13, 2016 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> In practice, we don't yet have the ability for >> parallel-safe paths from subqueries to affect planning at higher query >> levels, but that's because the pathification stuff landed too late in >> the cycle for me to fully react to it. > > I wonder if that's not just from confusion between subplans and > subqueries. I don't believe any of the claims made in the comment > adjusted in the patch below (other than "Subplans currently aren't passed > to workers", which is true but irrelevant). Nested Gather nodes is > something that you would need, and already have, a defense for anyway. I think you may be correct. One problem that I've realized that is related to this is that the way that the consider_parallel flag is being set for upper rels is almost totally bogus, which I believe accounts for your complaint at PGCon that force_parallel_mode was not doing as much as it ought to do. When I originally wrote a lot of this logic, there were no upper rels, which led to this code: if (force_parallel_mode == FORCE_PARALLEL_OFF || !glob->parallelModeOK) { glob->parallelModeNeeded = false; glob->wholePlanParallelSafe = false; /* either false or don't care */ } else { glob->parallelModeNeeded = true; glob->wholePlanParallelSafe = !has_parallel_hazard((Node *) parse, false); } The main way that wholePlanParallelSafe is supposed to be cleared is in create_plan: /* Update parallel safety information if needed. */ if (!best_path->parallel_safe) root->glob->wholePlanParallelSafe = false; However, at the time that code was written, it was impossible to rely entirely on the latter mechanism. Since there were no upper rels and no paths for upper plan nodes, you could have the case where every path was parallel-safe but the whole plan was node. Therefore the code shown above was needed to scan the whole darned plan for parallel-unsafe things. Post-pathification, this whole thing is pretty bogus: upper rels mostly don't get consider_parallel set at all, which means plans involving upper rels get marked parallel-unsafe even if they are not, which means the wholePlanParallelSafe flag gets cleared in a bunch of cases where it shouldn't. And on the flip side I think that the first chunk of code quoted above would be totally unnecessary if we were actually setting consider_parallel correctly on the upper rels. I started working on a patch to fix all this, which I'm attaching here so you can see what I'm thinking. I am not sure it's correct, but it does cause force_parallel_mode to do something interesting in many more cases. Anyway, the reason this is related to the issue at hand is that you might have something like A LEFT JOIN (SELECT x, sum(q) FROM B GROUP BY 1) ON A.x = B.x. Right now, I think that the paths for the aggregated subquery will always be marked as not parallel-safe, but that's only because the consider_parallel flag on the grouped rel isn't being set properly. If it were, you could theoretically do a parallel seq scan on A and then have each worker evaluate the join for the rows that pop out of the subquery. Maybe that's a stupid example, but the point is that I bet there are cases where failing to mark the upper rels with consider_parallel where appropriate causes good parallel plans not to get generated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Mon, Jun 13, 2016 at 5:27 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jun 13, 2016 at 4:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> In practice, we don't yet have the ability for >>> parallel-safe paths from subqueries to affect planning at higher query >>> levels, but that's because the pathification stuff landed too late in >>> the cycle for me to fully react to it. >> >> I wonder if that's not just from confusion between subplans and >> subqueries. I don't believe any of the claims made in the comment >> adjusted in the patch below (other than "Subplans currently aren't passed >> to workers", which is true but irrelevant). Nested Gather nodes is >> something that you would need, and already have, a defense for anyway. > > I think you may be correct. Oh, one other thing about this: I'm not going to try to defend whatever confusion between subplans and subqueries appears in that comment, but prior to upper planner pathification I think we were dead in the water here anyway, because the subquery was going to output a finished plan, not a list of paths. Since finished plans aren't labeled as to parallel-safety, we'd have to conservatively assume that the finished plan we got back might not be parallel-safe. Now that we're using the path representation throughout, we can do better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > One problem that I've realized that is related to this is that the way > that the consider_parallel flag is being set for upper rels is almost > totally bogus, which I believe accounts for your complaint at PGCon > that force_parallel_mode was not doing as much as it ought to do. Yeah, I just ran into a different reason to think that. I tried marking MinMaxAggPaths in the same way as other relation-scan paths, ie pathnode->path.parallel_safe = rel->consider_parallel; but that did nothing because the just-created UPPERREL_GROUP_AGG rel didn't have its consider_parallel flag set yet. Moreover, if I'd tried to fix that by invoking set_grouped_rel_consider_parallel() from planagg.c, it would still not work right because set_grouped_rel_consider_parallel() is hard-wired to consider only partial aggregation, which is not what's going on in a MinMaxAggPath. I'm not sure about the best rewrite here, but I would urge you to make sure that consider_parallel on an upper rel reflects only properties inherent to the rel (eg, safeness of quals that must be applied there) and not properties of specific implementation methods. Also, please make sure that whatever logic is involved remains factored out where it can be called by an extension that might want to create an upperrel sooner than the core code would do. BTW, do we need wholePlanParallelSafe at all, and if so why? Can't it be replaced by examining the parallel_safe flag on the selected topmost Path? regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > Oh, one other thing about this: I'm not going to try to defend > whatever confusion between subplans and subqueries appears in that > comment, but prior to upper planner pathification I think we were dead > in the water here anyway, because the subquery was going to output a > finished plan, not a list of paths. Since finished plans aren't > labeled as to parallel-safety, we'd have to conservatively assume that > the finished plan we got back might not be parallel-safe. Now that > we're using the path representation throughout, we can do better. Well, if that were still the state of affairs, we could certainly consider adding a parallel_safe flag to Plans as well as Paths. But as you say, it's moot now. regards, tom lane
On Mon, Jun 13, 2016 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> One problem that I've realized that is related to this is that the way >> that the consider_parallel flag is being set for upper rels is almost >> totally bogus, which I believe accounts for your complaint at PGCon >> that force_parallel_mode was not doing as much as it ought to do. > > Yeah, I just ran into a different reason to think that. I tried marking > MinMaxAggPaths in the same way as other relation-scan paths, ie > > pathnode->path.parallel_safe = rel->consider_parallel; > > but that did nothing because the just-created UPPERREL_GROUP_AGG rel > didn't have its consider_parallel flag set yet. Moreover, if I'd tried to > fix that by invoking set_grouped_rel_consider_parallel() from planagg.c, > it would still not work right because set_grouped_rel_consider_parallel() > is hard-wired to consider only partial aggregation, which is not what's > going on in a MinMaxAggPath. I'm not sure about the best rewrite here, > but I would urge you to make sure that consider_parallel on an upper rel > reflects only properties inherent to the rel (eg, safeness of quals that > must be applied there) and not properties of specific implementation > methods. Yeah, I think that David and I goofed this up when adding parallel aggregation. I just wasn't thinking clearly about the way upper rels needed to work with consider_parallel. If you would be willing to do any sort of review of the WIP patch I posted just upthread, that would help. > Also, please make sure that whatever logic is involved remains > factored out where it can be called by an extension that might want to > create an upperrel sooner than the core code would do. Again, please have a look at the patch and see if it seems unsuitable to you for some reason. I'm not confident of my ability to get all of this path stuff right without a bit of help from the guy who designed it. > BTW, do we need wholePlanParallelSafe at all, and if so why? Can't > it be replaced by examining the parallel_safe flag on the selected > topmost Path? Oh, man. I think you're right. This is yet another piece of fallout from upper planner pathification. That was absolutely necessary before, but now if we do the other bits right we don't need it any more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Again, please have a look at the patch and see if it seems unsuitable > to you for some reason. I'm not confident of my ability to get all of > this path stuff right without a bit of help from the guy who designed > it. I'm kind of in a bind right now because Tomas has produced an FK-selectivity patch rewrite on schedule, and I already committed to review that before beta2, so I better go do that first. If you can wait awhile I will try to help out more with parallel query. Having said that, the main thing I noticed in your draft patch is that you'd removed set_grouped_rel_consider_parallel() in favor of hard-wiring this in create_grouping_paths(): + if (input_rel->consider_parallel && + !has_parallel_hazard((Node *) target->exprs, false) && + !has_parallel_hazard((Node *) parse->havingQual, false)) + grouped_rel->consider_parallel = true; I think this is bad because it forces any external creators of UPPERREL_GROUP_AGG to duplicate that code --- and heaven help everybody if anyone is out of sync on whether to set the flag. So I'd rather keep set_grouped_rel_consider_parallel(), even if all it does is the above. And make it global not static. Ditto for the other upperrels. Also, I wonder whether we could reduce that test to just the has_parallel_hazard tests, so as to avoid the ordering dependency of needing to create the top scan/join rel (and set its consider_parallel flag) before we can create the UPPERREL_GROUP_AGG rel. This would mean putting more dependency on per-path parallel_safe flags to detect whether you can't parallelize a given step for lack of parallel-safe input, but I'm not sure that that's a bad thing. regards, tom lane
On Mon, Jun 13, 2016 at 6:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Again, please have a look at the patch and see if it seems unsuitable >> to you for some reason. I'm not confident of my ability to get all of >> this path stuff right without a bit of help from the guy who designed >> it. > > I'm kind of in a bind right now because Tomas has produced an > FK-selectivity patch rewrite on schedule, and I already committed to > review that before beta2, so I better go do that first. If you can wait > awhile I will try to help out more with parallel query. I'm happy to have you look at his patch first. > Having said that, the main thing I noticed in your draft patch is that > you'd removed set_grouped_rel_consider_parallel() in favor of hard-wiring > this in create_grouping_paths(): > > + if (input_rel->consider_parallel && > + !has_parallel_hazard((Node *) target->exprs, false) && > + !has_parallel_hazard((Node *) parse->havingQual, false)) > + grouped_rel->consider_parallel = true; > > I think this is bad because it forces any external creators of > UPPERREL_GROUP_AGG to duplicate that code --- and heaven help everybody > if anyone is out of sync on whether to set the flag. So I'd rather keep > set_grouped_rel_consider_parallel(), even if all it does is the above. > And make it global not static. Ditto for the other upperrels. I'm slightly mystified by this because we really shouldn't be setting that flag more than once. We don't want to do that work repeatedly, just once, and prior to adding any paths to the rel. Are you imagining that somebody would try to created grouped paths before we finish scan/join planning? > Also, I wonder whether we could reduce that test to just the > has_parallel_hazard tests, so as to avoid the ordering dependency of > needing to create the top scan/join rel (and set its consider_parallel > flag) before we can create the UPPERREL_GROUP_AGG rel. This would mean > putting more dependency on per-path parallel_safe flags to detect whether > you can't parallelize a given step for lack of parallel-safe input, but > I'm not sure that that's a bad thing. It doesn't sound like an especially good thing to me. Skipping all parallel path generation is quite a bit less work than trying each path in turn and realizing that none of them will work, and there are various places where we optimize on that basis. I don't understand, in any event, why it makes any sense to create the UPPERREL_GROUP_AGG rel before we finish scan/join planning. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > Amit Kapila <amit.kapila@enterprisedb.com> writes:
> >> It is slightly tricky to write a reproducible parallel-query test, but
> >> point taken and I think we should try to have a test unless such a test is
> >> really time consuming.
>
> > BTW, decent regression tests could be written without the need to create
> > enormous tables if the minimum rel size in create_plain_partial_paths()
> > could be configured to something less than 1000 blocks. I think it's
> > fairly crazy that that arbitrary constant is hard-wired anyway. Should
> > we make it a GUC?
>
> Just as an experiment to see what would happen, I did
>
> - int parallel_threshold = 1000;
> + int parallel_threshold = 1;
>
> and ran the regression tests. I got a core dump in the window.sql test:
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018,
> input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0,
> rollup_groupclauses=0x0) at planner.c:4307
> 4307 Index sgref = final_target->sortgrouprefs[i];
> (gdb) bt
> #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018,
> input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0,
> rollup_groupclauses=0x0) at planner.c:4307
> #1 create_grouping_paths (root=0x1795018, input_rel=0x17957a8,
> target=0x17bf228, rollup_lists=0x0, rollup_groupclauses=0x0)
> at planner.c:3420
> #2 0x0000000000667405 in grouping_planner (root=0x1795018,
> inheritance_update=0 '\000', tuple_fraction=0) at planner.c:1794
> #3 0x0000000000668c80 in subquery_planner (glob=<value optimized out>,
> parse=0x1703580, parent_root=<value optimized out>,
> hasRecursion=<value optimized out>, tuple_fraction=0) at planner.c:769
> #4 0x0000000000668ea5 in standard_planner (parse=0x1703580,
> cursorOptions=256, boundParams=0x0) at planner.c:308
> #5 0x00000000006691b6 in planner (parse=<value optimized out>,
> cursorOptions=<value optimized out>, boundParams=<value optimized out>)
> at planner.c:178
> #6 0x00000000006fb069 in pg_plan_query (querytree=0x1703580,
> cursorOptions=256, boundParams=0x0) at postgres.c:798
> (gdb) p debug_query_string
> $1 = 0x1702078 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;"
>
> which I think may be another manifestation of the failure-to-apply-proper-
> pathtarget issue we're looking at in this thread. Or maybe it's just
> an unjustified assumption in make_partialgroup_input_target that the
> input path must always have some sortgrouprefs assigned.
>
I don't see this problem after your recent commit - 89d53515e53ea080029894939118365b647489b3. Are you still getting it?
>
> Before getting to that point, there was also an unexplainable plan change:
>
Yeah, I am also seeing such a plan change, but I this is a separate thing to investigate.
>
> I wrote:
> > Amit Kapila <amit.kapila@enterprisedb.com> writes:
> >> It is slightly tricky to write a reproducible parallel-query test, but
> >> point taken and I think we should try to have a test unless such a test is
> >> really time consuming.
>
> > BTW, decent regression tests could be written without the need to create
> > enormous tables if the minimum rel size in create_plain_partial_paths()
> > could be configured to something less than 1000 blocks. I think it's
> > fairly crazy that that arbitrary constant is hard-wired anyway. Should
> > we make it a GUC?
>
> Just as an experiment to see what would happen, I did
>
> - int parallel_threshold = 1000;
> + int parallel_threshold = 1;
>
> and ran the regression tests. I got a core dump in the window.sql test:
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018,
> input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0,
> rollup_groupclauses=0x0) at planner.c:4307
> 4307 Index sgref = final_target->sortgrouprefs[i];
> (gdb) bt
> #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018,
> input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0,
> rollup_groupclauses=0x0) at planner.c:4307
> #1 create_grouping_paths (root=0x1795018, input_rel=0x17957a8,
> target=0x17bf228, rollup_lists=0x0, rollup_groupclauses=0x0)
> at planner.c:3420
> #2 0x0000000000667405 in grouping_planner (root=0x1795018,
> inheritance_update=0 '\000', tuple_fraction=0) at planner.c:1794
> #3 0x0000000000668c80 in subquery_planner (glob=<value optimized out>,
> parse=0x1703580, parent_root=<value optimized out>,
> hasRecursion=<value optimized out>, tuple_fraction=0) at planner.c:769
> #4 0x0000000000668ea5 in standard_planner (parse=0x1703580,
> cursorOptions=256, boundParams=0x0) at planner.c:308
> #5 0x00000000006691b6 in planner (parse=<value optimized out>,
> cursorOptions=<value optimized out>, boundParams=<value optimized out>)
> at planner.c:178
> #6 0x00000000006fb069 in pg_plan_query (querytree=0x1703580,
> cursorOptions=256, boundParams=0x0) at postgres.c:798
> (gdb) p debug_query_string
> $1 = 0x1702078 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;"
>
> which I think may be another manifestation of the failure-to-apply-proper-
> pathtarget issue we're looking at in this thread. Or maybe it's just
> an unjustified assumption in make_partialgroup_input_target that the
> input path must always have some sortgrouprefs assigned.
>
I don't see this problem after your recent commit - 89d53515e53ea080029894939118365b647489b3. Are you still getting it?
>
> Before getting to that point, there was also an unexplainable plan change:
>
Yeah, I am also seeing such a plan change, but I this is a separate thing to investigate.
On 14 June 2016 at 04:07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Just as an experiment to see what would happen, I did > > - int parallel_threshold = 1000; > + int parallel_threshold = 1; > > and ran the regression tests. I got a core dump in the window.sql test: > > Program terminated with signal 11, Segmentation fault. > #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018, > input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0, > rollup_groupclauses=0x0) at planner.c:4307 > 4307 Index sgref = final_target->sortgrouprefs[i]; > (gdb) bt > #0 0x0000000000664dbc in make_partialgroup_input_target (root=0x1795018, > input_rel=0x17957a8, target=0x17bf228, rollup_lists=0x0, > rollup_groupclauses=0x0) at planner.c:4307 > #1 create_grouping_paths (root=0x1795018, input_rel=0x17957a8, > target=0x17bf228, rollup_lists=0x0, rollup_groupclauses=0x0) > at planner.c:3420 > #2 0x0000000000667405 in grouping_planner (root=0x1795018, > inheritance_update=0 '\000', tuple_fraction=0) at planner.c:1794 > #3 0x0000000000668c80 in subquery_planner (glob=<value optimized out>, > parse=0x1703580, parent_root=<value optimized out>, > hasRecursion=<value optimized out>, tuple_fraction=0) at planner.c:769 > #4 0x0000000000668ea5 in standard_planner (parse=0x1703580, > cursorOptions=256, boundParams=0x0) at planner.c:308 > #5 0x00000000006691b6 in planner (parse=<value optimized out>, > cursorOptions=<value optimized out>, boundParams=<value optimized out>) > at planner.c:178 > #6 0x00000000006fb069 in pg_plan_query (querytree=0x1703580, > cursorOptions=256, boundParams=0x0) at postgres.c:798 > (gdb) p debug_query_string > $1 = 0x1702078 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;" I see you've committed a fix for this. Thank you. I thought it would be good to be consistent with the ressortgroupref handling, and I quite like your fix in that regard. Do you think it's worth also applying the attached so as to have ressortgroupref set to NULL more consistently, instead of sometimes NULL and other times allocated to memory wastefully filled with zeros? The patch also saved on allocating the array's memory when it's not needed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Mon, Jun 13, 2016 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila@enterprisedb.com> writes:
> > On Mon, Jun 13, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think the real question here is why the code removed by 04ae11f62
> >> was wrong. It was unsafe to use apply_projection_to_path, certainly,
> >> but using create_projection_path directly would have avoided the
> >> stated problem. And it's very unclear that this new patch doesn't
> >> bring back that bug in a different place.
>
> > This new patch still doesn't seem to be right, but it won't bring back the
> > original problem because apply_projection_to_path will be only done if
> > grouped_rel is parallel_safe which means it doesn't have any
> > parallel-unsafe or parallel-restricted clause in quals or target list.
>
> The problem cited in 04ae11f62's commit message is that
> apply_projection_to_path would overwrite the paths' pathtargets in-place,
> causing problems if they'd been used for other purposes elsewhere.
>
> Amit Kapila <amit.kapila@enterprisedb.com> writes:
> > On Mon, Jun 13, 2016 at 7:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think the real question here is why the code removed by 04ae11f62
> >> was wrong. It was unsafe to use apply_projection_to_path, certainly,
> >> but using create_projection_path directly would have avoided the
> >> stated problem. And it's very unclear that this new patch doesn't
> >> bring back that bug in a different place.
>
> > This new patch still doesn't seem to be right, but it won't bring back the
> > original problem because apply_projection_to_path will be only done if
> > grouped_rel is parallel_safe which means it doesn't have any
> > parallel-unsafe or parallel-restricted clause in quals or target list.
>
> The problem cited in 04ae11f62's commit message is that
> apply_projection_to_path would overwrite the paths' pathtargets in-place,
> causing problems if they'd been used for other purposes elsewhere.
>
The main reason for removal of that code is that because there was no check there to prevent assigning of parallel-restricted clauses to pathtarget of partial paths. I think the same is indicated in commit message as well, if we focus on below part of commit message:
"especially because this code has no check that the PathTarget is in fact parallel-safe."
Due to above reason, I don't see how the suggestion given by you to avoid the problem by using create_projection_path can work.
> I do
> not share your confidence that using apply_projection_to_path within
> create_grouping_paths is free of such a hazard.
>
> not share your confidence that using apply_projection_to_path within
> create_grouping_paths is free of such a hazard.
>
Fair enough, let me try to explain the problem and someways to solve it in some more detail. The main thing that got missed by me in the patch related to commit-04ae11f62 is that the partial path list of rel also needs to have a scanjoin_target. I was under assumption that create_grouping_paths will take care of assigning appropriate Path targets for the parallel paths generated by it. If we see, create_grouping_paths() do take care of adding the appropriate path targets for the paths generated by that function. However, it doesn't do anything for partial paths. The patch sent by me yesterday [1] which was trying to assign partial_grouping_target to partial paths won't be the right fix, because (a) partial_grouping_target includes Aggregates (refer make_partialgroup_input_target) which we don't want for partial paths; (b) it is formed using grouping_target passed in function create_grouping_paths() whereas we need the path target formed from final_target for partial paths (as partial paths are scanjoin relations) as is done for main path list (in grouping_planner(), /* Forcibly apply that target to all the Paths for the scan/join rel.*/). Now, I think we have following ways to solve it:
(a) check whether the scanjoin_target contains any parallel-restricted clause before applying the same to partial path list in grouping_planner. However it could lead to duplicate checks in some cases for parallel-restricted clause, once in apply_projection_to_path() for main pathlist and then again before applying to partial paths. I think we can avoid that by having an additional parameter in apply_projection_to_path() which can indicate if the check for parallel-safety is done inside that function and what is the result of same.
(b) generate the appropriate scanjoin_target for partial path list in create_grouping_paths using final_target. However I think this might lead to some duplicate code in create_grouping_paths() as we might have to some thing similar to what we have done in grouping_planner for generating such a target. I think if we want we can pass scanjoin_target to create_grouping_paths() as well. Again, we have to check for parallel-safety for scanjoin_target before applying it to partial paths, but we need to do that only when grouped_rel is considered parallel-safe.
Thoughts?
On Tue, Jun 14, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > In practice, we don't yet have the ability for
> > parallel-safe paths from subqueries to affect planning at higher query
> > levels, but that's because the pathification stuff landed too late in
> > the cycle for me to fully react to it.
>
> I wonder if that's not just from confusion between subplans and
> subqueries.
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > In practice, we don't yet have the ability for
> > parallel-safe paths from subqueries to affect planning at higher query
> > levels, but that's because the pathification stuff landed too late in
> > the cycle for me to fully react to it.
>
> I wonder if that's not just from confusion between subplans and
> subqueries.
>
Won't the patch as written will allow parallel-restricted things to be pushed to workers for UNION ALL kind of queries?
Explain verbose Select * from (SELECT c1+1 FROM t1 UNION ALL SELECT c1+1 FROM t2 where c1 < 10) ss(a);
In above kind of queries, set_rel_consider_parallel() might set consider_parallel as true for rel, but later in set_append_rel_size(), it might pull some unsafe clauses in target of childrel. Basically, I am wondering about the same problem as we discussed here [1].
David Rowley <david.rowley@2ndquadrant.com> writes: > Do you think it's worth also applying the attached so as to have > ressortgroupref set to NULL more consistently, instead of sometimes > NULL and other times allocated to memory wastefully filled with zeros? Meh --- that seems to add complication without buying a whole lot. regards, tom lane
On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jun 13, 2016 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> BTW, decent regression tests could be written without the need to create
> >> enormous tables if the minimum rel size in create_plain_partial_paths()
> >> could be configured to something less than 1000 blocks. I think it's
> >> fairly crazy that that arbitrary constant is hard-wired anyway. Should
> >> we make it a GUC?
>
> > That was proposed before, and I didn't do it mostly because I couldn't
> > think of a name for it that didn't sound unbelievably corny.
>
> min_parallel_relation_size, or min_parallelizable_relation_size, or
> something like that?
>
With Regards,
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jun 13, 2016 at 11:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> BTW, decent regression tests could be written without the need to create
> >> enormous tables if the minimum rel size in create_plain_partial_paths()
> >> could be configured to something less than 1000 blocks. I think it's
> >> fairly crazy that that arbitrary constant is hard-wired anyway. Should
> >> we make it a GUC?
>
> > That was proposed before, and I didn't do it mostly because I couldn't
> > think of a name for it that didn't sound unbelievably corny.
>
> min_parallel_relation_size, or min_parallelizable_relation_size, or
> something like that?
>
You are right that such a variable will make it simpler to write tests for parallel query. I have implemented such a guc and choose to keep the name as min_parallel_relation_size. One thing to note is that in function create_plain_partial_paths(), curently it is using PG_INT32_MAX/3 for parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so as to be consistent with guc.c. I am not sure if it is advisable to use PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jun 13, 2016 at 6:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think this is bad because it forces any external creators of >> UPPERREL_GROUP_AGG to duplicate that code --- and heaven help everybody >> if anyone is out of sync on whether to set the flag. So I'd rather keep >> set_grouped_rel_consider_parallel(), even if all it does is the above. >> And make it global not static. Ditto for the other upperrels. > I'm slightly mystified by this because we really shouldn't be setting > that flag more than once. We don't want to do that work repeatedly, > just once, and prior to adding any paths to the rel. Are you > imagining that somebody would try to created grouped paths before we > finish scan/join planning? Exactly. The original pathification design contemplated that FDWs and other extensions could inject Paths for the upperrels whenever they felt like it, specifically during relation path building. Even if you think that's silly, it *must* be possible to do it during GetForeignUpperPaths, else that hook is completely useless. Therefore, adding any data other than new Paths to the upperrel during create_grouping_paths is a broken design unless there is some easy, reliable, and not too expensive way for an FDW to make the same thing happen a bit earlier. See the commentary in optimizer/README starting at line 922. I generally don't like rel->consider_parallel at all for basically this reason, but it may be too late to undo that aspect of the parallel join planning design. I will settle for having an API call that allows FDWs to get the flag set correctly. Another possibility is to set the flag before calling GetForeignUpperPaths, but that might be messy too. regards, tom lane
Amit Kapila <amit.kapila16@gmail.com> writes: > On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... I got a core dump in the window.sql test: >> which I think may be another manifestation of the failure-to-apply-proper- >> pathtarget issue we're looking at in this thread. Or maybe it's just >> an unjustified assumption in make_partialgroup_input_target that the >> input path must always have some sortgrouprefs assigned. > I don't see this problem after your recent commit > - 89d53515e53ea080029894939118365b647489b3. Are you still getting it? No. I am suspicious that there's still a bug there, ie we're looking at the wrong pathtarget; but the regression test doesn't actually crash. That might only be because we don't choose the parallelized path. regards, tom lane
On Tue, Jun 14, 2016 at 4:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 13, 2016 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:> I do
> not share your confidence that using apply_projection_to_path within
> create_grouping_paths is free of such a hazard.
>Fair enough, let me try to explain the problem and someways to solve it in some more detail. The main thing that got missed by me in the patch related to commit-04ae11f62 is that the partial path list of rel also needs to have a scanjoin_target. I was under assumption that create_grouping_paths will take care of assigning appropriate Path targets for the parallel paths generated by it. If we see, create_grouping_paths() do take care of adding the appropriate path targets for the paths generated by that function. However, it doesn't do anything for partial paths. The patch sent by me yesterday [1] which was trying to assign partial_grouping_target to partial paths won't be the right fix, because (a) partial_grouping_target includes Aggregates (refer make_partialgroup_input_target) which we don't want for partial paths; (b) it is formed using grouping_target passed in function create_grouping_paths() whereas we need the path target formed from final_target for partial paths (as partial paths are scanjoin relations) as is done for main path list (in grouping_planner(), /* Forcibly apply that target to all the Paths for the scan/join rel.*/). Now, I think we have following ways to solve it:(a) check whether the scanjoin_target contains any parallel-restricted clause before applying the same to partial path list in grouping_planner. However it could lead to duplicate checks in some cases for parallel-restricted clause, once in apply_projection_to_path() for main pathlist and then again before applying to partial paths. I think we can avoid that by having an additional parameter in apply_projection_to_path() which can indicate if the check for parallel-safety is done inside that function and what is the result of same.(b) generate the appropriate scanjoin_target for partial path list in create_grouping_paths using final_target. However I think this might lead to some duplicate code in create_grouping_paths() as we might have to some thing similar to what we have done in grouping_planner for generating such a target. I think if we want we can pass scanjoin_target to create_grouping_paths() as well. Again, we have to check for parallel-safety for scanjoin_target before applying it to partial paths, but we need to do that only when grouped_rel is considered parallel-safe.
One more idea could be to have a flag (say parallel_safe) in PathTarget and set it once we have ensured that the expressions in target doesn't contain any parallel-restricted entry. In create_pathtarget()/set_pathtarget_cost_width(), if the parallelmodeOK flag is set, then we can evaluate target expressions for parallel-restricted expressions and set the flag accordingly. Now, this has the benefit that we don't need to evaluate the expressions of targets for parallel-restricted clauses again and again. I think this way if the flag is set once for final_target in grouping_planner, we don't need to evaluate it again for other targets (grouping_target, scanjoin_target, etc.) as those are derived from final_target. Similarly, I think we need to set ReplOptInfo->reltarget and others as required.
On Tue, Jun 14, 2016 at 12:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> ... I got a core dump in the window.sql test: >>> which I think may be another manifestation of the failure-to-apply-proper- >>> pathtarget issue we're looking at in this thread. Or maybe it's just >>> an unjustified assumption in make_partialgroup_input_target that the >>> input path must always have some sortgrouprefs assigned. > >> I don't see this problem after your recent commit >> - 89d53515e53ea080029894939118365b647489b3. Are you still getting it? > > No. I am suspicious that there's still a bug there, ie we're looking at > the wrong pathtarget; but the regression test doesn't actually crash. > That might only be because we don't choose the parallelized path. OK, so there are quite a number of separate things here: 1. The case originally reported by Thomas Munro still fails. To fix that, we probably need to apply scanjoin_target to each partial path. But we can only do that if it's parallel-safe. It seems like what we want is something like this: (1) During scan/join planning, somehow skip calling generate_gather_paths for the topmost scan/join rel as we do to all the others. (2) If scanjoin_target is not parallel-safe, build a path for the scan/join phase that applies a Gather node to the cheapest path and does projection at the Gather node. Then forget all the partial paths so we can't do any bogus upper planning. (3) If scanjoin_target is parallel-safe, replace the list of partial paths for the topmost scan/join rel with a new list where scanjoin_target has been applied to each one. I haven't tested this so I might be totally off-base about what's actually required here... 2. In https://www.postgresql.org/message-id/15695.1465827695@sss.pgh.pa.us Tom raised the issue that we need some test cases covering this area. 3. In https://www.postgresql.org/message-id/25521.1465837597@sss.pgh.pa.us Tom proposed adding a GUC to set the minimum relation size required for consideration of parallelism. 4. In https://www.postgresql.org/message-id/1597.1465846969@sss.pgh.pa.us Tom raised the question of whether there is a danger that MinMaxAggPath might not be parallel-safe. 5. In https://www.postgresql.org/message-id/20391.1465850779@sss.pgh.pa.us Tom proposed a patch to fix an apparent confusion on my part between subplans and subqueries. 6. In https://www.postgresql.org/message-id/CA+TgmoZwJB9EaiBj-MEeAQ913WkKOz4aQ7VQnCfrDLs9WYhZdQ@mail.gmail.com I discussed the need to fix the way consider_parallel is set for upper rels, and in https://www.postgresql.org/message-id/22832.1465854401@sss.pgh.pa.us Tom observed that, once that work is done, we can get rid of the wholePlanParallelSafe flag. I don't think it's remotely realistic to get all of this fixed in time for beta2; I think we'll be lucky if we can get #1 fixed. But we'd better track all of these issues so that we can crank through them later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 16, 2016 at 9:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Jun 14, 2016 at 12:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> >> On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> ... I got a core dump in the window.sql test:
> >>> which I think may be another manifestation of the failure-to-apply-proper-
> >>> pathtarget issue we're looking at in this thread. Or maybe it's just
> >>> an unjustified assumption in make_partialgroup_input_target that the
> >>> input path must always have some sortgrouprefs assigned.
> >
> >> I don't see this problem after your recent commit
> >> - 89d53515e53ea080029894939118365b647489b3. Are you still getting it?
> >
> > No. I am suspicious that there's still a bug there, ie we're looking at
> > the wrong pathtarget; but the regression test doesn't actually crash.
> > That might only be because we don't choose the parallelized path.
>
> OK, so there are quite a number of separate things here:
>
> 1. The case originally reported by Thomas Munro still fails. To fix
> that, we probably need to apply scanjoin_target to each partial path.
> But we can only do that if it's parallel-safe. It seems like what we
> want is something like this: (1) During scan/join planning, somehow
> skip calling generate_gather_paths for the topmost scan/join rel as we
> do to all the others. (2) If scanjoin_target is not parallel-safe,
> build a path for the scan/join phase that applies a Gather node to the
> cheapest path and does projection at the Gather node. Then forget all
> the partial paths so we can't do any bogus upper planning. (3) If
> scanjoin_target is parallel-safe, replace the list of partial paths
> for the topmost scan/join rel with a new list where scanjoin_target
> has been applied to each one. I haven't tested this so I might be
> totally off-base about what's actually required here...
>
>
> On Tue, Jun 14, 2016 at 12:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Amit Kapila <amit.kapila16@gmail.com> writes:
> >> On Mon, Jun 13, 2016 at 9:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> ... I got a core dump in the window.sql test:
> >>> which I think may be another manifestation of the failure-to-apply-proper-
> >>> pathtarget issue we're looking at in this thread. Or maybe it's just
> >>> an unjustified assumption in make_partialgroup_input_target that the
> >>> input path must always have some sortgrouprefs assigned.
> >
> >> I don't see this problem after your recent commit
> >> - 89d53515e53ea080029894939118365b647489b3. Are you still getting it?
> >
> > No. I am suspicious that there's still a bug there, ie we're looking at
> > the wrong pathtarget; but the regression test doesn't actually crash.
> > That might only be because we don't choose the parallelized path.
>
> OK, so there are quite a number of separate things here:
>
> 1. The case originally reported by Thomas Munro still fails. To fix
> that, we probably need to apply scanjoin_target to each partial path.
> But we can only do that if it's parallel-safe. It seems like what we
> want is something like this: (1) During scan/join planning, somehow
> skip calling generate_gather_paths for the topmost scan/join rel as we
> do to all the others. (2) If scanjoin_target is not parallel-safe,
> build a path for the scan/join phase that applies a Gather node to the
> cheapest path and does projection at the Gather node. Then forget all
> the partial paths so we can't do any bogus upper planning. (3) If
> scanjoin_target is parallel-safe, replace the list of partial paths
> for the topmost scan/join rel with a new list where scanjoin_target
> has been applied to each one. I haven't tested this so I might be
> totally off-base about what's actually required here...
>
I think we can achieve it just by doing something like what you have mentioned in (2) and (3). I am not sure if there is a need to skip generation of gather paths for top scan/join node. Please see the patch attached. I have just done some minimal testing to ensure that problem reported by Thomas Munro in this thread is fixed and verified that the fix is sane for problems [1][2] reported by sqlsmith. If you think this is on right lines, I can try to do more verification and try to add tests.
> 2. In https://www.postgresql.org/message-id/15695.1465827695@sss.pgh.pa.us
> Tom raised the issue that we need some test cases covering this area.
>
> 3. In https://www.postgresql.org/message-id/25521.1465837597@sss.pgh.pa.us
> Tom proposed adding a GUC to set the minimum relation size required
> for consideration of parallelism.
>
I have posted a patch for this upthread [3]. The main thing to verify is the default value of guc.
Attachment
On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> 1. The case originally reported by Thomas Munro still fails. To fix >> that, we probably need to apply scanjoin_target to each partial path. >> But we can only do that if it's parallel-safe. It seems like what we >> want is something like this: (1) During scan/join planning, somehow >> skip calling generate_gather_paths for the topmost scan/join rel as we >> do to all the others. (2) If scanjoin_target is not parallel-safe, >> build a path for the scan/join phase that applies a Gather node to the >> cheapest path and does projection at the Gather node. Then forget all >> the partial paths so we can't do any bogus upper planning. (3) If >> scanjoin_target is parallel-safe, replace the list of partial paths >> for the topmost scan/join rel with a new list where scanjoin_target >> has been applied to each one. I haven't tested this so I might be >> totally off-base about what's actually required here... > > I think we can achieve it just by doing something like what you have > mentioned in (2) and (3). I am not sure if there is a need to skip > generation of gather paths for top scan/join node. Please see the patch > attached. I have just done some minimal testing to ensure that problem > reported by Thomas Munro in this thread is fixed and verified that the fix > is sane for problems [1][2] reported by sqlsmith. If you think this is on > right lines, I can try to do more verification and try to add tests. You can't do it this way because of the issue Tom discussed here: https://www.postgresql.org/message-id/16421.1465828862@sss.pgh.pa.us -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Amit Kapila <amit.kapila16@gmail.com> writes: > On Thu, Jun 16, 2016 at 9:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> 3. In https://www.postgresql.org/message-id/25521.1465837597@sss.pgh.pa.us >> Tom proposed adding a GUC to set the minimum relation size required >> for consideration of parallelism. > I have posted a patch for this upthread [3]. The main thing to verify is > the default value of guc. I'll review and push this patch, unless Robert or somebody else has already started on it. I concur with your choice of setting the default value to 1024 blocks; that's close to the existing behavior but as a power of 2 will look cleaner in GUC size units. regards, tom lane
On Thu, Jun 16, 2016 at 12:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Thu, Jun 16, 2016 at 9:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> 3. In https://www.postgresql.org/message-id/25521.1465837597@sss.pgh.pa.us >>> Tom proposed adding a GUC to set the minimum relation size required >>> for consideration of parallelism. > >> I have posted a patch for this upthread [3]. The main thing to verify is >> the default value of guc. > > I'll review and push this patch, unless Robert or somebody else has > already started on it. Great, it's all yours. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Amit Kapila <amit.kapila@enterprisedb.com> writes: > On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> min_parallel_relation_size, or min_parallelizable_relation_size, or >> something like that? > You are right that such a variable will make it simpler to write tests for > parallel query. I have implemented such a guc and choose to keep the name > as min_parallel_relation_size. Pushed with minor adjustments. My first experiments with this say that we should have done this long ago: https://www.postgresql.org/message-id/22782.1466100870@sss.pgh.pa.us > One thing to note is that in function > create_plain_partial_paths(), curently it is using PG_INT32_MAX/3 for > parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so > as to be consistent with guc.c. I am not sure if it is advisable to use > PG_INT32_MAX in guc.c as other similar parameters use INT_MAX. I agree that using INT_MAX is more consistent with the code elsewhere in guc.c, and more correct given that we declare the variable in question as int not int32. But you need to include <limits.h> to use INT_MAX ... regards, tom lane
On Thu, Jun 16, 2016 at 12:16 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> 1. The case originally reported by Thomas Munro still fails. To fix >>> that, we probably need to apply scanjoin_target to each partial path. >>> But we can only do that if it's parallel-safe. It seems like what we >>> want is something like this: (1) During scan/join planning, somehow >>> skip calling generate_gather_paths for the topmost scan/join rel as we >>> do to all the others. (2) If scanjoin_target is not parallel-safe, >>> build a path for the scan/join phase that applies a Gather node to the >>> cheapest path and does projection at the Gather node. Then forget all >>> the partial paths so we can't do any bogus upper planning. (3) If >>> scanjoin_target is parallel-safe, replace the list of partial paths >>> for the topmost scan/join rel with a new list where scanjoin_target >>> has been applied to each one. I haven't tested this so I might be >>> totally off-base about what's actually required here... >> >> I think we can achieve it just by doing something like what you have >> mentioned in (2) and (3). I am not sure if there is a need to skip >> generation of gather paths for top scan/join node. Please see the patch >> attached. I have just done some minimal testing to ensure that problem >> reported by Thomas Munro in this thread is fixed and verified that the fix >> is sane for problems [1][2] reported by sqlsmith. If you think this is on >> right lines, I can try to do more verification and try to add tests. > > You can't do it this way because of the issue Tom discussed here: > > https://www.postgresql.org/message-id/16421.1465828862@sss.pgh.pa.us Something like what you have there might work if you use create_projection_path instead of apply_projection_to_path. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 16, 2016 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila@enterprisedb.com> writes:
> > On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> min_parallel_relation_size, or min_parallelizable_relation_size, or
> >> something like that?
>
> > You are right that such a variable will make it simpler to write tests for
> > parallel query. I have implemented such a guc and choose to keep the name
> > as min_parallel_relation_size.
>
> Pushed with minor adjustments.
Thanks.
>
> Amit Kapila <amit.kapila@enterprisedb.com> writes:
> > On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> min_parallel_relation_size, or min_parallelizable_relation_size, or
> >> something like that?
>
> > You are right that such a variable will make it simpler to write tests for
> > parallel query. I have implemented such a guc and choose to keep the name
> > as min_parallel_relation_size.
>
> Pushed with minor adjustments.
Thanks.
> > One thing to note is that in function
> > create_plain_partial_paths(), curently it is using PG_INT32_MAX/3 for
> > parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
> > as to be consistent with guc.c. I am not sure if it is advisable to use
> > PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
>
> I agree that using INT_MAX is more consistent with the code elsewhere in
> guc.c, and more correct given that we declare the variable in question
> as int not int32. But you need to include <limits.h> to use INT_MAX ...
>
I wonder why it has not given me any compilation error/warning. I have tried it on both Windows and CentOS, before sending the patch.
Amit Kapila <amit.kapila16@gmail.com> writes: > On Thu, Jun 16, 2016 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> But you need to include <limits.h> to use INT_MAX ... > I wonder why it has not given me any compilation error/warning. I have > tried it on both Windows and CentOS, before sending the patch. Some platforms seem to make it available from other headers too. But AFAIK, POSIX only requires <limits.h> to provide it. regards, tom lane
On Fri, Jun 17, 2016 at 3:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jun 16, 2016 at 12:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>> 1. The case originally reported by Thomas Munro still fails. To fix
> >>> that, we probably need to apply scanjoin_target to each partial path.
> >>> But we can only do that if it's parallel-safe. It seems like what we
> >>> want is something like this: (1) During scan/join planning, somehow
> >>> skip calling generate_gather_paths for the topmost scan/join rel as we
> >>> do to all the others. (2) If scanjoin_target is not parallel-safe,
> >>> build a path for the scan/join phase that applies a Gather node to the
> >>> cheapest path and does projection at the Gather node. Then forget all
> >>> the partial paths so we can't do any bogus upper planning. (3) If
> >>> scanjoin_target is parallel-safe, replace the list of partial paths
> >>> for the topmost scan/join rel with a new list where scanjoin_target
> >>> has been applied to each one. I haven't tested this so I might be
> >>> totally off-base about what's actually required here...
> >>
> >> I think we can achieve it just by doing something like what you have
> >> mentioned in (2) and (3). I am not sure if there is a need to skip
> >> generation of gather paths for top scan/join node. Please see the patch
> >> attached. I have just done some minimal testing to ensure that problem
> >> reported by Thomas Munro in this thread is fixed and verified that the fix
> >> is sane for problems [1][2] reported by sqlsmith. If you think this is on
> >> right lines, I can try to do more verification and try to add tests.
> >
> > You can't do it this way because of the issue Tom discussed here:
> >
> > https://www.postgresql.org/message-id/16421.1465828862@sss.pgh.pa.us
>
>
> On Thu, Jun 16, 2016 at 12:16 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Jun 16, 2016 at 8:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>> 1. The case originally reported by Thomas Munro still fails. To fix
> >>> that, we probably need to apply scanjoin_target to each partial path.
> >>> But we can only do that if it's parallel-safe. It seems like what we
> >>> want is something like this: (1) During scan/join planning, somehow
> >>> skip calling generate_gather_paths for the topmost scan/join rel as we
> >>> do to all the others. (2) If scanjoin_target is not parallel-safe,
> >>> build a path for the scan/join phase that applies a Gather node to the
> >>> cheapest path and does projection at the Gather node. Then forget all
> >>> the partial paths so we can't do any bogus upper planning. (3) If
> >>> scanjoin_target is parallel-safe, replace the list of partial paths
> >>> for the topmost scan/join rel with a new list where scanjoin_target
> >>> has been applied to each one. I haven't tested this so I might be
> >>> totally off-base about what's actually required here...
> >>
> >> I think we can achieve it just by doing something like what you have
> >> mentioned in (2) and (3). I am not sure if there is a need to skip
> >> generation of gather paths for top scan/join node. Please see the patch
> >> attached. I have just done some minimal testing to ensure that problem
> >> reported by Thomas Munro in this thread is fixed and verified that the fix
> >> is sane for problems [1][2] reported by sqlsmith. If you think this is on
> >> right lines, I can try to do more verification and try to add tests.
> >
> > You can't do it this way because of the issue Tom discussed here:
> >
> > https://www.postgresql.org/message-id/16421.1465828862@sss.pgh.pa.us
>
I think although we are always adding a projection path in apply_projection_to_path() to subpath of Gather path if target is parallel safe, still they can be used directly if create_projection_plan() doesn't add a Result node. So changing them directly after being pushed below Gather is not a safe bet.
> Something like what you have there might work if you use
> create_projection_path instead of apply_projection_to_path.
>
Okay, I think you mean to say use create_projection_path() while applying scanjoin_target to partial path lists in below code:
foreach(lc, current_rel->partial_pathlist)
{
Path *subpath = (Path *) lfirst(lc);
Assert(subpath->param_info == NULL);
lfirst(lc) = apply_projection_to_path(root, current_rel,
subpath, scanjoin_target,
scanjoin_target_parallel_safe);
}
{
Path *subpath = (Path *) lfirst(lc);
Assert(subpath->param_info == NULL);
lfirst(lc) = apply_projection_to_path(root, current_rel,
subpath, scanjoin_target,
scanjoin_target_parallel_safe);
}
På torsdag 16. juni 2016 kl. 20:19:44, skrev Tom Lane <tgl@sss.pgh.pa.us>:
Amit Kapila <amit.kapila@enterprisedb.com> writes:
> On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> min_parallel_relation_size, or min_parallelizable_relation_size, or
>> something like that?
> You are right that such a variable will make it simpler to write tests for
> parallel query. I have implemented such a guc and choose to keep the name
> as min_parallel_relation_size.
Pushed with minor adjustments. My first experiments with this say that
we should have done this long ago:
https://www.postgresql.org/message-id/22782.1466100870@sss.pgh.pa.us
> One thing to note is that in function
> create_plain_partial_paths(), curently it is using PG_INT32_MAX/3 for
> parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
> as to be consistent with guc.c. I am not sure if it is advisable to use
> PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
I agree that using INT_MAX is more consistent with the code elsewhere in
guc.c, and more correct given that we declare the variable in question
as int not int32. But you need to include <limits.h> to use INT_MAX ...
regards, tom lane
As of 4c56f3269a84a81461cc53941e0eee02fc920ab6 I'm still getting it in one of my queries:
ORDER/GROUP BY expression not found in targetlist
Am I missing something?
I could dig into this further to reproduce if necessary.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
On Fri, Jun 17, 2016 at 11:39 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:
No, the fix is still not committed.
På torsdag 16. juni 2016 kl. 20:19:44, skrev Tom Lane <tgl@sss.pgh.pa.us>:Amit Kapila <amit.kapila@enterprisedb.com> writes:
> On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> min_parallel_relation_size, or min_parallelizable_relation_size, or
>> something like that?
> You are right that such a variable will make it simpler to write tests for
> parallel query. I have implemented such a guc and choose to keep the name
> as min_parallel_relation_size.
Pushed with minor adjustments. My first experiments with this say that
we should have done this long ago:
https://www.postgresql.org/message-id/22782.1466100870@sss.pgh.pa.us
> One thing to note is that in function
> create_plain_partial_paths(), curently it is using PG_INT32_MAX/3 for
> parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
> as to be consistent with guc.c. I am not sure if it is advisable to use
> PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
I agree that using INT_MAX is more consistent with the code elsewhere in
guc.c, and more correct given that we declare the variable in question
as int not int32. But you need to include <limits.h> to use INT_MAX ...
regards, tom laneAs of 4c56f3269a84a81461cc53941e0eee02fc920ab6 I'm still getting it in one of my queries:ORDER/GROUP BY expression not found in targetlist
I am working on preparing a patch to fix this issue.
Am I missing something?
No, the fix is still not committed.
På fredag 17. juni 2016 kl. 08:14:39, skrev Amit Kapila <amit.kapila16@gmail.com>:
On Fri, Jun 17, 2016 at 11:39 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:På torsdag 16. juni 2016 kl. 20:19:44, skrev Tom Lane <tgl@sss.pgh.pa.us>:Amit Kapila <amit.kapila@enterprisedb.com> writes:
> On Mon, Jun 13, 2016 at 10:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> min_parallel_relation_size, or min_parallelizable_relation_size, or
>> something like that?
> You are right that such a variable will make it simpler to write tests for
> parallel query. I have implemented such a guc and choose to keep the name
> as min_parallel_relation_size.
Pushed with minor adjustments. My first experiments with this say that
we should have done this long ago:
https://www.postgresql.org/message-id/22782.1466100870@sss.pgh.pa.us
> One thing to note is that in function
> create_plain_partial_paths(), curently it is using PG_INT32_MAX/3 for
> parallel_threshold to check for overflow, I have changed it to INT_MAX/3 so
> as to be consistent with guc.c. I am not sure if it is advisable to use
> PG_INT32_MAX in guc.c as other similar parameters use INT_MAX.
I agree that using INT_MAX is more consistent with the code elsewhere in
guc.c, and more correct given that we declare the variable in question
as int not int32. But you need to include <limits.h> to use INT_MAX ...
regards, tom laneAs of 4c56f3269a84a81461cc53941e0eee02fc920ab6 I'm still getting it in one of my queries:ORDER/GROUP BY expression not found in targetlistI am working on preparing a patch to fix this issue.Am I missing something?
No, the fix is still not committed.
Ah, I thought Tom pushed a fix, but it apparently was another fix.
Thanks for fixing.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
On Fri, Jun 17, 2016 at 8:18 AM, Amit Kapila <amit.kapila@enterprisedb.com> wrote:
>
> On Fri, Jun 17, 2016 at 3:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
>
> > Something like what you have there might work if you use
> > create_projection_path instead of apply_projection_to_path.
> >
>
> Okay, I think you mean to say use create_projection_path() while applying scanjoin_target to partial path lists in below code:
> foreach(lc, current_rel->partial_pathlist)
> {
> Path *subpath = (Path *) lfirst(lc);
> Assert(subpath->param_info == NULL);
> lfirst(lc) = apply_projection_to_path(root, current_rel,
> subpath, scanjoin_target,
> scanjoin_target_parallel_safe);
> }
>
>
> On Fri, Jun 17, 2016 at 3:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
>
> > Something like what you have there might work if you use
> > create_projection_path instead of apply_projection_to_path.
> >
>
> Okay, I think you mean to say use create_projection_path() while applying scanjoin_target to partial path lists in below code:
> foreach(lc, current_rel->partial_pathlist)
> {
> Path *subpath = (Path *) lfirst(lc);
> Assert(subpath->param_info == NULL);
> lfirst(lc) = apply_projection_to_path(root, current_rel,
> subpath, scanjoin_target,
> scanjoin_target_parallel_safe);
> }
>
Attached, please find updated patch based on above lines. Due to addition of projection path on top of partial paths, some small additional cost is added for parallel paths. In one of the tests in select_parallel.sql the plan was getting converted to serial plan from parallel plan due to this cost, so I have changed it to make it more restrictive. Before changing, I have debugged the test to confirm that it was due to this new projection path cost. I have added two new tests as well to cover the new code added by patch.
Note - Apart from the tests related to new code, Dilip Kumar has helped to verify that TPC-H queries also worked fine (he tested using Explain). He doesn't encounter problem reported above [1] whereas without patch TPCH queries were failing.
Attachment
On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > Attached, please find updated patch based on above lines. Due to addition > of projection path on top of partial paths, some small additional cost is > added for parallel paths. In one of the tests in select_parallel.sql the > plan was getting converted to serial plan from parallel plan due to this > cost, so I have changed it to make it more restrictive. Before changing, I > have debugged the test to confirm that it was due to this new projection > path cost. I have added two new tests as well to cover the new code added > by patch. I'm going to go work on this patch now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 17, 2016 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Attached, please find updated patch based on above lines. Due to addition >> of projection path on top of partial paths, some small additional cost is >> added for parallel paths. In one of the tests in select_parallel.sql the >> plan was getting converted to serial plan from parallel plan due to this >> cost, so I have changed it to make it more restrictive. Before changing, I >> have debugged the test to confirm that it was due to this new projection >> path cost. I have added two new tests as well to cover the new code added >> by patch. > > I'm going to go work on this patch now. Here is a revised version, which I plan to commit in a few hours before the workday expires. I kept it basically as Amit had it, but I whacked the comments around some and, more substantively, avoided inserting and/or charging for a Result node when that's not necessary. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi, I ran tpc-h's queries on this version and it's successful. The error is fixed. commit 705ad7f3b523acae0ddfdebd270b7048b2bb8029 Author: Tom Lane <tgl@sss.pgh.pa.us> Date: Sun Jun 19 13:11:40 2016 -0400 Regards, Tatsuro Yamada NTT OSS Center On 2016/06/18 1:26, Robert Haas wrote: > On Fri, Jun 17, 2016 at 9:29 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jun 17, 2016 at 9:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >>> Attached, please find updated patch based on above lines. Due to addition >>> of projection path on top of partial paths, some small additional cost is >>> added for parallel paths. In one of the tests in select_parallel.sql the >>> plan was getting converted to serial plan from parallel plan due to this >>> cost, so I have changed it to make it more restrictive. Before changing, I >>> have debugged the test to confirm that it was due to this new projection >>> path cost. I have added two new tests as well to cover the new code added >>> by patch. >> >> I'm going to go work on this patch now. > > Here is a revised version, which I plan to commit in a few hours > before the workday expires. I kept it basically as Amit had it, but I > whacked the comments around some and, more substantively, avoided > inserting and/or charging for a Result node when that's not necessary. > > > >
On Mon, Jun 13, 2016 at 03:42:49PM -0400, Tom Lane wrote: > I wrote: > > ... there was also an unexplainable plan change: > > > *** /home/postgres/pgsql/src/test/regress/expected/aggregates.out Thu Apr 7 21:13:14 2016 > > --- /home/postgres/pgsql/src/test/regress/results/aggregates.out Mon Jun 13 11:54:01 2016 > > *************** > > *** 577,590 **** > > > explain (costs off) > > select max(unique1) from tenk1 where unique1 > 42000; > > ! QUERY PLAN > > ! --------------------------------------------------------------------------- > > ! Result > > ! InitPlan 1 (returns $0) > > ! -> Limit > > ! -> Index Only Scan Backward using tenk1_unique1 on tenk1 > > ! Index Cond: ((unique1 IS NOT NULL) AND (unique1 > 42000)) > > ! (5 rows) > > > select max(unique1) from tenk1 where unique1 > 42000; > > max > > --- 577,588 ---- > > > explain (costs off) > > select max(unique1) from tenk1 where unique1 > 42000; > > ! QUERY PLAN > > ! ---------------------------------------------------- > > ! Aggregate > > ! -> Index Only Scan using tenk1_unique1 on tenk1 > > ! Index Cond: (unique1 > 42000) > > ! (3 rows) > > > select max(unique1) from tenk1 where unique1 > 42000; > > max > > > I would not be surprised at a change to a parallel-query plan, but there's > > no parallelism here, so what happened? This looks like a bug to me. > > (Also, doing this query without COSTS OFF shows that the newly selected > > plan actually has a greater estimated cost than the expected plan, which > > makes it definitely a bug.) > > I looked into this and found that the costs are considered fuzzily the > same, and then add_path prefers the slightly-worse path on the grounds > that it is marked parallel_safe while the MinMaxAgg path is not. It seems > to me that there is some fuzzy thinking going on there. On exactly what > grounds is a path to be preferred merely because it is parallel safe, and > not actually parallelized? Or perhaps the question to ask is whether a > MinMaxAgg path can be marked parallel-safe. [Action required within 72 hours. This is a generic notification.] The above-described topic is currently a PostgreSQL 9.6 open item ("consider whether MinMaxAggPath might fail to be parallel-safe"). Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a 9.6 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within 72 hours of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping 9.6rc1. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
On Mon, Jun 13, 2016 at 04:46:19PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > In practice, we don't yet have the ability for > > parallel-safe paths from subqueries to affect planning at higher query > > levels, but that's because the pathification stuff landed too late in > > the cycle for me to fully react to it. > > I wonder if that's not just from confusion between subplans and > subqueries. I don't believe any of the claims made in the comment > adjusted in the patch below (other than "Subplans currently aren't passed > to workers", which is true but irrelevant). Nested Gather nodes is > something that you would need, and already have, a defense for anyway. [Action required within 72 hours. This is a generic notification.] The above-described topic is currently a PostgreSQL 9.6 open item ("fix possible confusion between subqueries and subplans"). Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a 9.6 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within 72 hours of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping 9.6rc1. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
On Mon, Jun 13, 2016 at 05:27:13PM -0400, Robert Haas wrote: > One problem that I've realized that is related to this is that the way > that the consider_parallel flag is being set for upper rels is almost > totally bogus, which I believe accounts for your complaint at PGCon > that force_parallel_mode was not doing as much as it ought to do. > When I originally wrote a lot of this logic, there were no upper rels, > which led to this code: > > if (force_parallel_mode == FORCE_PARALLEL_OFF || !glob->parallelModeOK) > { > glob->parallelModeNeeded = false; > glob->wholePlanParallelSafe = false; /* either > false or don't care */ > } > else > { > glob->parallelModeNeeded = true; > glob->wholePlanParallelSafe = > !has_parallel_hazard((Node *) parse, false); > } > > The main way that wholePlanParallelSafe is supposed to be cleared is > in create_plan: > > /* Update parallel safety information if needed. */ > if (!best_path->parallel_safe) > root->glob->wholePlanParallelSafe = false; > > However, at the time that code was written, it was impossible to rely > entirely on the latter mechanism. Since there were no upper rels and > no paths for upper plan nodes, you could have the case where every > path was parallel-safe but the whole plan was node. Therefore the > code shown above was needed to scan the whole darned plan for > parallel-unsafe things. Post-pathification, this whole thing is > pretty bogus: upper rels mostly don't get consider_parallel set at > all, which means plans involving upper rels get marked parallel-unsafe > even if they are not, which means the wholePlanParallelSafe flag gets > cleared in a bunch of cases where it shouldn't. And on the flip side > I think that the first chunk of code quoted above would be totally > unnecessary if we were actually setting consider_parallel correctly on > the upper rels. [Action required within 72 hours. This is a generic notification.] The above-described topic is currently a PostgreSQL 9.6 open item ("set consider_parallel correctly for upper rels"). Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a 9.6 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within 72 hours of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping 9.6rc1. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com