Thread: Calculation of param_source_rels in add_paths_to_joinrel
Hi, There's code in add_paths_to_joinrel() which computes the set of target relations that should overlap parameterization of any proposed join path. 120 /* 121 * Decide whether it's sensible to generate parameterized paths for this 122 * joinrel, and if so, which relations such paths should require. There 123 * is usually no need to create a parameterized result path unless there 124 * is a join order restriction that prevents joining one of our input rels 125 * directly to the parameter source rel instead of joining to the other 126 * input rel. (But see allow_star_schema_join().) This restriction 127 * reduces the number of parameterized paths we have to deal with at 128 * higher join levels, without compromising the quality of the resulting 129 * plan. We express the restriction as a Relids set that must overlap the 130 * parameterization of any proposed join path. 131 */ The calculations that follow are based on joinrel->relids (baserels covered by the join) and SpecialJoinInfo list in PlannerInfo. It is not based on specific combination of relations being joined or the paths being generated. We should probably do this computation once and store the result in the joinrel and use it multiple times. That way we can avoid computing the same set again and again for every pair of joining relations and their order. Any reasons why we don't do this? Attached patch moves this code to build_join_rel() and uses it in add_paths_to_joinrel(). make check-world does not show any failures. If this change is acceptable, we might actually remove param_source_rels from JoinPathExtraData and directly access it from joinrel in try_nestloop_path(), try_mergejoin_path() and try_hashjoin_path(). Also, the way this code has been written, the declaration of variable sjinfo masks the earlier declaration with the same name. I am not sure if that's intentional, but may be we should use another variable name for the inner sjinfo. I have not included that change in the patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > There's code in add_paths_to_joinrel() which computes the set of > target relations that should overlap parameterization of any proposed > join path. > ... > The calculations that follow are based on joinrel->relids (baserels > covered by the join) and SpecialJoinInfo list in PlannerInfo. It is > not based on specific combination of relations being joined or the > paths being generated. We should probably do this computation once and > store the result in the joinrel and use it multiple times. That way we > can avoid computing the same set again and again for every pair of > joining relations and their order. Any reasons why we don't do this? I'm not terribly excited about this. The issue is strictly local to add_paths_to_joinrel, but putting that set in a global data structure makes it nonlocal, and makes it that much harder to tweak the algorithm if we think of a better way. (In particular, I think it's not all that obvious that the set must be independent of which two subset relations we are currently joining.) If you can show a measurable performance improvement from this change, then maybe those downsides are acceptable. But I do not think we should commit it without a demonstrated performance benefit from the added complexity and loss of flexibility. > Also, the way this code has been written, the declaration of variable > sjinfo masks the earlier declaration with the same name. I am not sure > if that's intentional, but may be we should use another variable name > for the inner sjinfo. I have not included that change in the patch. Hmm, yeah, that's probably not terribly good coding practice. regards, tom lane
On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >> There's code in add_paths_to_joinrel() which computes the set of >> target relations that should overlap parameterization of any proposed >> join path. >> ... >> The calculations that follow are based on joinrel->relids (baserels >> covered by the join) and SpecialJoinInfo list in PlannerInfo. It is >> not based on specific combination of relations being joined or the >> paths being generated. We should probably do this computation once and >> store the result in the joinrel and use it multiple times. That way we >> can avoid computing the same set again and again for every pair of >> joining relations and their order. Any reasons why we don't do this? > > I'm not terribly excited about this. The issue is strictly local to > add_paths_to_joinrel, but putting that set in a global data structure > makes it nonlocal, and makes it that much harder to tweak the algorithm > if we think of a better way. (In particular, I think it's not all that > obvious that the set must be independent of which two subset relations > we are currently joining.) Right now it appears that for every subset of relations, we have different param_source_rels, which is clearly not. It takes a bit of time to understand that. Adding it to a global data structure will at least make the current implementation clear i.e param_source_rels does not change with subset of relations being joined. > > If you can show a measurable performance improvement from this change, > then maybe those downsides are acceptable. But I do not think we should > commit it without a demonstrated performance benefit from the added > complexity and loss of flexibility. I couldn't find a measurable time difference with or without my patch, so multiple computations of param_source_rels aren't taking noticeable time. I used following queries to measure the planning time through explain analyze. create view pc_view as select c1.oid c1o, c2.oid c2o, c3.oid c3o from pg_class c1, pg_class c2 left join pg_class c3 on (c2.oid = c3.oid) where c1.oid = c2.oid and c1.oid = c3.oid and c1.relname = c3.relname; select v1, v2, v3 from pc_view v1, pc_view v2 left join pc_view v3 on (v2.c3o = v3.c1o), pc_view v4 where v1.c3o = v2.c2o and v1.c2o = v4.c3o limit 0; > >> Also, the way this code has been written, the declaration of variable >> sjinfo masks the earlier declaration with the same name. I am not sure >> if that's intentional, but may be we should use another variable name >> for the inner sjinfo. I have not included that change in the patch. > > Hmm, yeah, that's probably not terribly good coding practice. Attached a patch to fix this. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: > On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >>> Also, the way this code has been written, the declaration of variable >>> sjinfo masks the earlier declaration with the same name. >> Hmm, yeah, that's probably not terribly good coding practice. > Attached a patch to fix this. Pushed, sorry about the delay. regards, tom lane
Thanks Tom. On Thu, Nov 24, 2016 at 2:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >> On Sat, Nov 5, 2016 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes: >>>> Also, the way this code has been written, the declaration of variable >>>> sjinfo masks the earlier declaration with the same name. > >>> Hmm, yeah, that's probably not terribly good coding practice. > >> Attached a patch to fix this. > > Pushed, sorry about the delay. > > regards, tom lane -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company