Re: star schema and the optimizer - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: star schema and the optimizer |
Date | |
Msg-id | 8269.1425062722@sss.pgh.pa.us Whole thread Raw |
In response to | Re: star schema and the optimizer (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: star schema and the optimizer
|
List | pgsql-hackers |
> I wrote: >> I had actually thought that we'd fixed this type of problem in recent >> versions, and that you should be able to get a plan that would look like >> Nestloop >> -> scan dim1 >> -> Nestloop >> -> scan dim2 >> -> indexscan fact table using dim1.a and dim2.b After closer study, I think this is an oversight in commit e2fa76d80ba571d4de8992de6386536867250474, which quoth +It can be useful for the parameter value to be passed down through +intermediate layers of joins, for example: + + NestLoop + -> Seq Scan on A + Hash Join + Join Condition: B.Y = C.W + -> Seq Scan on B + -> Index Scan using C_Z_IDX on C + Index Condition: C.Z = A.X + +If all joins are plain inner joins then this is unnecessary, because +it's always possible to reorder the joins so that a parameter is used +immediately below the nestloop node that provides it. But in the +presence of outer joins, join reordering may not be possible, and then +this option can be critical. Before version 9.2, Postgres used ad-hoc This reasoning overlooked the fact that if we need parameters from more than one relation, and there's no way to join those relations to each other directly, then we have to allow passing the dim1 parameter down through the join to dim2. The attached patch seems to fix it (modulo the need for some updates in the README, and maybe a regression test). Could you see if this produces satisfactory plans for you? regards, tom lane diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index e6aa21c..ce812b0 100644 *** a/src/backend/optimizer/path/joinpath.c --- b/src/backend/optimizer/path/joinpath.c *************** try_nestloop_path(PlannerInfo *root, *** 291,299 **** if (required_outer && !bms_overlap(required_outer, param_source_rels)) { ! /* Waste no memory when we reject a path here */ ! bms_free(required_outer); ! return; } /* --- 291,320 ---- if (required_outer && !bms_overlap(required_outer, param_source_rels)) { ! /* ! * We override the param_source_rels heuristic to accept nestloop ! * paths in which the outer rel satisfies some but not all of the ! * inner path's parameterization. This is necessary to get good plans ! * for star-schema scenarios, in which a parameterized path for a ! * "fact" table may require parameters from multiple "dimension" ! * tables that will not get joined directly to each other. We can ! * handle that by stacking nestloops that have the dimension tables on ! * the outside; but this breaks the rule the param_source_rels ! * heuristic is based on, that parameters should not be passed down ! * across joins unless there's a join-order-constraint-based reason to ! * do so. So, we should consider partial satisfaction of ! * parameterization as another reason to allow such paths. ! */ ! Relids outerrelids = outer_path->parent->relids; ! Relids innerparams = PATH_REQ_OUTER(inner_path); ! ! if (!(bms_overlap(innerparams, outerrelids) && ! bms_nonempty_difference(innerparams, outerrelids))) ! { ! /* Waste no memory when we reject a path here */ ! bms_free(required_outer); ! return; ! } } /*
pgsql-hackers by date: