Re: star schema and the optimizer - Mailing list pgsql-hackers

From Tom Lane
Subject Re: star schema and the optimizer
Date
Msg-id 8269.1425062722@sss.pgh.pa.us
Whole thread Raw
In response to Re: star schema and the optimizer  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: star schema and the optimizer  (Marc Cousin <cousinmarc@gmail.com>)
List pgsql-hackers
> I wrote:
>> I had actually thought that we'd fixed this type of problem in recent
>> versions, and that you should be able to get a plan that would look like

>> Nestloop
>>   -> scan dim1
>>   -> Nestloop
>>        -> scan dim2
>>        -> indexscan fact table using dim1.a and dim2.b

After closer study, I think this is an oversight in commit
e2fa76d80ba571d4de8992de6386536867250474, which quoth

+It can be useful for the parameter value to be passed down through
+intermediate layers of joins, for example:
+
+    NestLoop
+        -> Seq Scan on A
+        Hash Join
+            Join Condition: B.Y = C.W
+            -> Seq Scan on B
+            -> Index Scan using C_Z_IDX on C
+                Index Condition: C.Z = A.X
+
+If all joins are plain inner joins then this is unnecessary, because
+it's always possible to reorder the joins so that a parameter is used
+immediately below the nestloop node that provides it.  But in the
+presence of outer joins, join reordering may not be possible, and then
+this option can be critical.  Before version 9.2, Postgres used ad-hoc

This reasoning overlooked the fact that if we need parameters from
more than one relation, and there's no way to join those relations
to each other directly, then we have to allow passing the dim1 parameter
down through the join to dim2.

The attached patch seems to fix it (modulo the need for some updates
in the README, and maybe a regression test).  Could you see if this
produces satisfactory plans for you?

            regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index e6aa21c..ce812b0 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*************** try_nestloop_path(PlannerInfo *root,
*** 291,299 ****
      if (required_outer &&
          !bms_overlap(required_outer, param_source_rels))
      {
!         /* Waste no memory when we reject a path here */
!         bms_free(required_outer);
!         return;
      }

      /*
--- 291,320 ----
      if (required_outer &&
          !bms_overlap(required_outer, param_source_rels))
      {
!         /*
!          * We override the param_source_rels heuristic to accept nestloop
!          * paths in which the outer rel satisfies some but not all of the
!          * inner path's parameterization.  This is necessary to get good plans
!          * for star-schema scenarios, in which a parameterized path for a
!          * "fact" table may require parameters from multiple "dimension"
!          * tables that will not get joined directly to each other.  We can
!          * handle that by stacking nestloops that have the dimension tables on
!          * the outside; but this breaks the rule the param_source_rels
!          * heuristic is based on, that parameters should not be passed down
!          * across joins unless there's a join-order-constraint-based reason to
!          * do so.  So, we should consider partial satisfaction of
!          * parameterization as another reason to allow such paths.
!          */
!         Relids        outerrelids = outer_path->parent->relids;
!         Relids        innerparams = PATH_REQ_OUTER(inner_path);
!
!         if (!(bms_overlap(innerparams, outerrelids) &&
!               bms_nonempty_difference(innerparams, outerrelids)))
!         {
!             /* Waste no memory when we reject a path here */
!             bms_free(required_outer);
!             return;
!         }
      }

      /*

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical column ordering
Next
From: Robert Haas
Date:
Subject: Re: plpgsql versus domains