Thread: nested loop semijoin estimates
Hi,

while looking at this post from pgsql-performance about plan changes

    http://www.postgresql.org/message-id/flat/20150529095117.GB15813@hjp.at

I noticed that initial_cost_nestloop() does this (9.1, used in the
pgsql-performance post, relies on the same logic):

    if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
    {
        double      outer_matched_rows;
        Selectivity inner_scan_frac;

        run_cost += inner_run_cost;

        outer_matched_rows
            = rint(outer_path_rows * semifactors->outer_match_frac);
        inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);

        if (outer_matched_rows > 1)
            run_cost += (outer_matched_rows - 1)
                      * inner_rescan_run_cost * inner_scan_frac;
        ...
    }

I wonder whether the

    run_cost += inner_run_cost;

is actually correct, because this pretty much means we assume scanning
the whole inner relation (once). Wouldn't something like this be more
appropriate?

    run_cost += inner_run_cost * inner_scan_frac;

i.e. only counting the proportional part of the inner run cost, just
like we do for the rescans (i.e. only until the first match)?

Imagine a simple EXISTS() query that gets turned into a semijoin, with
an inner index scan. The current code pretty much means we'll scan the
whole result, even though we only really need a single matching row (to
confirm the EXISTS clause). For cheap index scans (e.g. using a PK)
this does not make much difference, but the example posted to
pgsql-performance results in index scans matching ~50% of the table,
which makes the whole index scan quite expensive, and much higher than
the rescan cost (multiplied by inner_scan_frac).

While investigating the pgsql-performance report I've prepared a
testcase producing a similar data set (attached). So let's see how the
code change impacts plan choice.
With the current code, the planner produces a plan like this:

                               QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop  (cost=169627.96..169636.03 rows=1 width=74)
   Join Filter: ((t.term)::text = (f.berechnungsart)::text)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  HashAggregate  (cost=169627.41..169627.43 rows=2 width=2)
         Group Key: (f.berechnungsart)::text
         ->  Seq Scan on facttable_stat_fta4 f
               (cost=0.00..145274.93 rows=9740993 width=2)

which seems slightly inefficient, as ~50% of facttable_stat_fta4
matches the condition (so building the whole hash is a waste of time).
Also notice this is a regular join, not a semijoin - that seems to
confirm the planner does not realize the cost difference, believes both
plans (regular and semijoin) are equally expensive, and simply uses the
first one.

With the run_cost change, I do get this plan:

                               QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop Semi Join  (cost=111615.33..111623.42 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Bitmap Heap Scan on facttable_stat_fta4 f
         (cost=111614.78..220360.98 rows=4870496 width=2)
         Recheck Cond: ((berechnungsart)::text = (t.term)::text)
         ->  Bitmap Index Scan on facttable_stat_fta4_berechnungsart_idx
               (cost=0.00..110397.15 rows=4870496 width=0)
               Index Cond: ((berechnungsart)::text = (t.term)::text)

and this runs about 3x faster than the original plan (700ms vs. 2000ms),
at least when using the testcase dataset on my laptop.
This however is not the whole story, because after disabling the bitmap
index scan, I do get this plan, running in ~1ms (so ~700x faster than
the bitmap index scan):

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f
         (cost=0.43..310525.02 rows=4870496 width=2)
         Index Cond: (berechnungsart = (t.term)::text)

Notice the cost - it's way lower than the previous plan (9.2 vs ~111k),
yet this plan was not chosen. So either the change broke something
(e.g. by violating some optimizer assumption), or maybe there's a bug
somewhere else ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/30/15 01:20, Tomas Vondra wrote:
>
> Notice the cost - it's way lower than the previous plan (9.2 vs
> ~111k), yet this plan was not chosen. So either the change broke
> something (e.g. by violating some optimizer assumption), or maybe
> there's a bug somewhere else ...

After a bit more investigation, what I think is happening here is that
add_path() does not realize this is a SEMI join and a single tuple is
enough, and discards the simple indexscan path in favor of the bitmap
index scan, as that seems cheaper when scanning everything.

So not really a bug, but maybe 'consider_startup' would help?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 05/30/15 03:52, Tomas Vondra wrote:
>
> On 05/30/15 01:20, Tomas Vondra wrote:
>>
>> Notice the cost - it's way lower than the previous plan (9.2 vs
>> ~111k), yet this plan was not chosen. So either the change broke
>> something (e.g. by violating some optimizer assumption), or maybe
>> there's a bug somewhere else ...
>
> After a bit more investigation, what I think is happening here is
> add_path() does not realize this is a SEMI join and a single tuple is
> enough, and discards the simple indexscan path in favor of the bitmap
> index scan as that seems cheaper when scanning everything.
>
> So not really a bug, but maybe 'consider_startup' would help?

A bit more info - the reason why the IndexPath gets discarded (in favor
of the BitmapHeapScan) seems to be twofold:

(a) A semijoin does not imply 'consider_startup' for the inner path, so
    the IndexPath gets discarded in favor of the BitmapHeapScan, which
    has a lower total cost. I'm not sure where exactly consider_startup
    should be set, so I've forced it to true at the beginning of
    add_path() to see what happens. Sadly, this alone is not
    sufficient, because of the next point.

(b) To actually use startup costs in compare_path_costs_fuzzily(),
    there is an additional condition about handling parameterized
    paths, as explained in this comment:

     * This function also includes special hacks to support a policy
     * enforced by its sole caller, add_path(): paths that have any
     * parameterization cannot win comparisons on the grounds of having
     * cheaper startup cost, since we deem only total cost to be of
     * interest for a parameterized path.  (Unparameterized paths are
     * more common, so we check for this case last.)
so the startup_cost comparisons look like this:

    if (consider_startup &&
        path2->startup_cost > path1->startup_cost * fuzz_factor &&
        path1->param_info == NULL)

Of course, path1 (the IndexPath) actually has parameters, which makes
the whole condition false, and the IndexPath gets discarded :-(

ISTM the reported test case is a counter-example to the policy, so
maybe it should really be relaxed somehow. To see what happens, I
tweaked both (a) and (b) by forcing consider_startup = true at the
beginning of add_path(), and removing the (param_info == NULL)
conditions (patch attached). It also includes the run_cost tweak
mentioned in the first message. I'm not claiming this is the correct
fix - the only goal was to see what happens. And it really does produce
the 'right' plan:

                               QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f
         (cost=0.43..310525.02 rows=4870496 width=2)
         Index Cond: (berechnungsart = (t.term)::text)
(5 rows)

If I understand the comments in pathnode.c about the startup_cost
policy correctly, it's "only" an optimization to reduce the number of
parameterized paths, so I believe this does not break anything. Also,
it's no fun if the planning is a tad faster but the query runtime is
1000x higher.

Ideas how to relax the policy without significant impact on the
optimizer?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I wonder whether the
>     run_cost += inner_run_cost;
> is actually correct, because this pretty much means we assume scanning
> the whole inner relation (once). Wouldn't something like this be more
> appropriate?
>     run_cost += inner_run_cost * inner_scan_frac;

Well, not entirely.  The reason why it's like that is explained
(evidently not well enough) in the comment just above:

     * A complicating factor is that rescans may be cheaper than first
     * scans.  If we never scan all the way to the end of the inner rel,
     * it might be (depending on the plan type) that we'd never pay the
     * whole inner first-scan run cost.  However it is difficult to
     * estimate whether that will happen, so be conservative and always
     * charge the whole first-scan cost once.

The case this is concerned about is something like a Materialize node
above a scan node.  The Materialize node makes rescans much cheaper
than the original scan, but only once the data has been fetched to
begin with.  So, while the first successful probe should indeed have
cost something like inner_run_cost * inner_scan_frac, once we make this
change it's no longer legitimate for final_cost_nestloop to suppose
that the cost of an unsuccessful probe is just inner_rescan_run_cost.
Any unsuccessful probe will mean we must incur all of inner_run_cost in
order to complete the underlying scan and fully load the Materialize
node.

However, in the case where has_indexed_join_quals() is true, I think
your change is correct, because then all the scans will either stop on
the first tuple or quickly determine that no tuples satisfy the
indexquals.  (Also, in the cases where has_indexed_join_quals() can
succeed, rescan cost is the same as initial scan cost anyhow.)

So what this seems to mean is that for SEMI/ANTI join cases, we have to
postpone all of the inner scan cost determination to
final_cost_nestloop, so that we can do this differently depending on
whether has_indexed_join_quals() is true.
That's a little bit annoying because it will mean we take the shortcut exit less often; but since SEMI/ANTI joins aren't that common, it's probably not going to be a big planning time hit. Not sure yet about your other point about the indexscan getting rejected too soon. That doesn't seem to be happening for me, at least not in HEAD. regards, tom lane
On 05/30/15 21:50, Tom Lane wrote:
> So what this seems to mean is that for SEMI/ANTI join cases, we have
> to postpone all of the inner scan cost determination to
> final_cost_nestloop, so that we can do this differently depending on
> whether has_indexed_join_quals() is true. That's a little bit
> annoying because it will mean we take the shortcut exit less often;
> but since SEMI/ANTI joins aren't that common, it's probably not
> going to be a big planning time hit.

Yay! So I wasn't entirely wrong on this. Do you plan to make this
change, or will you leave that to me?

>
> Not sure yet about your other point about the indexscan getting
> rejected too soon. That doesn't seem to be happening for me, at least
> not in HEAD.

FWIW I can reproduce that reliably on current HEAD, with the test case
I sent yesterday. I'm using the default config (as produced by initdb).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote:
> So what this seems to mean is that for SEMI/ANTI join cases, we have to
> postpone all of the inner scan cost determination to final_cost_nestloop,
> so that we can do this differently depending on whether
> has_indexed_join_quals() is true. That's a little bit annoying because it
> will mean we take the shortcut exit less often; but since SEMI/ANTI joins
> aren't that common, it's probably not going to be a big planning time hit.

Attached is a draft patch for that.  It fixes the problem for me:

 Nested Loop Semi Join  (cost=0.99..9.09 rows=1 width=74)
   (actual time=0.591..1.554 rows=2 loops=1)
   ->  Index Scan using term_facttablename_columnname_idx on term t
         (cost=0.55..8.57 rows=1 width=74)
         (actual time=0.022..0.025 rows=2 loops=1)
         Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
                  AND ((columnname)::text = 'berechnungsart'::text))
   ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
       facttable_stat_fta4 f
         (cost=0.43..143244.98 rows=5015134 width=2)
         (actual time=0.759..0.759 rows=1 loops=2)
         Index Cond: (berechnungsart = (t.term)::text)
         Heap Fetches: 0
 Planning time: 0.545 ms
 Execution time: 1.615 ms

> Not sure yet about your other point about the indexscan getting rejected
> too soon. That doesn't seem to be happening for me, at least not in HEAD.

I do see something of the sort if I turn off enable_indexonlyscan.
Not sure about a good fix for that aspect.  It may not be terribly
critical, since AFAICS index-only scans generally ought to apply
in these cases.

			regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ac865be..0d302f6 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** cost_group(Path *path, PlannerInfo *root
*** 1767,1773 ****
   * estimate and getting a tight lower bound.  We choose to not examine the
   * join quals here, since that's by far the most expensive part of the
   * calculations.  The end result is that CPU-cost considerations must be
!  * left for the second phase.
   *
   * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
   * other data to be used by final_cost_nestloop
--- 1767,1774 ----
   * estimate and getting a tight lower bound.  We choose to not examine the
   * join quals here, since that's by far the most expensive part of the
   * calculations.  The end result is that CPU-cost considerations must be
!  * left for the second phase; and for SEMI/ANTI joins, we must also postpone
!  * incorporation of the inner path's run cost.
   *
   * 'workspace' is to be filled with startup_cost, total_cost, and perhaps
   * other data to be used by final_cost_nestloop
*************** initial_cost_nestloop(PlannerInfo *root,
*** 1815,1858 ****
  	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
  	{
- 		double		outer_matched_rows;
- 		Selectivity inner_scan_frac;
- 
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 *
! 		 * For an outer-rel row that has at least one match, we can expect the
! 		 * inner scan to stop after a fraction 1/(match_count+1) of the inner
! 		 * rows, if the matches are evenly distributed.  Since they probably
! 		 * aren't quite evenly distributed, we apply a fuzz factor of 2.0 to
! 		 * that fraction.  (If we used a larger fuzz factor, we'd have to
! 		 * clamp inner_scan_frac to at most 1.0; but since match_count is at
! 		 * least 1, no such clamp is needed now.)
! 		 *
! 		 * A complicating factor is that rescans may be cheaper than first
! 		 * scans.  If we never scan all the way to the end of the inner rel,
! 		 * it might be (depending on the plan type) that we'd never pay the
! 		 * whole inner first-scan run cost.  However it is difficult to
! 		 * estimate whether that will happen, so be conservative and always
! 		 * charge the whole first-scan cost once.
! 		 */
! 		run_cost += inner_run_cost;
! 
! 		outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);
! 		inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);
! 
! 		/* Add inner run cost for additional outer tuples having matches */
! 		if (outer_matched_rows > 1)
! 			run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
! 
! 		/*
! 		 * The cost of processing unmatched rows varies depending on the
! 		 * details of the joinclauses, so we leave that part for later.
  		 */
  
  		/* Save private data for final_cost_nestloop */
! 		workspace->outer_matched_rows = outer_matched_rows;
! 		workspace->inner_scan_frac = inner_scan_frac;
  	}
  	else
  	{
--- 1816,1831 ----
  	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
  	{
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 *
! 		 * Getting decent estimates requires inspection of the join quals,
! 		 * which we choose to postpone to final_cost_nestloop.
  		 */
  
  		/* Save private data for final_cost_nestloop */
! 		workspace->inner_run_cost = inner_run_cost;
! 		workspace->inner_rescan_run_cost = inner_rescan_run_cost;
  	}
  	else
  	{
*************** initial_cost_nestloop(PlannerInfo *root,
*** 1869,1875 ****
  	workspace->total_cost = startup_cost + run_cost;
  	/* Save private data for final_cost_nestloop */
  	workspace->run_cost = run_cost;
- 	workspace->inner_rescan_run_cost = inner_rescan_run_cost;
  }
  
  /*
--- 1842,1847 ----
*************** final_cost_nestloop(PlannerInfo *root, N
*** 1893,1899 ****
  	double		inner_path_rows = inner_path->rows;
  	Cost		startup_cost = workspace->startup_cost;
  	Cost		run_cost = workspace->run_cost;
- 	Cost		inner_rescan_run_cost = workspace->inner_rescan_run_cost;
  	Cost		cpu_per_tuple;
  	QualCost	restrict_qual_cost;
  	double		ntuples;
--- 1865,1870 ----
*************** final_cost_nestloop(PlannerInfo *root, N
*** 1912,1953 ****
  	if (!enable_nestloop)
  		startup_cost += disable_cost;
  
! 	/* cost of source data */
  	if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
  	{
- 		double		outer_matched_rows = workspace->outer_matched_rows;
- 		Selectivity inner_scan_frac = workspace->inner_scan_frac;
- 
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 */
! 
! 		/* Compute number of tuples processed (not number emitted!) */
  		ntuples = outer_matched_rows * inner_path_rows * inner_scan_frac;
  
  		/*
! 		 * For unmatched outer-rel rows, there are two cases.  If the inner
! 		 * path is an indexscan using all the joinquals as indexquals, then an
! 		 * unmatched row results in an indexscan returning no rows, which is
! 		 * probably quite cheap.  We estimate this case as the same cost to
! 		 * return the first tuple of a nonempty scan.  Otherwise, the executor
! 		 * will have to scan the whole inner rel; not so cheap.
  		 */
  		if (has_indexed_join_quals(path))
  		{
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost / inner_path_rows;
  
  			/*
! 			 * We won't be evaluating any quals at all for these rows, so
  			 * don't add them to ntuples.
  			 */
  		}
  		else
  		{
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost;
  			ntuples += (outer_path_rows - outer_matched_rows) *
  				inner_path_rows;
  		}
--- 1883,1983 ----
  	if (!enable_nestloop)
  		startup_cost += disable_cost;
  
! 	/* cost of inner-relation source data (we already dealt with outer rel) */
  	if (path->jointype == JOIN_SEMI || path->jointype == JOIN_ANTI)
  	{
  		/*
  		 * SEMI or ANTI join: executor will stop after first match.
  		 */
+ 		Cost		inner_run_cost = workspace->inner_run_cost;
+ 		Cost		inner_rescan_run_cost = workspace->inner_rescan_run_cost;
+ 		double		outer_matched_rows;
+ 		Selectivity inner_scan_frac;
+ 
! 		/*
! 		 * For an outer-rel row that has at least one match, we can expect the
! 		 * inner scan to stop after a fraction 1/(match_count+1) of the inner
! 		 * rows, if the matches are evenly distributed.  Since they probably
! 		 * aren't quite evenly distributed, we apply a fuzz factor of 2.0 to
! 		 * that fraction.  (If we used a larger fuzz factor, we'd have to
! 		 * clamp inner_scan_frac to at most 1.0; but since match_count is at
! 		 * least 1, no such clamp is needed now.)
! 		 */
! 		outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);
! 		inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);
! 
! 		/*
! 		 * Compute number of tuples processed (not number emitted!).  First,
! 		 * account for successfully-matched outer rows.
! 		 */
  		ntuples = outer_matched_rows * inner_path_rows * inner_scan_frac;
  
  		/*
! 		 * Now we need to estimate the actual costs of scanning the inner
! 		 * relation, which may be quite a bit less than N times inner_run_cost
! 		 * due to early scan stops.  We consider two cases.  If the inner path
! 		 * is an indexscan using all the joinquals as indexquals, then an
! 		 * unmatched outer row results in an indexscan returning no rows,
! 		 * which is probably quite cheap.  Otherwise, the executor will have
! 		 * to scan the whole inner rel for an unmatched row; not so cheap.
  		 */
  		if (has_indexed_join_quals(path))
  		{
+ 			/*
+ 			 * Successfully-matched outer rows will only require scanning
+ 			 * inner_scan_frac of the inner relation.  In this case, we don't
+ 			 * need to charge the full inner_run_cost even when that's more
+ 			 * than inner_rescan_run_cost, because we can assume that none of
+ 			 * the inner scans ever scan the whole inner relation.  So it's
+ 			 * okay to assume that all the inner scan executions can be
+ 			 * fractions of the full cost, even if materialization is reducing
+ 			 * the rescan cost.  At this writing, it's impossible to get here
+ 			 * for a materialized inner scan, so inner_run_cost and
+ 			 * inner_rescan_run_cost will be the same anyway; but just in
+ 			 * case, use inner_run_cost for the first matched tuple and
+ 			 * inner_rescan_run_cost for additional ones.
+ 			 */
+ 			run_cost += inner_run_cost * inner_scan_frac;
+ 			if (outer_matched_rows > 1)
+ 				run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
+ 
+ 			/*
+ 			 * Add the cost of inner-scan executions for unmatched outer rows.
+ 			 * We estimate this as the same cost as returning the first tuple
+ 			 * of a nonempty scan.  We consider that these are all rescans,
+ 			 * since we used inner_run_cost once already.
+ 			 */
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost / inner_path_rows;
  
  			/*
! 			 * We won't be evaluating any quals at all for unmatched rows, so
  			 * don't add them to ntuples.
  			 */
  		}
  		else
  		{
+ 			/*
+ 			 * Here, a complicating factor is that rescans may be cheaper than
+ 			 * first scans.  If we never scan all the way to the end of the
+ 			 * inner rel, it might be (depending on the plan type) that we'd
+ 			 * never pay the whole inner first-scan run cost.  However it is
+ 			 * difficult to estimate whether that will happen (and it could
+ 			 * not happen if there are any unmatched outer rows!), so be
+ 			 * conservative and always charge the whole first-scan cost once.
+ 			 */
+ 			run_cost += inner_run_cost;
+ 
+ 			/* Add inner run cost for additional outer tuples having matches */
+ 			if (outer_matched_rows > 1)
+ 				run_cost += (outer_matched_rows - 1) * inner_rescan_run_cost * inner_scan_frac;
+ 
+ 			/* Add inner run cost for unmatched outer tuples */
  			run_cost += (outer_path_rows - outer_matched_rows) *
  				inner_rescan_run_cost;
+ 
+ 			/* And count the unmatched join tuples as being processed */
  			ntuples += (outer_path_rows - outer_matched_rows) *
  				inner_path_rows;
  		}
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 279051e..706b6d1 100644
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
*************** typedef struct JoinCostWorkspace
*** 1719,1730 ****
  	Cost		run_cost;		/* non-startup cost components */
  
  	/* private for cost_nestloop code */
  	Cost		inner_rescan_run_cost;
- 	double		outer_matched_rows;
- 	Selectivity inner_scan_frac;
  
  	/* private for cost_mergejoin code */
- 	Cost		inner_run_cost;
  	double		outer_rows;
  	double		inner_rows;
  	double		outer_skip_rows;
--- 1719,1728 ----
  	Cost		run_cost;		/* non-startup cost components */
  
  	/* private for cost_nestloop code */
+ 	Cost		inner_run_cost; /* also used by cost_mergejoin code */
  	Cost		inner_rescan_run_cost;
  
  	/* private for cost_mergejoin code */
  	double		outer_rows;
  	double		inner_rows;
  	double		outer_skip_rows;
Hi,

On 05/30/15 23:16, Tom Lane wrote:
> I wrote:
>> So what this seems to mean is that for SEMI/ANTI join cases, we have to
>> postpone all of the inner scan cost determination to final_cost_nestloop,
>> so that we can do this differently depending on whether
>> has_indexed_join_quals() is true. That's a little bit annoying because it
>> will mean we take the shortcut exit less often; but since SEMI/ANTI joins
>> aren't that common, it's probably not going to be a big planning time hit.
>
> Attached is a draft patch for that. It fixes the problem for me:
>
>  Nested Loop Semi Join  (cost=0.99..9.09 rows=1 width=74)
>    (actual time=0.591..1.554 rows=2 loops=1)
>    ->  Index Scan using term_facttablename_columnname_idx on term t
>          (cost=0.55..8.57 rows=1 width=74)
>          (actual time=0.022..0.025 rows=2 loops=1)
>          Index Cond: (((facttablename)::text = 'facttable_stat_fta4'::text)
>                   AND ((columnname)::text = 'berechnungsart'::text))
>    ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
>        facttable_stat_fta4 f
>          (cost=0.43..143244.98 rows=5015134 width=2)
>          (actual time=0.759..0.759 rows=1 loops=2)
>          Index Cond: (berechnungsart = (t.term)::text)
>          Heap Fetches: 0
>  Planning time: 0.545 ms
>  Execution time: 1.615 ms

Seems to be working OK, but I still do get a Bitmap Heap Scan there
(but more about that later). Do you plan to push that into 9.5, or 9.6?
I assume it's a behavior change, so no back-patching, right?

>
>> Not sure yet about your other point about the indexscan getting
>> rejected too soon. That doesn't seem to be happening for me, at
>> least not in HEAD.
>
> I do see something of the sort if I turn off enable_indexonlyscan.
> Not sure about a good fix for that aspect. It may not be terribly
> critical, since AFAICS index-only scans generally ought to apply
> in these cases.

Hmmm, a VACUUM FREEZE fixed that for me.
The reason is that right after loading the testcase, I do get this:

 Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
 facttable_stat_fta4 f  (cost=0.43..280220.51 rows=5000016 width=2)

and after VACUUM FREEZE I do get this:

 Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
 facttable_stat_fta4 f  (cost=0.43..142344.43 rows=5000000 width=2)

and the Bitmap Heap Scan case looks like this:

 Bitmap Heap Scan on facttable_stat_fta4 f
   (cost=93594.56..200342.76 rows=5000016 width=2)

so it's cheaper (on total cost) than the index only scan before
freezing, and more expensive than the index only scan after freezing.

I still think this is wrong (or rather "suboptimal") - there are
probably cases where even the frozen index only scan is more expensive
than a bitmap heap scan, and in that case it won't be used, although
it'd be much faster.

Another example is a query with a plain index scan, e.g. consider this
slight modification of the query:

    SELECT facttablename, columnname, term
      FROM term t
     WHERE facttablename = 'facttable_stat_fta4'
       AND columnname = 'berechnungsart'
       AND EXISTS (SELECT 1 FROM facttable_stat_fta4 f
                    WHERE f.berechnungsart = t.term
                      AND einheit IS NOT NULL);

This will result in a bitmap index scan no matter the visibility.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 05/30/15 23:16, Tom Lane wrote:
>> Attached is a draft patch for that. It fixes the problem for me:

> Seems to be working OK, but I still do get a Bitmap Heap Scan there
> (but more about that later).

Attached is an incremental patch (on top of the previous one) to allow
startup cost of parameterized paths to be considered when the relation
is the RHS of a semi or anti join.  It seems reasonably clean except
for one thing: logically, we perhaps should remove the checks on
path->param_info from the last half of compare_path_costs_fuzzily(), so
as to increase the symmetry between parameterized paths and
unparameterized ones.  However, when I did that, I got changes in some
of the regression test plans, and they didn't seem to be for the
better.  So I left that alone.  As-is, this patch doesn't seem to
affect the results for any existing regression tests.

> Do you plan to push that into 9.5, or 9.6? I assume it's a behavior
> change so that no back-patching, right?

Mumble.  It's definitely a planner bug fix, and I believe that the
effects are narrowly constrained to cases where significantly better
plans are possible.  So personally I'd be willing to back-patch it.
But others might see that differently, especially since it's been like
this for a long time and we only have one field complaint.
			regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4775acf..87304ba 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outRelOptInfo(StringInfo str, const Rel
*** 1844,1849 ****
--- 1844,1850 ----
  	WRITE_FLOAT_FIELD(rows, "%.0f");
  	WRITE_INT_FIELD(width);
  	WRITE_BOOL_FIELD(consider_startup);
+ 	WRITE_BOOL_FIELD(consider_param_startup);
  	WRITE_NODE_FIELD(reltargetlist);
  	WRITE_NODE_FIELD(pathlist);
  	WRITE_NODE_FIELD(ppilist);
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index d51fd57..d1349aa 100644
*** a/src/backend/optimizer/README
--- b/src/backend/optimizer/README
*************** a nestloop that provides parameters to t
*** 795,801 ****
  do not ignore merge joins entirely, joinpath.c does not fully explore the
  space of potential merge joins with parameterized inputs.  Also, add_path
  treats parameterized paths as having no pathkeys, so that they compete
! only on total cost and rowcount; they don't get preference for producing a
  special sort order.  This creates additional bias against merge joins,
  since we might discard a path that could have been useful for performing a
  merge without an explicit sort step.  Since a parameterized path must
--- 795,801 ----
  do not ignore merge joins entirely, joinpath.c does not fully explore the
  space of potential merge joins with parameterized inputs.  Also, add_path
  treats parameterized paths as having no pathkeys, so that they compete
! only on cost and rowcount; they don't get preference for producing a
  special sort order.  This creates additional bias against merge joins,
  since we might discard a path that could have been useful for performing a
  merge without an explicit sort step.  Since a parameterized path must
*************** uninteresting, these choices do not affe
*** 804,809 ****
--- 804,817 ----
  output order of a query --- they only make it harder to use a merge join
  at a lower level.  The savings in planning work justifies that.
  
+ Similarly, parameterized paths do not normally get preference in add_path
+ for having cheap startup cost; that's seldom of much value when on the
+ inside of a nestloop, so it seems not worth keeping extra paths solely for
+ that.  An exception occurs for parameterized paths for the RHS relation of
+ a SEMI or ANTI join: in those cases, we can stop the inner scan after the
+ first match, so that it's primarily startup not total cost that we care
+ about.
+ 
  
  LATERAL subqueries
  ------------------
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4e6d90d..0b83189 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*************** set_rel_pathlist_hook_type set_rel_pathl
*** 61,66 ****
--- 61,67 ----
  join_search_hook_type join_search_hook = NULL;
  
+ static void set_base_rel_consider_startup(PlannerInfo *root);
  static void set_base_rel_sizes(PlannerInfo *root);
  static void set_base_rel_pathlists(PlannerInfo *root);
  static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
*************** make_one_rel(PlannerInfo *root, List *jo
*** 152,157 ****
--- 153,161 ----
  		root->all_baserels = bms_add_member(root->all_baserels, brel->relid);
  	}
  
+ 	/* Mark base rels as to whether we care about fast-start plans */
+ 	set_base_rel_consider_startup(root);
+ 
  	/*
  	 * Generate access paths for the base rels.
  	 */
*************** make_one_rel(PlannerInfo *root, List *jo
*** 172,177 ****
--- 176,224 ----
  }
  
  /*
+  * set_base_rel_consider_startup
+  *	  Set the consider_[param_]startup flags for each base-relation entry.
+  *
+  * For the moment, we only deal with consider_param_startup here; because the
+  * logic for consider_startup is pretty trivial and is the same for every base
+  * relation, we just let build_simple_rel() initialize that flag correctly to
+  * start with.  If that logic ever gets more complicated it would probably
+  * be better to move it here.
+  */
+ static void
+ set_base_rel_consider_startup(PlannerInfo *root)
+ {
+ 	/*
+ 	 * Since parameterized paths can only be used on the inside of a nestloop
+ 	 * join plan, there is usually little value in considering fast-start
+ 	 * plans for them.  However, for relations that are on the RHS of a SEMI
+ 	 * or ANTI join, a fast-start plan can be useful because we're only going
+ 	 * to care about fetching one tuple anyway.
+ 	 *
+ 	 * To minimize growth of planning time, we currently restrict this to
+ 	 * cases where the RHS is a single base relation, not a join; there is no
+ 	 * provision for consider_param_startup to get set at all on joinrels.
+ 	 * Also we don't worry about appendrels.  costsize.c's costing rules for
+ 	 * nestloop semi/antijoins don't consider such cases either.
+ 	 */
+ 	ListCell   *lc;
+ 
+ 	foreach(lc, root->join_info_list)
+ 	{
+ 		SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+ 		int			varno;
+ 
+ 		if ((sjinfo->jointype == JOIN_SEMI || sjinfo->jointype == JOIN_ANTI) &&
+ 			bms_get_singleton_member(sjinfo->syn_righthand, &varno))
+ 		{
+ 			RelOptInfo *rel = find_base_rel(root, varno);
+ 
+ 			rel->consider_param_startup = true;
+ 		}
+ 	}
+ }
+ 
+ /*
   * set_base_rel_sizes
   *	  Set the size estimates (rows and widths) for each base-relation entry.
   *
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 7f7aa24..2afd59a 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** compare_fractional_path_costs(Path *path
*** 138,156 ****
  * total cost, we just say that their costs are "different", since neither
  * dominates the other across the whole performance spectrum.
  *
! * If consider_startup is false, then we don't care about keeping paths with
! * good startup cost, so we'll never return COSTS_DIFFERENT.
! *
! * This function also includes special hacks to support a policy enforced
! * by its sole caller, add_path(): paths that have any parameterization
! * cannot win comparisons on the grounds of having cheaper startup cost,
! * since we deem only total cost to be of interest for a parameterized path.
! * (Unparameterized paths are more common, so we check for this case last.)
  */
 static PathCostComparison
! compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor,
! 						   bool consider_startup)
 {
  	/*
  	 * Check total cost first since it's more likely to be different; many
  	 * paths have zero startup cost.
--- 138,154 ----
  * total cost, we just say that their costs are "different", since neither
  * dominates the other across the whole performance spectrum.
  *
! * This function also enforces a policy rule that paths for which the relevant
! * one of parent->consider_startup and parent->consider_param_startup is false
! * cannot win comparisons on the grounds of good startup cost, so we never
! * return COSTS_DIFFERENT when that is true for the total-cost loser.
  */
 static PathCostComparison
! compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor)
 {
+ #define CONSIDER_PATH_STARTUP_COST(p) \
+ 	((p)->param_info == NULL ? (p)->parent->consider_startup : (p)->parent->consider_param_startup)
+ 
  	/*
  	 * Check total cost first since it's more likely to be different; many
  	 * paths have zero startup cost.
*************** compare_path_costs_fuzzily(Path *path1,
*** 158,166 ****
  	if (path1->total_cost > path2->total_cost * fuzz_factor)
  	{
  		/* path1 fuzzily worse on total cost */
! 		if (consider_startup &&
! 			path2->startup_cost > path1->startup_cost * fuzz_factor &&
! 			path1->param_info == NULL)
  		{
  			/* ... but path2 fuzzily worse on startup, so DIFFERENT */
  			return COSTS_DIFFERENT;
--- 156,163 ----
  	if (path1->total_cost > path2->total_cost * fuzz_factor)
  	{
  		/* path1 fuzzily worse on total cost */
! 		if (CONSIDER_PATH_STARTUP_COST(path1) &&
! 			path2->startup_cost > path1->startup_cost * fuzz_factor)
  		{
  			/* ...
but path2 fuzzily worse on startup, so DIFFERENT */ return COSTS_DIFFERENT; *************** compare_path_costs_fuzzily(Path *path1, *** 171,179 **** if (path2->total_cost > path1->total_cost * fuzz_factor) { /* path2 fuzzily worse on total cost */ ! if (consider_startup && ! path1->startup_cost > path2->startup_cost * fuzz_factor && ! path2->param_info == NULL) { /* ... but path1 fuzzily worse on startup, so DIFFERENT */ return COSTS_DIFFERENT; --- 168,175 ---- if (path2->total_cost > path1->total_cost * fuzz_factor) { /* path2 fuzzily worse on total cost */ ! if (CONSIDER_PATH_STARTUP_COST(path2) && ! path1->startup_cost > path2->startup_cost * fuzz_factor) { /* ... but path1 fuzzily worse on startup, so DIFFERENT */ return COSTS_DIFFERENT; *************** compare_path_costs_fuzzily(Path *path1, *** 181,188 **** /* else path1 dominates */ return COSTS_BETTER1; } ! /* fuzzily the same on total cost */ ! /* (so we may as well compare startup cost, even if !consider_startup) */ if (path1->startup_cost > path2->startup_cost * fuzz_factor && path2->param_info == NULL) { --- 177,189 ---- /* else path1 dominates */ return COSTS_BETTER1; } ! ! /* ! * Fuzzily the same on total cost (so we might as well compare startup ! * cost, even when that would otherwise be uninteresting; but ! * parameterized paths aren't allowed to win this way, we'd rather move on ! * to other comparison heuristics) ! */ if (path1->startup_cost > path2->startup_cost * fuzz_factor && path2->param_info == NULL) { *************** compare_path_costs_fuzzily(Path *path1, *** 197,202 **** --- 198,205 ---- } /* fuzzily the same on both costs */ return COSTS_EQUAL; + + #undef CONSIDER_PATH_STARTUP_COST } /* *************** compare_path_costs_fuzzily(Path *path1, *** 212,222 **** * * The cheapest_parameterized_paths list collects all parameterized paths * that have survived the add_path() tournament for this relation. (Since ! * add_path ignores pathkeys and startup cost for a parameterized path, ! 
* these will be paths that have best total cost or best row count for their ! * parameterization.) cheapest_parameterized_paths always includes the ! * cheapest-total unparameterized path, too, if there is one; the users of ! * that list find it more convenient if that's included. * * This is normally called only after we've finished constructing the path * list for the rel node. --- 215,225 ---- * * The cheapest_parameterized_paths list collects all parameterized paths * that have survived the add_path() tournament for this relation. (Since ! * add_path ignores pathkeys for a parameterized path, these will be paths ! * that have best cost or best row count for their parameterization.) ! * cheapest_parameterized_paths always includes the cheapest-total ! * unparameterized path, too, if there is one; the users of that list find ! * it more convenient if that's included. * * This is normally called only after we've finished constructing the path * list for the rel node. *************** set_cheapest(RelOptInfo *parent_rel) *** 361,374 **** * cases do arise, so we make the full set of checks anyway. * * There are two policy decisions embedded in this function, along with ! * its sibling add_path_precheck: we treat all parameterized paths as ! * having NIL pathkeys, and we ignore their startup costs, so that they ! * compete only on parameterization, total cost and rowcount. This is to ! * reduce the number of parameterized paths that are kept. See discussion ! * in src/backend/optimizer/README. * ! * Another policy that is enforced here is that we only consider cheap ! * startup cost to be interesting if parent_rel->consider_startup is true. * * The pathlist is kept sorted by total_cost, with cheaper paths * at the front. Within this routine, that's simply a speed hack: --- 364,378 ---- * cases do arise, so we make the full set of checks anyway. * * There are two policy decisions embedded in this function, along with ! * its sibling add_path_precheck. 
First, we treat all parameterized paths ! * as having NIL pathkeys, so that they cannot win comparisons on the ! * basis of sort order. This is to reduce the number of parameterized ! * paths that are kept; see discussion in src/backend/optimizer/README. * ! * Second, we only consider cheap startup cost to be interesting if ! * parent_rel->consider_startup is true for an unparameterized path, or ! * parent_rel->consider_param_startup is true for a parameterized one. ! * Again, this allows discarding useless paths sooner. * * The pathlist is kept sorted by total_cost, with cheaper paths * at the front. Within this routine, that's simply a speed hack: *************** add_path(RelOptInfo *parent_rel, Path *n *** 433,440 **** * Do a fuzzy cost comparison with 1% fuzziness limit. (XXX does this * percentage need to be user-configurable?) */ ! costcmp = compare_path_costs_fuzzily(new_path, old_path, 1.01, ! parent_rel->consider_startup); /* * If the two paths compare differently for startup and total cost, --- 437,443 ---- * Do a fuzzy cost comparison with 1% fuzziness limit. (XXX does this * percentage need to be user-configurable?) */ ! costcmp = compare_path_costs_fuzzily(new_path, old_path, 1.01); /* * If the two paths compare differently for startup and total cost, *************** add_path(RelOptInfo *parent_rel, Path *n *** 501,508 **** accept_new = false; /* old dominates new */ else if (compare_path_costs_fuzzily(new_path, old_path, ! 1.0000000001, ! parent_rel->consider_startup) == COSTS_BETTER1) remove_old = true; /* new dominates old */ else accept_new = false; /* old equals or --- 504,510 ---- accept_new = false; /* old dominates new */ else if (compare_path_costs_fuzzily(new_path, old_path, ! 
1.0000000001) == COSTS_BETTER1) remove_old = true; /* new dominates old */ else accept_new = false; /* old equals or *************** add_path_precheck(RelOptInfo *parent_rel *** 622,632 **** --- 624,638 ---- List *pathkeys, Relids required_outer) { List *new_path_pathkeys; + bool consider_startup; ListCell *p1; /* Pretend parameterized paths have no pathkeys, per add_path policy */ new_path_pathkeys = required_outer ? NIL : pathkeys; + /* Decide whether new path's startup cost is interesting */ + consider_startup = required_outer ? parent_rel->consider_param_startup : parent_rel->consider_startup; + foreach(p1, parent_rel->pathlist) { Path *old_path = (Path *) lfirst(p1); *************** add_path_precheck(RelOptInfo *parent_rel *** 644,651 **** */ if (total_cost >= old_path->total_cost) { ! /* can win on startup cost only if unparameterized */ ! if (startup_cost >= old_path->startup_cost || required_outer) { /* new path does not win on cost, so check pathkeys... */ List *old_path_pathkeys; --- 650,657 ---- */ if (total_cost >= old_path->total_cost) { ! /* new path can win on startup cost only if consider_startup */ ! if (startup_cost >= old_path->startup_cost || !consider_startup) { /* new path does not win on cost, so check pathkeys... 
*/ List *old_path_pathkeys; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 1d635cd..be2ef3b 100644 *** a/src/backend/optimizer/util/relnode.c --- b/src/backend/optimizer/util/relnode.c *************** build_simple_rel(PlannerInfo *root, int *** 101,106 **** --- 101,107 ---- rel->width = 0; /* cheap startup cost is interesting iff not all tuples to be retrieved */ rel->consider_startup = (root->tuple_fraction > 0); + rel->consider_param_startup = false; /* might get changed later */ rel->reltargetlist = NIL; rel->pathlist = NIL; rel->ppilist = NIL; *************** build_join_rel(PlannerInfo *root, *** 361,366 **** --- 362,368 ---- joinrel->width = 0; /* cheap startup cost is interesting iff not all tuples to be retrieved */ joinrel->consider_startup = (root->tuple_fraction > 0); + joinrel->consider_param_startup = false; joinrel->reltargetlist = NIL; joinrel->pathlist = NIL; joinrel->ppilist = NIL; diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 706b6d1..33b0874 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** typedef struct PlannerInfo *** 321,328 **** * clauses have been applied (ie, output rows of a plan for it) * width - avg. number of bytes per tuple in the relation after the * appropriate projections have been done (ie, output width) ! * consider_startup - true if there is any value in keeping paths for * this rel on the basis of having cheap startup cost * reltargetlist - List of Var and PlaceHolderVar nodes for the values * we need to output from this relation. * List is in no particular order, but all rels of an --- 321,329 ---- * clauses have been applied (ie, output rows of a plan for it) * width - avg. number of bytes per tuple in the relation after the * appropriate projections have been done (ie, output width) ! 
* consider_startup - true if there is any value in keeping plain paths for * this rel on the basis of having cheap startup cost + * consider_param_startup - the same for parameterized paths * reltargetlist - List of Var and PlaceHolderVar nodes for the values * we need to output from this relation. * List is in no particular order, but all rels of an *************** typedef struct RelOptInfo *** 437,442 **** --- 438,444 ---- /* per-relation planner control flags */ bool consider_startup; /* keep cheap-startup-cost paths? */ + bool consider_param_startup; /* ditto, for parameterized paths? */ /* materialization information */ List *reltargetlist; /* Vars to be output by scan of relation */
On 06/01/15 00:08, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 05/30/15 23:16, Tom Lane wrote:
>>> Attached is a draft patch for that. It fixes the problem for me:
>
>> Seems to be working OK, but I still do get a Bitmap Heap Scan there (but
>> more about that later).
>
> Attached is an incremental patch (on top of the previous one) to
> allow startup cost of parameterized paths to be considered when the
> relation is the RHS of a semi or anti join. It seems reasonably clean
> except for one thing: logically, we perhaps should remove the checks
> on path->param_info from the last half of
> compare_path_costs_fuzzily(), so as to increase the symmetry between
> parameterized paths and unparameterized ones. However, when I did
> that, I got changes in some of the regression test plans, and they
> didn't seem to be for the better. So I left that alone. As-is, this
> patch doesn't seem to affect the results for any existing regression
> tests.

Seems to be working fine. I've tried a bunch of queries modifying the test case in various ways, and all seem to be planned fine. I've been unable to come up with a query that'd get planned badly.

Regarding the remaining checks in compare_path_costs_fuzzily(), isn't that simply caused by very small data sets? For example the first "failing" plan in join.sql looks like this:

 Nested Loop Left Join  (cost=0.29..22.80 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Join Filter: ("*VALUES*".column1 = i1.f1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
   ->  Materialize  (cost=0.29..22.64 rows=5 width=8)
         Output: i1.f1, (666)
         ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
               Output: i1.f1, 666
               ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
                     Output: i1.f1
               ->  Index Only Scan using tenk1_unique2 on public....
                     Output: i2.unique2
                     Index Cond: (i2.unique2 = i1.f1)

while with the changes it'd look like this:

 Hash Right Join  (cost=0.34..22.70 rows=2 width=12)
   Output: "*VALUES*".column1, i1.f1, (666)
   Hash Cond: (i1.f1 = "*VALUES*".column1)
   ->  Nested Loop Left Join  (cost=0.29..22.61 rows=5 width=8)
         Output: i1.f1, 666
         ->  Seq Scan on public.int4_tbl i1  (cost=0.00..1.05 ...
               Output: i1.f1
         ->  Index Only Scan using tenk1_unique2 on public.tenk1 ...
               Output: i2.unique2
               Index Cond: (i2.unique2 = i1.f1)
   ->  Hash  (cost=0.03..0.03 rows=2 width=4)
         Output: "*VALUES*".column1
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 ...
               Output: "*VALUES*".column1
 (14 rows)

So the planner actually believes the new plan to be cheaper, although only by a tiny margin. And I see pretty much no difference in planning/exec time (but I'm on a machine with power management and VMs, so there's a lot of random noise).

But once int4_tbl gets bigger (say, 160k rows instead of the 5), the hash join clearly wins even with the current code. Actually, it switches from the current plan way sooner - at 500 rows it's already using the hash join, and I see about the same timings.

I don't really see why you consider those plan changes bad? And even if they are, isn't that simply a matter of tuning the cost parameters?

>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>> behavior change so that no back-patching, right?
>
> Mumble. It's definitely a planner bug fix, and I believe that the
> effects are narrowly constrained to cases where significantly better
> plans are possible. So personally I'd be willing to back-patch it.
> But others might see that differently, especially since it's been
> like this for a long time and we only have one field complaint.

+1 to back-patching from me. It's true we only have one field complaint, but I believe there are more users impacted by this. They just didn't notice - we only really got this complaint because of the plan discrepancy between queries that are almost exactly the same.
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/01/15 00:08, Tom Lane wrote:
>> Attached is an incremental patch (on top of the previous one) to
>> allow startup cost of parameterized paths to be considered when the
>> relation is the RHS of a semi or anti join. It seems reasonably clean
>> except for one thing: logically, we perhaps should remove the checks
>> on path->param_info from the last half of
>> compare_path_costs_fuzzily(), so as to increase the symmetry between
>> parameterized paths and unparameterized ones. However, when I did
>> that, I got changes in some of the regression test plans, and they
>> didn't seem to be for the better. So I left that alone. As-is, this
>> patch doesn't seem to affect the results for any existing regression
>> tests.

> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't
> that simply caused by very small data sets?

No, I don't think the size of the data set is the issue. The point is that previously, if we had two parameterized paths of fuzzily the same total cost, we'd choose which to keep based on factors other than startup cost. If we change that part of compare_path_costs_fuzzily(), we'll be changing the behavior in cases that are unrelated to the problem we are trying to solve, because same-total-cost isn't going to apply to the bitmap indexscan vs. plain indexscan cases we're worried about here. (The places where it tends to happen are things like hash left join vs. hash right join with the inputs swapped.) I don't want to change such cases in a patch that's meant to be back-patched; people don't like plan instability in stable branches.

It's also not quite clear what the new rule ought to be. Rather than treating parameterized paths exactly like regular ones, maybe we should consider their startup cost only if consider_param_startup is true. It remains the case that in most scenarios for parameterized paths, startup cost really isn't that interesting and shouldn't drive choices.

Another practical problem is that the regression tests that are being affected are specifically meant to test particular scenarios in construction of nested-loop join plans. If the planner stops selecting a nestloop plan there, the test is effectively broken. We could probably rejigger the test queries to restore the intended test scenario, but that wasn't an exercise I was in a hurry to undertake.

Anyway, bottom line is that if we want to change that, it ought to be a separate HEAD-only patch.

>>> Do you plan to push that into 9.5, or 9.6? I assume it's a
>>> behavior change so that no back-patching, right?

>> Mumble. It's definitely a planner bug fix, and I believe that the
>> effects are narrowly constrained to cases where significantly better
>> plans are possible. So personally I'd be willing to back-patch it.
>> But others might see that differently, especially since it's been
>> like this for a long time and we only have one field complaint.

> +1 to back-patching from me. It's true we only have one field complaint,
> but I believe there are more users impacted by this. They just didn't
> notice - we only really got this complaint because of the plan
> discrepancy between queries that are almost exactly the same.

Perhaps. It seems like in many cases the planner would accidentally choose the "right" plan anyway, because even though it was overestimating the cost, the other alternatives would usually look even worse. So it's hard to tell how many users will be benefited. But I'm fairly sure the patch as posted is narrow enough that very few people would be negatively affected.

Anybody else want to speak for or against back-patching the patch as posted? I intentionally didn't push it in before today's releases, but I will push it later this week if there are not objections.

regards, tom lane
On 06/01/2015 02:18 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On 06/01/15 00:08, Tom Lane wrote: >>> Attached is an incremental patch (on top of the previous one) to >>> allow startup cost of parameterized paths to be considered when the >>> relation is the RHS of a semi or anti join. It seems reasonably clean >>> except for one thing: logically, we perhaps should remove the checks >>> on path->param_info from the last half of >>> compare_path_costs_fuzzily(), so as to increase the symmetry between >>> parameterized paths and unparameterized ones. However, when I did >>> that, I got changes in some of the regression test plans, and they >>> didn't seem to be for the better. So I left that alone. As-is, this >>> patch doesn't seem to affect the results for any existing regression >>> tests. > >> Regarding the remaining checks in compare_path_costs_fuzzily(), isn't >> that simply caused by very small data sets? > > No, I don't think the size of the data set is the issue. The point is > that previously, if we had two parameterized paths of fuzzily the same > total cost, we'd choose which to keep based on factors other than startup > cost. If we change that part of compare_path_costs_fuzzily(), we'll be > changing the behavior in cases that are unrelated to the problem we are > trying to solve, because same-total-cost isn't going to apply to the > bitmap indexscan vs. plain indexscan cases we're worried about here. > (The places where it tends to happen are things like hash left join > vs hash right join with the inputs swapped.) I don't want to change such > cases in a patch that's meant to be back-patched; people don't like plan > instability in stable branches. > > It's also not quite clear what the new rule ought to be. Rather than > treating parameterized paths exactly like regular ones, maybe we should > consider their startup cost only if consider_param_startup is true. 
> It remains the case that in most scenarios for parameterized paths, > startup cost really isn't that interesting and shouldn't drive choices. > > Another practical problem is that the regression tests that are being > affected are specifically meant to test particular scenarios in > construction of nested-loop join plans. If the planner stops selecting > a nestloop plan there, the test is effectively broken. We could probably > rejigger the test queries to restore the intended test scenario, but that > wasn't an exercise I was in a hurry to undertake. > > Anyway, bottom line is that if we want to change that, it ought to be > a separate HEAD-only patch. > >>>> Do you plan to push that into 9.5, or 9.6? I assume it's a >>>> behavior change so that no back-patching, right? > >>> Mumble. It's definitely a planner bug fix, and I believe that the >>> effects are narrowly constrained to cases where significantly better >>> plans are possible. So personally I'd be willing to back-patch it. >>> But others might see that differently, especially since it's been >>> like this for a long time and we only have one field complaint. > >> +1 to back-patching from me. It's true we only have one field complaint, >> but I believe there are more users impacted by this. They just didn't >> notice - we only really got this complaint because of the plan >> discrepancy between queries that are almost exactly the same. > > Perhaps. It seems like in many cases the planner would accidentally > choose the "right" plan anyway, because even though it was overestimating > the cost, the other alternatives would usually look even worse. So it's > hard to tell how many users will be benefited. But I'm fairly sure the > patch as posted is narrow enough that very few people would be negatively > affected. > > Anybody else want to speak for or against back-patching the patch as > posted? 
> I intentionally didn't push it in before today's releases,
> but I will push it later this week if there are not objections.

I would like Mark Wong to test this on DBT3, if that's possible for him. I'm very worried about an unanticipated regression.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 06/01/15 23:47, Josh Berkus wrote:
> On 06/01/2015 02:18 PM, Tom Lane wrote:
>
>> Anybody else want to speak for or against back-patching the patch as
>> posted? I intentionally didn't push it in before today's releases,
>> but I will push it later this week if there are not objections.
>
> I would like Mark Wong to test this on DBT3, if that's possible for him.
> I'm very worried about an unanticipated regression.

AFAIK Mark is busy with other stuff at the moment, but I can run the TPC-H benchmark (which DBT3 is equal to, IIRC).

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/01/2015 03:22 PM, Tomas Vondra wrote: > > On 06/01/15 23:47, Josh Berkus wrote: >> On 06/01/2015 02:18 PM, Tom Lane wrote: >> >>> Anybody else want to speak for or against back-patching the patch as >>> posted? I intentionally didn't push it in before today's releases, >>> but I will push it later this week if there are not objections. >> >> I would like Mark Wong to test this on DBT3, if that's possible for him. >> I'm very worried about an unanticipated regression. > > AFAIK Mark is busy with other stuff at the moment, but I can do the > TPC-H (which DBT3 is equal to, IIRC). Yeah, I just want something which has a chance of catching unanticipated regression in queries which should be unaffected by the patch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 06/02/15 01:47, Josh Berkus wrote:
> On 06/01/2015 03:22 PM, Tomas Vondra wrote:
>>
>> On 06/01/15 23:47, Josh Berkus wrote:
>>> On 06/01/2015 02:18 PM, Tom Lane wrote:
>>>
>>>> Anybody else want to speak for or against back-patching the patch as
>>>> posted? I intentionally didn't push it in before today's releases,
>>>> but I will push it later this week if there are not objections.
>>>
>>> I would like Mark Wong to test this on DBT3, if that's possible
>>> for him. I'm very worried about an unanticipated regression.
>>
>> AFAIK Mark is busy with other stuff at the moment, but I can do
>> the TPC-H (which DBT3 is equal to, IIRC).
>
> Yeah, I just want something which has a chance of catching
> unanticipated regression in queries which should be unaffected by the
> patch.

OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The results are good, IMHO.

With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan changes at all. I don't plan to run the tests on larger data sets; I do expect the behavior to remain the same, considering the uniformity of TPC-H data sets.

With TPC-DS (using the 63 queries supported by PostgreSQL), I've seen two cases of plan changes - see the plans attached. In both cases, however, the plan change results in much better performance. While on master the queries took 23 and 18 seconds, with the two patches it's only 7 and 3. This is just the 1GB dataset; I'll repeat the test with the 4GB dataset and post an update if there are any changes.

While this can't prove there are no regressions, in these two benchmarks the patches actually improve some of the queries.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The > results are good, IMHO. > With TPC-H, I've used 1GB and 4GB datasets, and I've seen no plan > changes at all. I don't plan to run the tests on larger data sets, I do > expect the behavior to remain the same, considering the uniformity of > TPC-H data sets. > With TPC-DS (using the 63 queries supported by PostgreSQL), I've seen > two cases of plan changes - see the plans attached. In both cases > however the plan change results in much better performance. While on > master the queries took 23 and 18 seconds, with the two patches it's > only 7 and 3. This is just the 1GB dataset. I'll repeat the test with > the 4GB dataset and post an update if there are any changes. I'm a bit disturbed by that, because AFAICS from the plans, these queries did not involve any semi or anti joins, which should mean that the patch would not have changed the planner's behavior. You were using the second patch as-posted, right, without further hacking on compare_path_costs_fuzzily? It's possible that the change was due to random variation in ANALYZE statistics, in which case it was just luck. regards, tom lane
On 06/02/15 16:37, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> OK, so I did the testing today - with TPC-H and TPC-DS benchmarks. The
>> results are good, IMHO.
>
> I'm a bit disturbed by that, because AFAICS from the plans, these
> queries did not involve any semi or anti joins, which should mean
> that the patch would not have changed the planner's behavior. You
> were using the second patch as-posted, right, without further hacking
> on compare_path_costs_fuzzily?

Yes, no additional changes.

> It's possible that the change was due to random variation in ANALYZE
> statistics, in which case it was just luck.

I don't think so. I simply loaded the data, ran ANALYZE, and then started either master or patched master. There should be no difference in statistics, I believe. Also, the plans contain pretty much the same row counts, but the costs differ.

For example look at the 'cs_ui' CTE, right at the beginning of the analyze logs. The row counts are exactly the same, but the costs are different. And it's not using semijoins or nested loops ...

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/02/15 16:37, Tom Lane wrote:
>> It's possible that the change was due to random variation in ANALYZE
>> statistics, in which case it was just luck.

> I don't think so. I simply loaded the data, ran ANALYZE, and then simply
> started either master or patched master. There should be no difference
> in statistics, I believe. Also, the plans contain pretty much the same
> row counts, but the costs differ.

> For example look at the 'cs_ui' CTE, right at the beginning of the
> analyze logs. The row counts are exactly the same, but the costs are
> different. And it's not using semijoins or not nested loops ...

The cost estimates in that CTE all look exactly the same to me, and the actual runtime's not all that different either. The key runtime difference in the first query seems to be in the first couple of joins to the cs_ui CTE:

   ->  Nested Loop  (cost=4.63..724.34 rows=16 width=67) (actual time=8346.904..14024.947 rows=117 loops=1)
         Join Filter: (item.i_item_sk = store_returns.sr_item_sk)
         ->  Nested Loop  (cost=0.29..662.07 rows=1 width=59) (actual time=8324.352..13618.106 rows=8 loops=1)
               ->  CTE Scan on cs_ui  (cost=0.00..2.16 rows=108 width=4) (actual time=7264.670..7424.096 rows=17169 loops=1)
               ->  Index Scan using item_pkey on item  (cost=0.29..6.10 rows=1 width=55) (actual time=0.356..0.356 rows=0 loops=17169)
                     Index Cond: (i_item_sk = cs_ui.cs_item_sk)
                     Filter: ((i_current_price >= '79'::numeric) AND (i_current_price <= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND (i_color = ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[])))
                     Rows Removed by Filter: 1
         ->  Bitmap Heap Scan on store_returns  (cost=4.34..62.05 rows=18 width=8) (actual time=32.525..50.662 rows=15 loops=8)
               Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk)
               Heap Blocks: exact=117
               ->  Bitmap Index Scan on idx_sr_item_sk  (cost=0.00..4.34 rows=18 width=0) (actual time=19.747..19.747 rows=15 loops=8)
                     Index Cond: (sr_item_sk = cs_ui.cs_item_sk)

vs patched

   ->  Nested Loop  (cost=4.63..724.34 rows=16 width=67) (actual time=6867.921..7001.417 rows=117 loops=1)
         Join Filter: (item.i_item_sk = store_returns.sr_item_sk)
         ->  Nested Loop  (cost=0.29..662.07 rows=1 width=59) (actual time=6867.874..7000.211 rows=8 loops=1)
               ->  CTE Scan on cs_ui  (cost=0.00..2.16 rows=108 width=4) (actual time=6865.792..6924.816 rows=17169 loops=1)
               ->  Index Scan using item_pkey on item  (cost=0.29..6.10 rows=1 width=55) (actual time=0.003..0.003 rows=0 loops=17169)
                     Index Cond: (i_item_sk = cs_ui.cs_item_sk)
                     Filter: ((i_current_price >= '79'::numeric) AND (i_current_price <= '89'::numeric) AND (i_current_price >= '80'::numeric) AND (i_current_price <= '94'::numeric) AND (i_color = ANY ('{navajo,burlywood,cornflower,olive,turquoise,linen}'::bpchar[])))
                     Rows Removed by Filter: 1
         ->  Bitmap Heap Scan on store_returns  (cost=4.34..62.05 rows=18 width=8) (actual time=0.025..0.116 rows=15 loops=8)
               Recheck Cond: (sr_item_sk = cs_ui.cs_item_sk)
               Heap Blocks: exact=117
               ->  Bitmap Index Scan on idx_sr_item_sk  (cost=0.00..4.34 rows=18 width=0) (actual time=0.017..0.017 rows=15 loops=8)
                     Index Cond: (sr_item_sk = cs_ui.cs_item_sk)

Those are the exact same plans, and same cost estimates, but for some reason the master run is 2X slower. The only explanation I can think of is disk caching effects, or maybe hint-bit setting during the first run. It's certainly not the planner that deserves either credit or blame for that.

The part of this plan that's actually different is a different join order sequence for a lot of follow-on joins that are expected to get single rows from their other table. I think that's basically irrelevant really, but again, this patch should not have changed anything there, since those were plain joins not semijoins.

regards, tom lane
On 06/02/15 17:50, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 06/02/15 16:37, Tom Lane wrote:
>>> It's possible that the change was due to random variation in ANALYZE
>>> statistics, in which case it was just luck.
>
>> I don't think so. I simply loaded the data, ran ANALYZE, and then simply
>> started either master or patched master. There should be no difference
>> in statistics, I believe. Also, the plans contain pretty much the same
>> row counts, but the costs differ.
>
>> For example look at the 'cs_ui' CTE, right at the beginning of the
>> analyze logs. The row counts are exactly the same, but the costs are
>> different. And it's not using semijoins or not nested loops ...
>
> The cost estimates in that CTE all look exactly the same to me, and the
> actual runtime's not all that different either. The key runtime
> difference in the first query seems to be in the first couple of joins
> to the cs_ui CTE:

D'oh! I was looking at the timings, not costs. EINOTENOUGHCAFFEINE ...

> Those are the exact same plans, and same cost estimates, but for
> some reason the master run is 2X slower. The only explanation I can
> think of is disk caching effects, or maybe hint-bit setting during
> the first run. It's certainly not the planner that deserves either
> credit or blame for that.

Yes, seems like that.

> The part of this plan that's actually different is a different join
> order sequence for a lot of follow-on joins that are expected to get
> single rows from their other table. I think that's basically
> irrelevant really, but again, this patch should not have changed
> anything there, since those were plain joins not semijoins.

Yes, it's interesting. I've repeated the tests on a 1GB TPC-DS dataset,
and this time I got a slightly different set of plan changes. I don't
see the change in join order, for example. Attached is the whole explain
output (all 63 queries), explain.master.1 and explain.patched.1.
There are two kinds of plan changes:

1) minor cost changes (these queries contain semijoins, so that's
   expected), but the plan is exactly the same

2) queries with a nested loop replaced by a hash join, i.e. the plan
   on master looks like this:

   -> Nested Loop
      -> Nested Loop
         -> Nested Loop
            -> Seq Scan
            -> Index Scan
         -> Index Scan
      -> Materialize
         -> Seq Scan

   while with the patches:

   -> Hash Join
      -> Hash Join
         -> Nested Loop
            -> Seq Scan
            -> Index Scan
         -> Hash
            -> Seq Scan
      -> Hash
         -> Seq Scan

This is slightly surprising, because those queries do not contain any
semijoins. Detailed explain analyze output for the three queries with
different plans is attached.

At the end, I repeated the explain on master, and got exactly the same
results as at the beginning (same as explain.master.1) - so there's no
variability in statistics at all.

Another interesting thing is that these plan changes only happen after
the second patch, i.e. with nestloop-semijoin-costing-fix everything is
exactly as on master, but with consider-startup-costs-for-semijoins the
plans change.

I suspected the problem was somewhere in compare_path_costs_fuzzily(),
but that doesn't seem to be the case - I have however noticed that the
number of calls to that function changes with the patch. For example
when running the query in tpcds-3.log, compare_path_costs_fuzzily() is
called ~1050x, but with the patch attached it's called only ~730x.
I've checked backtraces for a few of the paths that are not considered
with the patch, and all of them look like this (detailed backtrace
attached):

#3  0x0000000000656cd5 in compare_path_costs_fuzzily at pathnode.c:154
#4  0x0000000000657146 in add_path at pathnode.c:438
#5  0x0000000000631bfb in try_nestloop_path at joinpath.c:347
#6  0x0000000000632931 in match_unsorted_outer at joinpath.c:900
#7  add_paths_to_joinrel at joinpath.c:225

That particular part of try_nestloop_path looks like this:

    if (add_path_precheck(joinrel,
                          workspace.startup_cost, workspace.total_cost,
                          pathkeys, required_outer))
    {
        add_path(joinrel, (Path *) create_nestloop_path(...));
    }

I wouldn't be surprised if the problem was in add_path_precheck.
Actually, I'm quite sure that's where the problem is - particularly in
how required_outer was replaced with (!consider_startup):

    consider_startup = required_outer ? consider_param_startup
                                      : consider_startup;

If both _startup flags are false, the original required_outer value is
lost, and that changes the if() condition, which was changed from

    if (startup_cost >= old_path->startup_cost || required_outer)

to

    if (startup_cost >= old_path->startup_cost || !consider_startup)

So, if required_outer=false and both p->..._startup=false, we get
consider_startup=false irrespective of the required_outer value, so

    (!consider_startup) != required_outer

and the conditions are not equivalent. And indeed, by reverting the if
condition to the previous form, we get the same plans as on master.

I don't know whether this is correct behavior or not, but in all three
cases I've observed on TPC-DS, the new plans performed better:

              Q1           Q2           Q3
 --------------------------------------------------------------------
  master      518.330 ms   466.539 ms   560.585 ms
  patched     405.310 ms   340.506 ms   122.072 ms

This time I've been careful not to produce bogus numbers (also, the
explain analyze timings I mentioned in the previous post were
significantly influenced by timing overhead).
These results are quite stable.

A few more comments on the patch:

1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) be defined rather like
   this? I mean, if it's a parametric query, use any of the flags? And
   the same in add_path_precheck()?

   #define CONSIDER_PATH_STARTUP_COST(p)  \
       ((p)->param_info == NULL ? \
        (p)->parent->consider_startup : \
        ((p)->parent->consider_param_startup || (p)->parent->consider_startup))

2) Is it really possible to get different values for the startup flags
   on path1 and path2 in compare_path_costs_fuzzily()? If not (if I get
   it right, path1->parent == path2->parent anyway), then maybe passing
   that as a function parameter (just like before) would be better. The
   macro probably won't be as nice, but I find it easier to understand.

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I wrote:
> I'm a bit disturbed by that, because AFAICS from the plans, these queries
> did not involve any semi or anti joins, which should mean that the patch
> would not have changed the planner's behavior.

After contemplating my navel for awhile, it occurred to me that maybe the
patch's changes in add_path_precheck could have caused behavioral changes
at the margins. Could you try a run without that (without the last two
hunks in pathnode.c)?

			regards, tom lane
On 06/02/15 21:39, Tom Lane wrote:
> I wrote:
>> I'm a bit disturbed by that, because AFAICS from the plans, these queries
>> did not involve any semi or anti joins, which should mean that the patch
>> would not have changed the planner's behavior.
>
> After contemplating my navel for awhile, it occurred to me that maybe the
> patch's changes in add_path_precheck could have caused behavioral changes
> at the margins. Could you try a run without that (without the last two
> hunks in pathnode.c)?

Yes, I think that's the cause. See the end of the message I sent just
before you posted this.

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> So, if required_outer=false and both p->_startup=false, we get
> consider_startup=false irrespective of the required_outer value, so
> (!consider_startup) != required_outer
> so that the conditions are not equivalent. And indeed, by reverting the
> if condition to the previous form, we get the same plans as on master.

Ah, I see we arrived at the same conclusions independently.

> I don't know whether this is correct behavior or not, but in all three
> cases I've observed on TPC-DS, the new plans performed better

Hm. In principle, add_path_precheck shouldn't be doing anything except
rejecting paths that would get rejected anyway later on. However it
strikes me that there's a logic error in it; specifically, this
simplifying assumption is wrong:

 * For speed, we make exact rather than fuzzy cost comparisons. If an
 * old path dominates the new path exactly on both costs, it will
 * surely do so fuzzily.

The old path's cost might be fuzzily the same, not fuzzily better. This
is less likely to be an issue for total cost (since even if they're
fuzzily the same at this point, they probably won't be after we get done
adding more components to the new path's cost estimate). But it could
well be an issue for startup cost, since that's often exactly zero.

So I guess this must be rejecting or letting through a slightly
different set of non-parameterized paths than it did before. I'm not
prepared to assume the results are always superior, because we haven't
fixed the actual logic oversight.

> 1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) be defined rather like
> this? I mean, if it's a parametric query, use any of the flags?

No. The paths are parametric, not the query as a whole.

> 2) Is it really possible to get different values for the startup
> flags on path1 and path2 in compare_path_costs_fuzzily()?
If we were comparing a parameterized and an unparameterized path, it
might be that one's startup cost is of interest while the other one's
isn't. This is written to let a path win on startup cost as long as its
startup cost is actually of interest.

> If not (if I get it right, path1->parent == path2->parent anyway), then
> maybe passing that as a function parameter (just like before) would be
> better. The macro probably won't be as nice, but I find it easier to
> understand.

Well, we could pass the parent RelOptInfo explicitly instead of passing
the two flags, but I don't find that to be an improvement particularly.

			regards, tom lane
I wrote:
> Hm. In principle, add_path_precheck shouldn't be doing anything except
> rejecting paths that would get rejected anyway later on. However it
> strikes me that there's a logic error in it; specifically, this
> simplifying assumption is wrong:
>  * For speed, we make exact rather than fuzzy cost comparisons. If an
>  * old path dominates the new path exactly on both costs, it will
>  * surely do so fuzzily.

After staring at this awhile longer, I still don't see exactly why it's
changing the results. Do we have instructions around here anyplace on
how to set up/use TPC-DS? I couldn't find anything about it on the
wiki ...

			regards, tom lane
On 06/02/15 23:27, Tom Lane wrote:
> I wrote:
>> Hm. In principle, add_path_precheck shouldn't be doing anything except
>> rejecting paths that would get rejected anyway later on. However it
>> strikes me that there's a logic error in it; specifically, this
>> simplifying assumption is wrong:
>>  * For speed, we make exact rather than fuzzy cost comparisons. If an
>>  * old path dominates the new path exactly on both costs, it will
>>  * surely do so fuzzily.
>
> After staring at this awhile longer, I still don't see exactly why
> it's changing the results.

So the required_outer vs. (!consider_startup) inconsistency does not
explain that?

> Do we have instructions around here anyplace on how to set up/use
> TPC-DS? I couldn't find anything about it on the wiki ...

Not that I'm aware of, but it's not really all that difficult:

1) get the TPC-DS toolkit from
   http://www.tpc.org/information/current_specifications.asp
2) build the 'tools' directory (just "make")
3) generate the data: "dsdgen -scale 1" (1GB dataset)
4) fix the data file formatting (pipe at the end of each line):
   for f in *.dat; do sed 's/|$//g' $f > ${f/dat/csv}; done;
5) create the database structure (attached 'create.sql')
6) load the data (attached 'load.sql', fix paths first)
7) indexes.sql
8) run queries.sql or explain.sql

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/02/15 23:27, Tom Lane wrote:
>> Do we have instructions around here anyplace on how to set up/use
>> TPC-DS? I couldn't find anything about it on the wiki ...

> Not that I'm aware of, but it's not really all that difficult.
> [ instructions... ]

Thanks. I added some debugging printouts to trace the logic, and found
that once in awhile we get cases like this:

patch would change precheck result to fail for path cost 133752.12..136070.23 vs old path 135378.72..135550.22
phantom path causes rejection of old path with cost 135378.72..135550.22
phantom path has final cost 133752.12..136161.64, accepted

That is, we have an unparameterized path whose (initial estimate of)
total cost is just slightly more than the total cost of some old path,
but whose startup cost is less. The existing logic in add_path_precheck
allows construction of this path to proceed. If its finished total cost
is still within 1% of the old path's cost, it is able to supplant the
old path on the grounds that the total costs are fuzzily the same while
its startup cost is less. The patch's change to add_path_precheck
causes the startup cost to be disregarded, so that the new path is
rejected immediately and the path replacement doesn't happen.

Now, what this demonstrates is that add_path_precheck is not operating
as intended --- as I said earlier, it's supposed to just save path
construction cycles, not change the outcome of any path replacement
tests. What we ought to do is fix it to do the cost comparisons
fuzzily, and then it should be possible for it to disregard startup
cost when appropriate without changing the results beyond that.
However, making the comparisons fuzzy will certainly result in some
plan changes at the margin, because it will now let through some paths
that it should have let through and didn't. Some of those changes will
be for the better and others for the worse.
(I don't have any faith in your conclusion that the patch as-posted
always results in better plans. There's no reason to believe that
sub-one-percent differences in planner cost estimates will reliably
predict real-world wins.)

What it seems like we should do, if we want to back-patch this, is apply
it without the add_path_precheck changes. Then, as an independent
HEAD-only patch, change add_path_precheck so that it's behaving as
designed. It looks to me like that will save some planning time in any
case --- changing add_path_precheck to disregard startup cost when
appropriate seems to let it reject a lot more paths than it used to.

			regards, tom lane
On Tue, Jun 2, 2015 at 10:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > What it seems like we should do, if we want to back-patch this, is apply > it without the add_path_precheck changes. Then as an independent > HEAD-only patch, change add_path_precheck so that it's behaving as > designed. It looks to me like that will save some planning time in any > case --- changing add_path_precheck to disregard startup cost when > appropriate seems to let it reject a lot more paths than it used to. I'd just like to mention that I really appreciate the time and thought that went into keeping the back-patched portion of this fix narrow. Thanks! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
FWIW, I've repeated the TPC-DS tests on a much larger data set (50GB)
today, and I see that:

(a) 3f59be836c555fa679bbe0ec76de50a8b5cb23e0 (ANTI/SEMI join costing)

    changes nothing - there are some small cost changes, but only in
    plans involving semi/anti-joins (which is expected). Nevertheless,
    all the plans remain the same.

(b) 3b0f77601b9f9f3a2e36a813e4cd32c00e0864d6 (add_path fixes)

    This changes the join order in one of the queries, with lots of
    nested loops (this is the join order change we've seen in this
    thread). Anyway, this is mostly an expected consequence of the
    add_path changes.

So both changes seem fine.

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services