Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: nested loop semijoin estimates
Date
Msg-id 556E0625.30802@2ndquadrant.com
Whole thread Raw
In response to Re: nested loop semijoin estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: nested loop semijoin estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 06/02/15 17:50, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 06/02/15 16:37, Tom Lane wrote:
>>> It's possible that the change was due to random variation in ANALYZE
>>> statistics, in which case it was just luck.
>
>> I don't think so. I simply loaded the data, ran ANALYZE, and then simply
>> started either master or patched master. There should be no difference
>> in statistics, I believe. Also, the plans contain pretty much the same
>> row counts, but the costs differ.
>
>> For example look at the 'cs_ui' CTE, right at the beginning of the
>> analyze logs. The row counts are exactly the same, but the costs are
>> different. And it's not using semijoins or not nested loops ...
>
> The cost estimates in that CTE all look exactly the same to me, and the
> actual runtime's not all that different either.  The key runtime
> difference in the first query seems to be in the first couple of joins
> to the cs_ui CTE:

D'oh! I was looking at the timings, not costs. EINOTENOUGHCAFFEINE ...


> Those are the exact same plans, and same cost estimates, but for
> some reason the master run is 2X slower. The only explanation I can
> think of is disk caching effects, or maybe hint-bit setting during
> the first run. It's certainly not the planner that deserves either
> credit or blame for that.

Yes, it seems like that.


> The part of this plan that's actually different is a different join
> order sequence for a lot of follow-on joins that are expected to get
> single rows from their other table. I think that's basically
> irrelevant really, but again, this patch should not have changed
> anything there, since those were plain joins not semijoins.

Yes, it's interesting. I've repeated the tests on a 1GB TPC-DS dataset,
and this time I got a slightly different set of plan changes. For
example, I don't see the change in join order.

Attached is the whole explain output (all 63 queries), explain.master.1
and explain.patched.1.

There are two kinds of plan changes:

1) minor cost changes (these queries contain semijoins, so it's
    expected), but the plan is exactly the same

2) queries with nested loop replaced by hash join, i.e. plan on master
    looks like this:

      ->  Nested Loop
            ->  Nested Loop
                  ->  Nested Loop
                        ->  Seq Scan
                        ->  Index Scan
                  ->  Index Scan
            ->  Materialize
                  ->  Seq Scan

     while with the patches:

      ->  Hash Join
            ->  Hash Join
                  ->  Nested Loop
                        ->  Seq Scan
                        ->  Index Scan
                  ->  Hash
                        ->  Seq Scan
            ->  Hash
                  ->  Seq Scan

     This is slightly surprising, because those queries do not contain
     any semijoins. Detailed explain analyze for the three queries with
     different plans is attached.

At the end, I repeated the explain on master, and got exactly the same
results as at the beginning (same as explain.master.1) - so there's no
variability in statistics at all.

Another interesting thing is that these plan changes only happen after
the second patch, i.e. with nestloop-semijoin-costing-fix everything is
exactly as on master, but with consider-startup-costs-for-semijoins the
plans change.

I suspected the problem was somewhere in compare_path_costs_fuzzily(),
but that doesn't seem to be the case. I have, however, noticed that the
number of calls to that function changes with the patch.

For example, when running the query in tpcds-3.log,
compare_path_costs_fuzzily() is called ~1050x, but with the patch
applied it's called only ~730x. I've
checked backtraces for a few of the paths that are not considered with
the patch, and all of them look like this (detailed backtrace attached):

#3  0x0000000000656cd5 in compare_path_costs_fuzzily at pathnode.c:154
#4  0x0000000000657146 in add_path at pathnode.c:438
#5  0x0000000000631bfb in try_nestloop_path at joinpath.c:347
#6  0x0000000000632931 in match_unsorted_outer at joinpath.c:900
#7  add_paths_to_joinrel at joinpath.c:225

That particular part of try_nestloop_path looks like this:

     if (add_path_precheck(joinrel,
                           workspace.startup_cost, workspace.total_cost,
                           pathkeys, required_outer))
     {
         add_path(joinrel, (Path *)
                  create_nestloop_path(...));
     }

I wouldn't be surprised if the problem was in add_path_precheck.
Actually, I'm quite sure that's where the problem is - particularly in
how required_outer was replaced with (!consider_startup):

    consider_startup
       = required_outer ? consider_param_startup : consider_startup;

If both _startup flags are false, the original required_outer value is
lost, and that changes the if() condition which was changed from

    if (startup_cost >= old_path->startup_cost || required_outer)

to

    if (startup_cost >= old_path->startup_cost || !consider_startup)

So, if required_outer=false and both of the rel's _startup flags are
false, we get consider_startup=false irrespective of the required_outer
value, so

     (!consider_startup) != required_outer

and the conditions are not equivalent. And indeed, by reverting the if
condition to the previous form, we get the same plans as on master.

I don't know whether this is correct behavior or not, but in all three
cases I've observed on TPC-DS, the new plans performed better:

                    Q1              Q2             Q3
   --------------------------------------------------------------------
   master       518.330 ms      466.539 ms      560.585 ms
   patched      405.310 ms      340.506 ms      122.072 ms

This time I've been careful not to produce bogus numbers (also, the
explain analyze timings I mentioned in the previous post were
significantly influenced by timing overhead). These results are quite
stable.

A few more comments on the patch:

1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) macro rather be defined
    like this? I mean, if it's a parameterized path, use either of the
    flags? And the same in add_path_precheck()?

     #define CONSIDER_PATH_STARTUP_COST(p)  \
         ((p)->param_info == NULL ? (p)->parent->consider_startup :
          ((p)->parent->consider_param_startup ||
           (p)->parent->consider_startup))

2) Is it really possible to get different values for the startup
    flags on path1 and path2 in compare_path_costs_fuzzily()? If not
    (if I get it right, path1->parent == path2->parent anyway), then
    maybe passing that as a function parameter (just like before)
    would be better. The macro probably won't be as nice, but I find
    it easier to understand.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
