Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: nested loop semijoin estimates
Date
Msg-id 5569F09C.7090204@2ndquadrant.com
Whole thread Raw
In response to Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 05/30/15 03:52, Tomas Vondra wrote:
>
>
> On 05/30/15 01:20, Tomas Vondra wrote:
>>
>> Notice the cost - it's way lover than the previous plan (9.2 vs
>> ~111k), yet this plan was not chosen. So either the change broke
>> something (e.g. by violating some optimizer assumption), or maybe
>> there's a bug somewhere else ...
>
> After a bit more investigation, what I think is happening here is
> add_path() does not realize this is a SEMI join and a single tuple is
> enough, and discards the simple indexscan path in favor of the bitmap
> index scan as that seems cheaper when scanning everything.
>
> So not really a bug, but maybe 'consider_startup' would help?

A bit more info - the reason why the IndexPath gets discarded (in favor
of BitmapHeapScan) seems to be twofold:

(a) Semijoin does no imply 'consider_startup' for the inner path, so it
     gets discarded by the BitmapHeapScan, as it has lower total cost.

     I'm not sure where exactly the consider_startup should be set, so
     I've forced that at the beginning on add_path() to see what happens.
     Sadly, this alone is not sufficient, because of the next point.

(b) To actually use startup costs in compare_path_costs_fuzzily, there
     is an additional condition about handling parameterized paths, as
     explained in this comment:

     * This function also includes special hacks to support a policy
     * enforced by its sole caller, add_path(): paths that have any
     * parameterization cannot win comparisons on the grounds of having
     * cheaper startup cost, since we deem only total cost to be of
     * interest for a parameterized path. (Unparameterized paths are
     * more common, so we check for this case last.)

     so the startup_cost comparisons look like this:

     if (consider_startup &&
         path2->startup_cost > path1->startup_cost * fuzz_factor &&
         path1->param_info == NULL)

     Of course, path1 (IndexPath) actually has parameters, which makes
     the whole condition false, and the IndexPath gets discarded :-(

     ISTM the reported test case seems like a counter example to the
     policy, so maybe this should really be relaxed somehow.

To see what happens, I tweaked both (a) and (b) by forcing

     consider_startup = true

at the beginning of add_path(), and removing the (param_info==NULL)
conditions (patch attached). It also includes the run_cost tweak
mentioned in the first message.

I'm not claiming this is the correct fix, the only goal was to see what
happens. And it really does produce the 'right' plan:

                              QUERY PLAN
-----------------------------------------------------------------------
  Nested Loop Semi Join  (cost=0.99..9.20 rows=1 width=74)
    ->  Index Scan using term_facttablename_columnname_idx on term t
          (cost=0.55..8.57 rows=1 width=74)
          Index Cond: (((facttablename)::text =
               'facttable_stat_fta4'::text) AND ((columnname)::text =
               'berechnungsart'::text))
    ->  Index Only Scan using facttable_stat_fta4_berechnungsart_idx on
        facttable_stat_fta4 f (cost=0.43..310525.02 rows=4870496 width=2)
          Index Cond: (berechnungsart = (t.term)::text)
(5 rows)


If I get the comments in pathnode.c about the startup_cost policy, it's
"only" an optimization to reduce the number of parameterized paths, so I
believe this does not break anything. Also, it's no fun if the planning
is a tad faster, but the query runtime is 1000x higher.

Ideas how to relax the policy without significant impact on optimizer?

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [Proposal] More Vacuum Statistics
Next
From: Robert Haas
Date:
Subject: Re: [CORE] postpone next week's release