Re: Disabling options lowers the estimated cost of a query - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Disabling options lowers the estimated cost of a query
Date
Msg-id 24ad0121-e8df-14cb-df97-8601a7a6039b@enterprisedb.com
Whole thread Raw
Responses Re: Disabling options lowers the estimated cost of a query
List pgsql-performance
On 2/26/21 4:00 AM, Tom Lane wrote:
> Arne Roland <A.Roland@index.de> writes:
>> I want to examine the exhaustive search and not the geqo here. I'd
>> expect the exhaustive search to give the plan with the lowest cost,
>> but apparently it doesn't. I have found a few dozen different
>> querys where that isn't the case. I attached one straight forward
>> example. For the join of two partitions a row first approach would
>> have been reasonable.
> 
> Hmm.  While the search should be exhaustive, there are pretty
> aggressive pruning heuristics (mostly in and around add_path()) that
> can cause us to drop paths that don't seem to be enough better than
> other alternatives. I suspect that the seqscan plan may have beaten
> out the other one at some earlier stage that didn't think that the
> startup-cost advantage was sufficient reason to keep it.
> 
> It's also possible that you've found a bug.  I notice that both plans
> are using incremental sort, which has been, um, rather buggy. Hard to
> tell without a concrete test case to poke at.
> 

Well, it's true incremental sort was not exactly bug free. But I very
much doubt it's causing this issue, for two reasons:

(a) It's trivial to simplify the reproducer further, so that there are
no incremental sort nodes. See the attached script, which has just a
single partition.

(b) The incremental sort patch does not really tweak the costing or
add_path in ways that would break this.

(c) PostgreSQL 12 has the same issue.


It seems the whole problem is in generate_orderedappend_paths(), which
simply considers two cases - paths with minimal startup cost and paths
with minimal total costs. But with LIMIT that does not work, of course.

With the simplified reproducer, I get these two plans:


                               QUERY PLAN
  -------------------------------------------------------------------
   Limit  (cost=9748.11..10228.11 rows=10000 width=8)
     ->  Merge Left Join (cost=9748.11..14548.11 rows=100000 width=8)
           Merge Cond: (a.id = b.id)
           ->  Index Only Scan Backward using a0_pkey on a0 a
               (cost=0.29..3050.29 rows=100000 width=8)
           ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)
                 Sort Key: b.id DESC
                 ->  Seq Scan on b0 b
                     (cost=0.00..1443.00 rows=100000 width=8)
  (7 rows)

                               QUERY PLAN
  -------------------------------------------------------------------
   Limit  (cost=0.58..3793.16 rows=10000 width=8)
     ->  Nested Loop Left Join (cost=0.58..37926.29 rows=100000 ...)
           ->  Index Only Scan Backward using a0_pkey on a0 a
               (cost=0.29..3050.29 rows=100000 width=8)
           ->  Index Only Scan using b0_pkey on b0 b
               (cost=0.29..0.34 rows=1 width=8)
                 Index Cond: (id = a.id)
  (5 rows)


The reason is quite simple - we get multiple join paths for each child
(not visible in the plans, because there's just a single partition),
with these costs:

    A: nestloop_path   startup 0.585000    total 35708.292500
    B: nestloop_path   startup 0.292500    total 150004297.292500
    C: mergejoin_path  startup 9748.112737 total 14102.092737

The one we'd like is the nestloop (A), and with disabled partition-wise
join that's what we pick. But generate_orderedappend_paths calls
get_cheapest_path_for_pathkeys for startup/total cost, and gets the two
other paths. Clearly, nestlop (B) is pretty terrible for LIMIT, because
of the high total cost, and mergejoin (C) is what we end up with.

Not sure how to fix this without making generate_orderedappend_paths way
more complicated ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-performance by date:

Previous
From: Arne Roland
Date:
Subject: Re: Disabling options lowers the estimated cost of a query
Next
From: "Daniel Westermann (DWE)"
Date:
Subject: Re: Strange behavior once statistics are there