Re: Disabling options lowers the estimated cost of a query - Mailing list pgsql-performance
From | Tomas Vondra |
---|---|
Subject | Re: Disabling options lowers the estimated cost of a query |
Date | |
Msg-id | 24ad0121-e8df-14cb-df97-8601a7a6039b@enterprisedb.com Whole thread Raw |
Responses |
Re: Disabling options lowers the estimated cost of a query
|
List | pgsql-performance |
On 2/26/21 4:00 AM, Tom Lane wrote: > Arne Roland <A.Roland@index.de> writes: >> I want to examine the exhaustive search and not the geqo here. I'd >> expect the exhaustive search to give the plan with the lowest cost, >> but apparently it doesn't. I have found a few dozen different >> querys where that isn't the case. I attached one straight forward >> example. For the join of two partitions a row first approach would >> have been reasonable. > > Hmm. While the search should be exhaustive, there are pretty > aggressive pruning heuristics (mostly in and around add_path()) that > can cause us to drop paths that don't seem to be enough better than > other alternatives. I suspect that the seqscan plan may have beaten > out the other one at some earlier stage that didn't think that the > startup-cost advantage was sufficient reason to keep it. > > It's also possible that you've found a bug. I notice that both plans > are using incremental sort, which has been, um, rather buggy. Hard to > tell without a concrete test case to poke at. > Well, it's true incremental sort was not exactly bug free. But I very much doubt it's causing this issue, for two reasons: (a) It's trivial to simplify the reproducer further, so that there are no incremental sort nodes. See the attached script, which has just a single partition. (b) The incremental sort patch does not really tweak the costing or add_path in ways that would break this. (c) PostgreSQL 12 has the same issue. It seems the whole problem is in generate_orderedappend_paths(), which simply considers two cases - paths with minimal startup cost and paths with minimal total costs. But with LIMIT that does not work, of course. With the simplified reproducer, I get these two plans: QUERY PLAN ------------------------------------------------------------------- Limit (cost=9748.11..10228.11 rows=10000 width=8) -> Merge Left Join (cost=9748.11..14548.11 rows=100000 width=8) Merge Cond: (a.id = b.id) -> Index Only Scan Backward using a0_pkey on a0 a (cost=0.29..3050.29 rows=100000 width=8) -> Sort (cost=9747.82..9997.82 rows=100000 width=8) Sort Key: b.id DESC -> Seq Scan on b0 b (cost=0.00..1443.00 rows=100000 width=8) (7 rows) QUERY PLAN ------------------------------------------------------------------- Limit (cost=0.58..3793.16 rows=10000 width=8) -> Nested Loop Left Join (cost=0.58..37926.29 rows=100000 ...) -> Index Only Scan Backward using a0_pkey on a0 a (cost=0.29..3050.29 rows=100000 width=8) -> Index Only Scan using b0_pkey on b0 b (cost=0.29..0.34 rows=1 width=8) Index Cond: (id = a.id) (5 rows) The reason is quite simple - we get multiple join paths for each child (not visible in the plans, because there's just a single partition), with these costs: A: nestloop_path startup 0.585000 total 35708.292500 B: nestloop_path startup 0.292500 total 150004297.292500 C: mergejoin_path startup 9748.112737 total 14102.092737 The one we'd like is the nestloop (A), and with disabled partition-wise join that's what we pick. But generate_orderedappend_paths calls get_cheapest_path_for_pathkeys for startup/total cost, and gets the two other paths. Clearly, nestlop (B) is pretty terrible for LIMIT, because of the high total cost, and mergejoin (C) is what we end up with. Not sure how to fix this without making generate_orderedappend_paths way more complicated ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-performance by date: