Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower - Mailing list pgsql-bugs
From | Andres Freund
Subject | Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower
Date |
Msg-id | 20220707043656.6tug2hfauxbizmi3@awork3.anarazel.de
In response to | Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower
List | pgsql-bugs
Hi,

On 2022-07-06 23:13:18 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think the cost for the slow plan being so much cheaper can almost be
> > qualified as bug.
> > The slow plan seems pretty nonsensical to me. ISTM that something in the
> > costing there is at least almost broken.
>
> I think this is probably an instance of the known problem that a generic
> plan is made without knowledge of the actual parameter values, and that
> can lead us to make statistical assumptions that are not valid for the
> actual values, but nonetheless make one plan look cheaper than another
> even though the opposite is true given the actual values.  In essence,
> comparing the cost estimate for the generic plan to the cost estimate
> for a custom plan is not really logically valid, because those estimates
> are founded on different statistics.  I don't know how to fix that :-(.

I think there's something more fundamentally wrong - somehow we end up
assuming > 50% selectivity on both the min and the max initplan, for the same
condition! And afaics (although it's a bit hard to tell at the precision
explain prints floating point values with) we don't charge cpu_operator_cost /
cpu_tuple_cost.

And this is on a table where we can know, despite not knowing the parameter
value, that the column being compared has a correlation of 1.

In this case the whole generic plan part seems like a red herring. The generic
plan is *awful* and would still be awful if the value were known but happened
to fall somewhere around the middle of the value range.

Here's the OP's tables + query, but without the prepared statement part:

CREATE TABLE relation_tuple_transaction (
    id BIGSERIAL NOT NULL UNIQUE,
    timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL UNIQUE,
    CONSTRAINT pk_rttx PRIMARY KEY (id)
);

CREATE INDEX ix_relation_tuple_transaction_by_timestamp
    ON relation_tuple_transaction(timestamp);

INSERT INTO relation_tuple_transaction(timestamp)
    SELECT * FROM generate_series(now() - interval '3 days', now(), '1 second'::interval) dd;

vacuum freeze analyze;

EXPLAIN ANALYZE
SELECT MIN(id), MAX(id) FROM relation_tuple_transaction
WHERE timestamp >= (now() - interval '1.5 days');

postgres[631148][1]=# EXPLAIN ANALYZE SELECT MIN(id), MAX(id) from relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');
 Result  (cost=1.01..1.02 rows=1 width=16) (actual time=113.379..113.381 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8) (actual time=113.347..113.348 rows=1 loops=1)
           ->  Index Scan using pk_rttx on relation_tuple_transaction  (cost=0.42..10741.45 rows=127009 width=8) (actual time=113.345..113.345 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
                 Rows Removed by Filter: 129746
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8) (actual time=0.024..0.024 rows=1 loops=1)
           ->  Index Scan Backward using pk_rttx on relation_tuple_transaction relation_tuple_transaction_1  (cost=0.42..10741.45 rows=127009 width=8) (actual time=0.023..0.023 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
 Planning Time: 0.370 ms
 Execution Time: 113.441 ms
(14 rows)

We're pretty much by definition scanning half the table via the index scans,
and end up with a cost of 1.02 (yes, I'm aware that the paths are costed
separately).
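A note on the numbers, since the plan output doesn't spell this out - the
arithmetic below is my own back-of-the-envelope reading of how Limit nodes are
costed, so treat it as a sketch: the Limit gets charged the scan's startup
cost plus the fraction of its run cost needed for one row out of the estimated
127009,

    0.42 + (10741.45 - 0.42) * (1 / 127009) ~= 0.50

which is how both initplans come out at 0.50 and the Result at ~1.02, even
though the first initplan actually has to filter away ~130k rows before it
produces anything.

The correlation part is easy to check against pg_stats, e.g.:

    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'relation_tuple_transaction';

For data generated the way the OP does it I'd expect both id and timestamp to
come back at (or very close to) 1.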
FWIW, manually writing the min/max as ORDER BY timestamp ASC/DESC LIMIT 1
queries yields a *vastly* better plan:

EXPLAIN ANALYZE
SELECT (SELECT id FROM relation_tuple_transaction
        WHERE timestamp >= (now() - interval '1.5 days')
        ORDER BY timestamp ASC LIMIT 1),
       (SELECT id FROM relation_tuple_transaction
        WHERE timestamp >= (now() - interval '1.5 days')
        ORDER BY timestamp DESC LIMIT 1);

 Result  (cost=0.92..0.93 rows=1 width=16) (actual time=0.110..0.111 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.46 rows=1 width=16) (actual time=0.079..0.079 rows=1 loops=1)
           ->  Index Scan using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction  (cost=0.42..4405.46 rows=129602 width=16) (actual time=0.077..0.078 rows=1 loops=1)
                 Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.46 rows=1 width=16) (actual time=0.028..0.028 rows=1 loops=1)
           ->  Index Scan Backward using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction relation_tuple_transaction_1  (cost=0.42..4405.46 rows=129602 width=16) (actual time=0.027..0.027 rows=1 loops=1)
                 Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
 Planning Time: 0.270 ms
 Execution Time: 0.159 ms
(11 rows)

And it stays sane even if you add a (redundantly evaluated) AND id IS NOT NULL:

EXPLAIN
SELECT id FROM relation_tuple_transaction
WHERE timestamp >= (now() - interval '1.5 days') AND id IS NOT NULL
ORDER BY timestamp ASC LIMIT 1;

                                  QUERY PLAN
 Limit  (cost=0.42..0.46 rows=1 width=16)
   ->  Index Scan using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction  (cost=0.42..4405.46 rows=129602 width=16)
         Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
         Filter: (id IS NOT NULL)
(4 rows)

EXPLAIN
SELECT min(id) FROM relation_tuple_transaction
WHERE timestamp >= (now() - interval '1.5 days');

                                  QUERY PLAN
 Result  (cost=0.50..0.51 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Scan using pk_rttx on relation_tuple_transaction  (cost=0.42..10741.45 rows=129602 width=8)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
(6 rows)

Greetings,

Andres Freund