[HACKERS] Linear vs. nonlinear planner cost estimates - Mailing list pgsql-hackers

From:      Tom Lane
Subject:   [HACKERS] Linear vs. nonlinear planner cost estimates
Msg-id:    31065.1481742760@sss.pgh.pa.us
Responses: Re: [HACKERS] Linear vs. nonlinear planner cost estimates
List:      pgsql-hackers
I've been looking into the problem reported here:

https://www.postgresql.org/message-id/CAOfJSTwzQ7Fx6Yjeg9mFkMsM5OVKPoa=EgkHceGdkr1TWg8vkA@mail.gmail.com

I can reproduce similar misbehavior with this test case:

create table updates as
  select (-log(random()) * 10)::int as driver_id,
         now() + x * '1 second'::interval as time,
         random() as junk
  from generate_series(1,15000000) x;

(I've intentionally set up this test data so that "driver_id" is randomly
ordered while the "time" column is perfectly in table order. While I don't
have direct evidence, it seems pretty plausible that the original problem
table might be somewhat like that.)

set maintenance_work_mem = '100MB';
set default_statistics_target = 10000;
vacuum analyze updates;
create index on updates(driver_id, time);

explain SELECT * FROM updates WHERE driver_id=1 ORDER BY "time" DESC LIMIT 1;

                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..0.73 rows=1 width=20)
   ->  Index Scan Backward using updates_driver_id_time_idx on updates  (cost=0.56..470495.61 rows=2750660 width=20)
         Index Cond: (driver_id = 1)
(3 rows)

That's all good, but if we create this less-well-adapted index, the planner
prefers it:

create index on updates(time);

explain SELECT * FROM updates WHERE driver_id=1 ORDER BY "time" DESC LIMIT 1;

                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.62 rows=1 width=20)
   ->  Index Scan Backward using updates_time_idx on updates  (cost=0.43..522569.43 rows=2750660 width=20)
         Filter: (driver_id = 1)
(3 rows)

Why did that happen? The filter can be expected to reject about four out of
every five tuples, so this plan should be expected to cost about five times
as much to get the first row. (If the driver_id values aren't randomly
located, things could be far worse; but the point here is that even under
the planner's assumption of uncorrelated row order, this should not be
considered a preferable plan.)

I think the reason is that, because the planner knows that using this index
will require a full-index scan, it's costing the updates_time_idx plan on
the basis of a full scan. Moreover, that index is perfectly correlated with
the physical table order, so it gets a low cost per fetch. Evidently, the
cost is coming out at about 522569/15000000, or 0.035 cost units per tuple
fetched.

Meanwhile, the better plan is being costed with the knowledge that it
fetches only one-fifth of the table; and we also see that that index has
zero ordering correlation with the table. So the cost per tuple fetched is
coming out at about 0.171, which is enough more that even having to fetch
five tuples instead of one makes the poorer plan look cheaper.

This is, of course, nonsense; we're costing the plans on the basis of
fetching many more tuples than will actually happen, and the cost
amortization effects that cost_index is accounting for will not happen.
So in short, we're taking a fundamentally nonlinear cost estimate from
cost_index and then scaling it linearly to try to account for the LIMIT.

I've known that this was a theoretical hazard of the way we do LIMIT
costing for some time (I recall bitching about it in my 2011 PGCon talk).
But this is the first case I've seen where it results in a wrong choice in
a relatively simple query.

I don't have a concrete proposal right now about how to fix this.
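(For the record, the linear scaling is easy to reproduce by hand from the
EXPLAIN numbers above. If I'm reading the LIMIT costing correctly, it
charges the child's startup cost plus a pro-rata fraction of its run cost;
this is plain SQL arithmetic, nothing from the planner:

select 0.43 + (522569.43 - 0.43) * 1 / 2750660;  -- about 0.62
select 0.56 + (470495.61 - 0.56) * 1 / 2750660;  -- about 0.73

which matches the 0.43..0.62 and 0.56..0.73 Limit costs shown in the two
plans.)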
The most expansive response would be to decorate every path with an
explicitly nonlinear cost function, which would need to be able to report
the cost to fetch the first N tuples rather than the total scan cost.
That would be a lot of work though.

Maybe we could make this work just by setting the startup cost of an
indexscan differently. We could redefine "startup cost" as "cost to fetch
the first tuple and then stop", and compute that independently of the total
cost for plan types where it'd matter. That would fix the problem for
LIMIT 1 cases, and even for cases with a larger limit, scaling linearly
from that to the total cost would produce a better estimate than what we
get now. But it feels like it's still a band-aid.

Thoughts?

			regards, tom lane
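PS: to make the startup-cost idea concrete, here is the sort of
interpolation I'm imagining, written out as plain SQL arithmetic. Suppose
the updates_time_idx scan carried an independently computed "fetch the
first tuple and then stop" cost (the 25.0 below is purely an invented
placeholder; nothing computes such a number today):

-- hypothetical first-tuple cost C1 = 25.0; total cost CT = 522569.43;
-- estimated rows N = 2750660
-- estimated cost to produce n rows:  C1 + (CT - C1) * (n - 1) / (N - 1)
select 25.0 + (522569.43 - 25.0) * (1 - 1) / (2750660 - 1);    -- LIMIT 1: just C1
select 25.0 + (522569.43 - 25.0) * (100 - 1) / (2750660 - 1);  -- LIMIT 100: about 43.8

For LIMIT 1 that charges the un-amortized first-fetch cost directly, and
for larger limits it degrades smoothly toward the existing whole-scan
estimate.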