Re: Delaying the planning of unnamed statements until Bind - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Delaying the planning of unnamed statements until Bind |
Date | |
Msg-id | Pine.OSF.4.58.0405291601540.450948@kosh.hut.fi Whole thread Raw |
In response to | Re: Delaying the planning of unnamed statements until Bind (Greg Stark <gsstark@mit.edu>) |
Responses |
Re: Delaying the planning of unnamed statements until Bind
|
List | pgsql-hackers |
On Sat, 22 May 2004, Greg Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > I think it's wishful thinking to assume that picking an indexscan is the > > right thing when we don't know any better. > > If we don't know any better then any solution is going to be wishful thinking. > It's kind of like a bridge game. If you don't know where the card is but you > need it somewhere in order to win, then you have to assume it's there and hope > for the best. If the index is wrong then the query was going to be slow either > way. If the index was right and you chose not to use it you're risking making > it slow unnecessarily and potentially when it was absoluetely required to be > fast. Could we try to plan for both cases? How about calculating the cutoff point where the seqscan becomes faster than indexscan, creating a plan with both paths, and picking which path to take at execute time? More generally, keep *all* paths that make sense with *some* combination of parameter values, and determine rules on which path to use with which parameter values. For example, if the query plan looks currently like this: template1=# prepare b (int) AS SELECT * FROM foo WHERE bar > $1 ORDER BY bar*10; PREPARE template1=# explain EXECUTE b (2); QUERY PLAN -------------------------------------------------------------------------Sort (cost=19.71..20.28 rows=230 width=4) SortKey: (bar * 10) -> Index Scan using fooo on foo (cost=0.00..10.69 rows=230 width=4) Index Cond: (bar > $1) (4 rows) It would become something like this: template1=# prepare b (int) AS SELECT * FROM foo WHERE bar > $1 ORDER BY bar*10; PREPARE template1=# explain EXECUTE b (2); QUERY PLAN -------------------------------------------------------------------------Sort (cost=19.71..20.28 rows=230 width=4) SortKey: (bar * 10) if $1 > 400 then -> Index Scan using fooo on foo (cost=0.00..10.69 rows=230 width=4) IndexCond: (bar > $1) else -> Seq Scan on foo (cost=0.00..14.17 rows=629 width=4) Filter: (bar > 100) This means that execute stage would look at $1, and choose the seq scan if it's > 400, otherwise use the seq scan. I haven't looked at the planner code, I don't know how hard it would be to implement, but I think it's something to consider. Until we figure how to do the above, I think the plan-on-execute mode would be very handy. However, it should not be on by default, IMHO. I'm worried about plan stability if we enable it by default. It could lead to nasty, hard to reproduce performance problems. Think about this scenario: A long-running server application prepares the query "SELECT * FROM foo WHERE timestamp > $1". 99% of the transactions that use the prepared query are online transactions that need to be very quick. They use parameter values very close to now(), and should do an indexscan. The rest are reports, running the same query with a parameter value of now() - 1 month. The reports should optimally use seqscan, but the slowness of indexscan is acceptable too. The application goes down every night for maintenance purposes, and is restarted in the morning. If the first transaction in the morning happens to be a report, all the following online transactions will use a seqscan, and become veeery slow. The operator gets angry phone calls, and reboots the system. Everything starts to work ok. Also keep in mind that at least some application servers have a client-side prepared statement cache, so that even if the application used a non-prepared statement for the reports, the middleware prepares it anyway. - Heikki
pgsql-hackers by date: