Re: Avoiding bad prepared-statement plans. - Mailing list pgsql-hackers

From Richard Huxton
Subject Re: Avoiding bad prepared-statement plans.
Date
Msg-id 4B717525.2010907@archonet.com
Whole thread Raw
In response to Re: Avoiding bad prepared-statement plans.  (Jeroen Vermeulen <jtv@xs4all.nl>)
List pgsql-hackers
On 09/02/10 14:25, Jeroen Vermeulen wrote:
> Richard Huxton wrote:
>
>>> = Actual-cost threshold =
>>>
>>> Also stop using the generic plan if the statement takes a long time to
>>> run in practice.
>>
>> Do you mean:
>> 1. Rollback the current query and start again
>> 2. Mark the plan as a bad one and plan again next execute
>>
>> If you can figure out how to do #1 then you could probably do it for
>> all queries, but I'm guessing it's far from simple to implement.
>
> I'm talking about #2. As a matter of fact #1 did come up in one of those
> discussions, but how do you know you're not killing the query juuust
> before it'd done, and then maybe executing a different plan that's no
> better?

Ah, you'd need to be smarter when planning and also remember the 
expected rows from each node. That way if your (index driven) inner node 
was expecting 3 rows you could mark it to force a cancellation if it 
returns (say) 30 or more. You'd allow more slack in later processing and 
less slack earlier on where a bad estimate can explode the final number 
of rows.

Or, there is always the case where we reverse-search an index to find 
the last 10 messages in a group say, but the particular group in 
question hasn't had a comment for months, so you trawl half the table. 
People regularly get bitten by that, and there's not much to be done 
about it. If we could abort when it looks like we're in worst-case 
rather than best-case scenarios then it would be one less thing for 
users to worry about.

>>> = Plan refresh =
>>>
>>> Periodically re-plan prepared statements on EXECUTE. This is also a
>>> chance for queries that were being re-planned every time to go back to a
>>> generic plan.
>>
>> Presumably some score based on update stats and vacuum activity etc.
>
> I was thinking of something very simple: re-do whatever we'd do if the
> statement were only being prepared at that point.

Yes, I thought so, the scoring was for *when* to decide to cancel the 
old plan. I suppose total query-time would be another way to decide this 
plan needs reworking.

>> The good side of all these ideas is good indeed. The bad side is plan
>> instability. Someone somewhere will have a generic plan that turns out
>> better than the specific plan (due to bad stats or config settings or
>> just planner limitations). The question is (I guess): How many more
>> winners will there be than losers?
>
> That's a good and surprising point, and therefore I'd like to draw
> attention away to a different point. :-)
>
> Yes, there will be losers in the sense that people may have optimized
> their use of prepared statements to whatever the current planner does.
> Maybe somebody out there even deliberately uses them to trick the
> planner into a different plan. But that is always going to happen; we're
> aiming for better plans, not for giving more detailed control over them.
> If you really can't take a change, don't upgrade.
>
> The competing point is: people out there may currently be forgoing
> prepared statements entirely because of erratic performance. To those
> people, if we can help them, it's like having a new feature.

Oh, I'm persuaded, but that doesn't really get you anywhere :-)

--   Richard Huxton  Archonet Ltd


pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: About psycopg2 (by its author)
Next
From: Mark Mielke
Date:
Subject: Re: Avoiding bad prepared-statement plans.