Re: Damage control for planner's get_actual_variable_endpoint() runaway - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Damage control for planner's get_actual_variable_endpoint() runaway
Msg-id: CA+TgmoYNPP4NgwhuPeL1Xb0QCaNYQ4B_z4DH5sEXVXVEob_xhA@mail.gmail.com
In response to: Re: Damage control for planner's get_actual_variable_endpoint() runaway (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List: pgsql-hackers
On Mon, Nov 21, 2022 at 7:22 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Nov-21, Jakub Wartak wrote:
> > b) I was wondering about creating a new wait class "Planner" with the
> > event "ReadingMinMaxIndex" (or similar). The obvious drawback is the
> > naming/categorization as wait events are ... well.. as the name "WAIT"
> > implies, while those btree lookups could easily be ON-CPU activity.
>
> I think we should definitely do this, regardless of any other fixes we
> add, so that this condition can be identified more easily. I wonder if
> we can backpatch it safely.

I don't think this is safe at all. Wait events can only bracket individual operations, like the reads of individual index blocks; they cannot report which phase of a larger operation is in progress. If we try to make them do the latter, we will have a hot mess on our hands. It might not be a bad capability to have, but it's a different system.

But that doesn't mean we can't do anything about this problem, and I think we should do something about it. It's completely crazy that after this many years of people getting hosed by this, we haven't taken more than half measures to fix the problem. I think commit 3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd was the last time we poked at this, and before that there was commit fccebe421d0c410e6378fb281419442c84759213, but neither of those prevented us from scanning an unbounded number of index tuples before finding one that we're willing to use, as the commit messages pretty much admit.

What we need is a solution that avoids reading an unbounded number of tuples under any circumstances. I previously suggested using SnapshotAny here, but Tom didn't like that. I'm not sure whether there are safety issues there or whether Tom was just concerned about the results being misleading. Either way, maybe there's some variant on that theme that could work. For instance, could we teach the index scan to stop if the first 100 tuples it finds are all invisible?
Or to read at most 1 page, or at most 10 pages, or something? If we don't find a match, we could either try to use a dead tuple, or we could just return false, which, I think, would end up using the value from pg_statistic rather than any updated value. That is of course not a great outcome, but it is WAY WAY BETTER than spending OVER AN HOUR looking for a more suitable tuple, as Jakub describes having seen on a production system.

I really can't understand why this is even mildly controversial. What exactly to do here may be debatable, but the idea that it's OK to spend an unbounded amount of resources during any phase of planning is clearly wrong. We can say that at the time we wrote the code we didn't know it was going to actually ever happen, and that is fine and true. But there have been multiple reports of this over the years, and we know *for sure* that spending totally unreasonable amounts of time inside this function is a real-world problem that actually brings down production systems. Unless and until we find a way of putting a tight bound on the amount of effort that can be expended here, that's going to keep happening.

--
Robert Haas
EDB: http://www.enterprisedb.com