Re: Less selective index chosen unexpectedly - Mailing list pgsql-bugs

From: Arne Roland
Subject: Re: Less selective index chosen unexpectedly
Date:
Msg-id: 6172fc42175b46a88eb3c9e08c253f1a@index.de
In response to: Re: Less selective index chosen unexpectedly (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-bugs

David Rowley <dgrowleyml@gmail.com> wrote:
> The planner does often take plenty of other risks. The situation you
> describe here is just one of many.  Another example is that if we do a
> join on say 3 or more conditions that those conditions could be
> correlated. We currently always assume they're independent and just
> multiply individual selectivities. We could end up thinking that the
> join will produce far fewer rows than it actually will.  Problems can
> arise in subsequent joins. We may end up doing a subsequent join using
> a nested loop thinking that only 1 row will be produced from the first
> join.  That is likely to turn bad if the innermost join instead
> produces millions of rows.
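Just to make that independence assumption concrete for anyone following along (table and column names are made up):

    -- region and status are strongly correlated between the two tables,
    -- but the per-condition selectivities get multiplied as if independent
    SELECT *
    FROM orders o
    JOIN shipments s
      ON s.order_id = o.order_id
     AND s.region   = o.region
     AND s.status   = o.status;
    -- the join estimate lands far below the actual row count, and a
    -- nested loop planned on top of it can turn into a disaster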

The n_distinct case is another especially hard one. I go around and set those values by hand all the time. Or how about the gazillion fallback cases where the planner gives up on estimating any selectivity and just pulls out another constant?
We are still living in a world where even the simplest function call can trigger one of these "I don't know the selectivity... let's just assume some default percentage"-cases.
And I am still surprised how well these broad estimates tend to work.
While we have progressed a lot on that front, and I am grateful to all the folks who have worked on the planner, the general problem is way too hard.
Getting to a point where I as a human cannot easily find a case where I can say "Ugh, that cliff is dangerous, don't go there..." seems impossible to me.
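For reference, this is the kind of manual intervention I mean (table, column and value are made up for illustration):

    -- pin n_distinct by hand when ANALYZE keeps getting it wrong
    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 250000);
    ANALYZE orders;

    -- and a classic fallback case: wrapping the column in a function means
    -- the column statistics don't apply and a default selectivity is used
    SELECT * FROM orders WHERE upper(status) = 'OPEN';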

The underlying problem here is more severe. We make systematic errors, and the more complex the query gets, the more likely it is that we will find a cliff to jump off.
It's not just that we stumble upon one; no, we actively search for it. Think about predicate pushing: that stuff can alter the estimated rows of a subquery.
From an optimizer's standpoint that's a very scary thought. It tells me how fragile my models are.
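A sketch of the sort of thing I mean (hypothetical schema):

    -- the outer WHERE clause gets pushed down into the subquery, which
    -- changes the estimated number of groups coming out of it and can
    -- flip the join strategy chosen on top of it
    SELECT *
    FROM (SELECT region, count(*) AS cnt
          FROM events
          GROUP BY region) agg
    JOIN regions r ON r.region = agg.region
    WHERE agg.region = 'EU';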

To be honest with you here, the simple one-relation index case doesn't interest me.
I have confirmed again and again that the planner is better at picking the correct index than I am. It wins in more than 80 % of the cases.
We always notice the one case where it spectacularly doesn't.
But if benchmarking has taught me anything, it's how much better on average the optimizer is at choosing a good index than I am.
And if we get more indexes, the speed disparity tends to get bigger. The planner isn't actively searching for a cliff there. It might be that I am just terrible at choosing from 50+ indexes. But I have learned that I am not that great at estimating whether using the more specific index is indeed the less risky thing to do. Spoiler alert: reading the more specific and rarely used index from disk is an actual risk for some workloads.
That's different from join orders, where the planner is trying hard to shoot itself in the foot. Or those limit cases where it lacks better statistics/knowledge of the problem.
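To illustrate the kind of index choice I mean (again a made-up example):

    CREATE INDEX ON orders (customer_id);
    CREATE INDEX ON orders (customer_id, status, created_at);

    -- for WHERE customer_id = $1 AND status = 'open' the wider index looks
    -- strictly better on paper, but if it is rarely used it is also the one
    -- that is cold and has to be read from disk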

I am at a loss as to what to do about this. I suspect there isn't one solution.
I dislike the idea of a risk score though, in part because of the sheer insane amount of complexity it would add to the beast the planner already is.
But I also suspect there is no way to reach a sensible consensus on what is indeed risky, because it depends on the kind of queries and workloads you have.

Peter Geoghegan <pg@bowt.ie> wrote:
> I wonder if there is a more general design that incorporates changes
> over time. That is, a design that has ANALYZE store old statistics for
> a table in order to give the system (and the DBA) a sense of how
> things change over time. This could enable autoanalyze behavior that
> more or less settles on an optimal frequency between ANALYZE
> operations -- frequent enough to get stable statistics with some
> margin of error, but not too frequent.

I feel like the current design mainly views ANALYZE as a little addition to autovacuum. And fwiw, syncing them up seems worthwhile to me in most cases.
I don't want to think too much about combining different parts into an incremental statistic. Even for MCVs that seems impossible to do.
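Syncing them up today mostly comes down to per-table thresholds, something like this (numbers purely illustrative):

    ALTER TABLE busy_table SET (
        autovacuum_vacuum_scale_factor  = 0.05,
        autovacuum_analyze_scale_factor = 0.05
    );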

I find that idea interesting though, in particular because it shows potential to get rid of a lot of the analyze scripts I have running at timed intervals, some of which definitely run more frequently than necessary.
Do you have a particular idea in mind for predicting future changes? Forecasting problems are impossible to solve reliably.
If it breaks in some weird high-velocity cases, we could always allow for a configuration option to switch back to the old behavior.
Every change could ruin somebody's day. But beating the simple heuristic of y% of rows in most cases doesn't sound like rocket science. We can always drop in more complex AI models once we have figured out what breaks with some test databases.
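For reference, the y% heuristic I mean is the current autoanalyze trigger (threshold + scale_factor * reltuples), which one can keep an eye on with something like:

    SELECT relname, n_mod_since_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY n_mod_since_analyze DESC;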
What are your thoughts on this?

Regards
Arne

