Overriding the optimizer - Mailing list pgsql-performance
| From | Craig A. James |
|---|---|
| Subject | Overriding the optimizer |
| Date | |
| Msg-id | 43A1F6DB.3080805@modgraph-usa.com |
| In response to | Re: Simple Join (Mark Kirkwood <markir@paradise.net.nz>) |
| Responses | Re: Overriding the optimizer, Re: Overriding the optimizer, Re: Overriding the optimizer, Re: Overriding the optimizer |
| List | pgsql-performance |
I asked a while back if there were any plans to allow developers to override the optimizer's plan and force certain plans, and received a fairly resounding "No". The general feeling I get is that a lot of work has gone into the optimizer, and by God we're going to use it!

I think this is just wrong, and I'm curious whether I'm alone in this opinion.

Over and over, I see questions posted to this mailing list about execution plans that don't work out well. Many times there are good answers - add an index, refactor the design, etc. - that yield good results. But all too often the answer comes down to something like this recent one:

> Right on. Some of these "coerced" plans may perform
> much better. If so, we can look at tweaking your runtime
> config: e.g.
>
> effective_cache_size
> random_page_cost
> default_statistics_target
>
> to see if said plans can be chosen "naturally".

I see this over and over. Tweak the parameters to "force" a certain plan, because there's no formal way for a developer to say, "I know the best plan."

There isn't a database in the world that is as smart as a developer, or that can have insight into things that only a developer can possibly know. Here's a real-life example that caused me major headaches. It's a trivial query, but Postgres totally blows it:

    select * from my_table
     where row_num >= 50000 and row_num < 100000
       and myfunc(foo, bar);

How can Postgres possibly know what "myfunc()" does? In this example, my_table is about 10 million rows and row_num is indexed. When the row_num range is less than about 30,000, Postgres (correctly) uses a row_num index scan, then filters by myfunc(). But beyond that, it chooses a sequential scan, filtering by myfunc(). This is just wrong. Postgres can't possibly know that myfunc() is VERY expensive. The correct plan would be to switch from the index scan but still filter on row_num. Even if 99% of the database is selected by row_num, it should STILL at least filter by row_num first, and only filter by myfunc() as the very last step.

How can a database with no ability to override a plan possibly cope with this?

Without the explicit ability to override the plan Postgres generates, these problems dominate our development efforts. Postgres does an excellent job optimizing 90% of the SQL we write, but the last 10% is nearly impossible to get right. We spend huge amounts of time on trial-and-error queries, second-guessing Postgres, creating unnecessary temporary tables, sticking in the occasional OFFSET in a subquery to prevent merging layers, and so forth. This same application also runs on Oracle, and although I've cursed Oracle's stupid planner many times, at least I can force it to do it right if I need to.

The danger of forced plans is that inexperienced developers tend to abuse them. So it goes -- the documentation should be clear that forced plans are always a last resort. But there's no getting around the fact that Postgres needs a way for a developer to specify the execution plan.

Craig
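For context, a minimal sketch of the two kinds of workarounds the message alludes to - nudging session-level planner settings and the OFFSET-in-a-subquery fence - reusing the example names (my_table, row_num, myfunc) from the post. The specific setting values are illustrative assumptions, not recommendations:

    -- Nudge or coerce the planner with session-local settings.
    -- The random_page_cost value below is an illustrative guess.
    SET random_page_cost = 2.0;
    SET enable_seqscan = off;   -- heavy-handed: disable sequential scans for this session

    SELECT * FROM my_table
     WHERE row_num >= 50000 AND row_num < 100000
       AND myfunc(foo, bar);

    RESET enable_seqscan;
    RESET random_page_cost;

    -- The "OFFSET in a subquery" trick: OFFSET 0 prevents the subquery from
    -- being merged into the outer query, so the cheap row_num filter is
    -- applied before the expensive myfunc() call.
    SELECT * FROM (
        SELECT * FROM my_table
         WHERE row_num >= 50000 AND row_num < 100000
        OFFSET 0
    ) AS narrowed
     WHERE myfunc(foo, bar);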