Overriding the optimizer - Mailing list pgsql-performance

From Craig A. James
Subject Overriding the optimizer
Date
Msg-id 43A1F6DB.3080805@modgraph-usa.com
Whole thread Raw
In response to Re: Simple Join  (Mark Kirkwood <markir@paradise.net.nz>)
Responses Re: Overriding the optimizer
Re: Overriding the optimizer
Re: Overriding the optimizer
Re: Overriding the optimizer
List pgsql-performance
I asked a while back if there were any plans to allow developers to override the optimizer's plan and force certain
plans,and received a fairly resounding "No".  The general feeling I get is that a lot of work has gone into the
optimizer,and by God we're going to use it! 

I think this is just wrong, and I'm curious whether I'm alone in this opinion.

Over and over, I see questions posted to this mailing list about execution plans that don't work out well.  Many times
thereare good answers - add an index, refactor the design, etc. - that yield good results.  But, all too often the
answercomes down to something like this recent one: 

   > Right on. Some of these "coerced" plans may perform
   > much better. If so, we can look at tweaking your runtime
   > config: e.g.
   >
   > effective_cache_size
   > random_page_cost
   > default_statistics_target
   >
   > to see if said plans can be chosen "naturally".

I see this over and over.  Tweak the parameters to "force" a certain plan, because there's no formal way for a
developerto say, "I know the best plan." 

There isn't a database in the world that is as smart as a developer, or that can have insight into things that only a
developercan possibly know.  Here's a real-life example that caused me major headaches.  It's a trivial query, but
Postgrestotally blows it: 

    select * from my_table
     where row_num >= 50000 and row_num < 100000
     and myfunc(foo, bar);

How can Postgres possibly know what "myfunc()" does?  In this example, my_table is about 10 million rows and row_num is
indexed. When the row_num range is less than about 30,000, Postgres (correctly) uses an row_num index scan, then
filtersby myfunc().  But beyond that, it chooses a sequential scan, filtering by myfunc().  This is just wrong.
Postgrescan't possibly know that myfunc() is VERY expensive.  The correct plan would be to switch from index to
filteringon row_num.  Even if 99% of the database is selected by row_num, it should STILL at least filter by row_num
first,and only filter by myfunc() as the very last step. 

How can a database with no ability to override a plan possibly cope with this?

Without the explicit ability to override the plan Postgres generates, these problems dominate our development efforts.
Postgresdoes an excellent job optimizing on 90% of the SQL we write, but the last 10% is nearly impossible to get
right. We spend huge amounts of time on trial-and-error queries, second guessing Postgress, creating unnecessary
temporarytables, sticking in the occasional OFFSET in a subquery to prevent merging layers, and so forth. 

This same application also runs on Oracle, and although I've cursed Oracle's stupid planner many times, at least I can
forceit to do it right if I need to. 

The danger of forced plans is that inexperienced developers tend to abuse them.  So it goes -- the documentation should
beclear that forced plans are always a last resort.   

But there's no getting around the fact that Postgres needs a way for a developer to specify the execution plan.

Craig


pgsql-performance by date:

Previous
From: "Luke Lonergan"
Date:
Subject: Re: SAN/NAS options
Next
From: Michael Fuhr
Date:
Subject: Re: How much expensive are row level statistics?