Re: Simple join optimized badly? - Mailing list pgsql-performance

From Craig A. James
Subject Re: Simple join optimized badly?
Date
Msg-id 4529A0CA.8000904@modgraph-usa.com
Whole thread Raw
In response to Re: Simple join optimized badly?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Simple join optimized badly?
List pgsql-performance
> ... and add 100 other problems.  Hints are used because the DBA thinks that
> they are smarter than the optimizer; 99% of the time, they are wrong.
> Just try manually optimizing a complex query, you'll see -- with three
> join types, several scan types, aggregates, bitmaps, internal and external
> sorts, and the ability to collapse subqueries it's significantly more than
> a human can figure out accurately.

Sorry, this is just wrong, wrong, wrong.

I've heard this from several PG developers every time hints have come up in my roughly eighteen months as a PG
applicationdeveloper.  And in between every assertion that "the application programmers aren't as smart as the
optimizer",there are a dozen or two examples where posters to this list are told to increase this setting, decrease
thatone, adjust these other two, and the end result is to get the plan that the application programmer -- AND the PG
professionals-- knew was the right plan to start with. 

People are smarter than computers.  Period.

Now I'll agree that the majority, perhaps the great majority, of questions to this group should NOT be solved with
hints. You're absolutely right that in most cases hints are a really bad idea.  People will resort to hints when they
shouldbe learning better ways to craft SQL, and when they should have read the configuration guides. 

But that doesn't alter the fact that many, perhaps most, complicated application will, sooner or later, run into a
showstoppercase where PG just optimizes wrong, and there's not a damned thing the app programmer can do about it. 

My example, discussed previously in this forum, is a classic.  I have a VERY expensive function (it's in the class of
NP-completeproblems, so there is no faster way to do it).  There is no circumstance when my function should be used as
afilter, and no circumstance when it should be done before a join.  But PG has no way of knowing the cost of a
function,and so the optimizer assigns the same cost to every function.  Big disaster. 

The result?  I can't use my function in any WHERE clause that involves any other conditions or joins.  Only by itself.
PGwill occasionally decide to use my function as a filter instead of doing the join or the other WHERE conditions
first,and I'm dead. 

The interesting thing is that PG works pretty well for me on big tables -- it does the join first, then applies my
expensivefunctions.  But with a SMALL (like 50K rows) table, it applies my function first, then does the join.  A
searchthat completes in 1 second on a 5,000,000 row database can take a minute or more on a 50,000 row database. 

Instead, I have to separate the WHERE terms into two SQL statements, and do the join myself.  I do the first half of my
query,suck it all into memory, do the second half, suck it into memory, build a hash table and join the two lists in
memory,then take the joined results and apply my function to it. 

This is not how a relational database should work.  It shouldn't fall over dead just when a table's size SHRINKS beyond
somethreshold that causes the planner to switch to a poor plan. 

Since these tables are all in the same database, adjusting configuration parameters doesn't help me.  And I suppose I
coulduse SET to disable various plans, but how is that any different from a HINT feature? 

Now you might argue that function-cost needs to be added to the optimizer's arsenal of tricks.  And I'd agree with you:
ThatWOULD be a better solution than hints.  But I need my problem solved TODAY, not next year.  Hints can help solve
problemsNOW that can be brought to the PG team's attention later, and in the mean time let me get my application to
work.

Sorry if I seem particularly hot under the collar on this one.  I think you PG designers have created a wonderful
product. It's not the lack of hints that bothers me, it's the "You app developers are dumber than we are" attitude.
We'renot.  Some of us know what we're doing, and we need hints. 

If it is just a matter of resources, that's fine.  I understand that these things take time.  But please don't keep
dismissingthe repeated and serious requests for this feature.  It's important. 

Thanks for listening.
Craig

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Simple join optimized badly?
Next
From: Tom Lane
Date:
Subject: Re: Simple join optimized badly?