Re: [HACKERS] Discussion on missing optimizations - Mailing list pgsql-hackers

From Nico Williams
Subject Re: [HACKERS] Discussion on missing optimizations
Date
Msg-id 20171007204007.GB3093@localhost
Whole thread Raw
In response to Re: [HACKERS] Discussion on missing optimizations  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Oct 06, 2017 at 10:19:54PM -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-10-06 21:33:16 -0400, Adam Brusselback wrote:
> >> The article in question is here:
> >> https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/
> 
> > That's interesting.
> 
> The impression I have in a quick scan is that probably hardly any of these
> are cases that any of the DB designers think are important in themselves.

That's true for some of those.  But some of them might become important
when you start pushing WHERE constraints from outside into inner table
sources and subqueries, as dumb-looking constraints can simply appear
from pushing non-dumb-looking constraints.

More than the op optimizations would make a big difference for me:
- turning subqueries into joins
- turning ORs into UNIONs
  It is easy enough to work around the lack of this optimization in  many cases, but it does make queries more
verbose.
- pushing WHERE constraints from outer queries into the table source  queries (_including_ VIEWs)
   - determining that some table in a query that had WHERE constraints     pushed into it... now has a very well-filled
outlookup key,     therefore it's the one that should be the table source to start     the plan with, i.e., that it
shouldbe first in the outermost loop     of a nested loop join 
 
     For me these two would be huge wins.  I have to resort to     functions with roughly the same body as views just
sothat I can     have the optimizer pick the correct plan.  This causes a lot of     code duplication in my schemas.
 
- pushing WHERE constraints from outer queries into HAVING thence WHERE  constraints on GROUP BY queries where the
outerconstraints are on  columns used to GROUP BY
 
  I find myself making two versions of views that do aggregation: one  that does not, and one that does.  This allows
meto use the  non-aggregating view in contexts where I need this optimization, but  then I have to re-code the
aggregationat that layer.  Again, lots of  duplication.
 

These sorts of optimizations are huge.

> Rather, they fall out of more general optimization attempts, or not,
> depending on the optimization mechanisms in use in a particular DB.
> For example, reducing "WHERE 1=1" to "WHERE TRUE" and then to nothing
> comes out of a constant-subexpression-precalculation mechanism for us,
> whereas "WHERE column=column" doesn't fall to that approach.  ISTM it
> would be really dumb to expend planner cycles looking specifically for
> that case, so I guess that DB2 et al are finding it as a side-effect of
> some more general optimization ... I wonder what that is?

If you can reduce the number of compilations / optimization passes for
statements, then spending more time in the optimizer is not a big deal.
So, when invoked via PREPARE I would say spending more cycles looking
for this sort of thing is OK, but in many other cases it's not.

Also, sometimes these cases crop up do to pushing constraints into VIEWs
and sub-queries.  In those cases then constant sub-expression
elimination can be a win.

> (edit: a few minutes later, I seem to remember that equivclass.c has
> to do something special with the X=X case, so maybe it could do
> something else special instead, with little new overhead.)

I'd expect that column = column is not trivial to turn into TRUE, not
unless those columns are NOT NULLable.

> > 9. Unneeded Self JOIN
> 
> > Can't remember discussions of this.
> 
> I can't get very excited about that one either.
> 
> In the end, what the article fails to consider is that all of these are
> tradeoffs, not unalloyed goods.  If you spend planner cycles on every
> query to look for cases that only the most unabashedly brain-dead ORMs
> ever generate, you're not really doing your users a favor on balance.

I can't get very excited about this one either, though I do believe it
can arise as the author says, "when you build complex views and JOIN
them to each other".  Maybe I'm not excited about it because I've not
needed it :)

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: [HACKERS] Slow synchronous logical replication
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra