Thread: Predicate migration on complex self joins

Predicate migration on complex self joins

From
Simon Riggs
Date:
In some cases, we have SQL being submitted that has superfluous
self-joins. An example would be

select count(*) 
from foo1 a, foo1 b 
where a.c1 = b.c1 /* PK join */
and a.c2 = 5 
and b.c2 = 10;

We can recognise that <a> and <b> are the same table because they are
joined on the PK. PK is never NULL, so a join b == a in set terms. We
can use this to re-write the query as if all predicates on either of the
two aliases were on the LHS only. e.g. rewrite query like this:

select count(*) 
from foo1 a, foo1 b 
where a.c1 = b.c1 
and a.c2 = 5 
and a.c2 = 10;  /* predicate migration */

Predicate migration is important because it either allows us to detect
impossible logic, as above, or to use multi-column index access/ bitmap
scans, or to allow join removal of the RHS as a superfluous join. (I
believe that self-joins were not originally part of the analysis of
potentially removable joins).

You may well ask who would be stupid enough to write SQL like that. The
answer is of course that it is automatically generated by an ORM.

Implementing something along these lines is secondary to join removal,
but it seems worth noting as non-high priority item for the TODO.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Predicate migration on complex self joins

From
"Kevin Grittner"
Date:
Simon Riggs <simon@2ndQuadrant.com> wrote:
> select count(*) 
> from foo1 a, foo1 b 
> where a.c1 = b.c1 /* PK join */
> You may well ask who would be stupid enough to write SQL like that.
> The answer is of course that it is automatically generated by an
> ORM.
We had to do something like that to get acceptable performance from
Sybase ASE.  That code probably has not been changed since migrating
to PostgreSQL, and since we have a strong portability mandate, it
probably should be left alone, since the penalty for the extra join in
PostgreSQL is small and the penalty for not having it in Sybase ASE is
large.
In short, it would be a welcome optimization here, although (in our
case) a relatively minor one.
-Kevin


Re: Predicate migration on complex self joins

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> In some cases, we have SQL being submitted that has superfluous
> self-joins. An example would be

> select count(*) 
> from foo1 a, foo1 b 
> where a.c1 = b.c1 /* PK join */
> and a.c2 = 5 
> and b.c2 = 10;

> You may well ask who would be stupid enough to write SQL like that. The
> answer is of course that it is automatically generated by an ORM.

Seems like the right answer is "fix the damn ORM".  It's hard to believe
this sort of case comes up often enough to justify the cycles that would
be expended (on *every* join query) to try to recognize it.
        regards, tom lane


Re: Predicate migration on complex self joins

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Simon Riggs <simon@2ndQuadrant.com> wrote:
>> select count(*) 
>> from foo1 a, foo1 b 
>> where a.c1 = b.c1 /* PK join */
> We had to do something like that to get acceptable performance from
> Sybase ASE.

Writing a join for a single-table query?  Why, in heavens name?
(Or have you mercifully blotted the details from your memory?)
        regards, tom lane


Re: Predicate migration on complex self joins

From
Simon Riggs
Date:
On Mon, 2009-07-13 at 13:33 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > In some cases, we have SQL being submitted that has superfluous
> > self-joins. An example would be
> 
> > select count(*) 
> > from foo1 a, foo1 b 
> > where a.c1 = b.c1 /* PK join */
> > and a.c2 = 5 
> > and b.c2 = 10;
> 
> > You may well ask who would be stupid enough to write SQL like that. The
> > answer is of course that it is automatically generated by an ORM.
> 
> Seems like the right answer is "fix the damn ORM".  It's hard to believe
> this sort of case comes up often enough to justify the cycles that would
> be expended (on *every* join query) to try to recognize it.

Yeh, damn ORMs seem to spring up faster than vines.

Not just because of this but I wonder if we might benefit from an
optimizer setting specifically aimed at the foolishnesses of
automatically generated SQL.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



Re: Predicate migration on complex self joins

From
Robert Haas
Date:
On Mon, Jul 13, 2009 at 1:33 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> In some cases, we have SQL being submitted that has superfluous
>> self-joins. An example would be
>
>> select count(*)
>> from foo1 a, foo1 b
>> where a.c1 = b.c1 /* PK join */
>> and a.c2 = 5
>> and b.c2 = 10;
>
>> You may well ask who would be stupid enough to write SQL like that. The
>> answer is of course that it is automatically generated by an ORM.
>
> Seems like the right answer is "fix the damn ORM".  It's hard to believe
> this sort of case comes up often enough to justify the cycles that would
> be expended (on *every* join query) to try to recognize it.

I think it's more common than you might think.  It's been requested on
-performance within recent memory, and I've had cases where I needed
to deal with it as well.  You can't write:

DELETE FROM table AS alias LEFT JOIN othertable ...

so you end up writing:

DELETE FROM table AS alias USING sametable LEFT JOIN othertable ...

or sometimes:

DELETE FROM table AS alias USING viewthatconstainstable ...

...Robert


Re: Predicate migration on complex self joins

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Writing a join for a single-table query?  Why, in heavens name?
> (Or have you mercifully blotted the details from your memory?)
Actually, I had only the vaguest recollection of why, but I found an
email where I was explaining the problem to Sybase.  Basically, it
boiled down to a corner case involving the intersection of named
caches and index optimizations similar to what Heikki's currently
developing.  If we did a search such as:
SELECT searchName FROM Party WHERE searchName LIKE 'PET%,RANDY%'
where searchName was the first column of an index bound to a named
cache, it would scan the range of the index where searchName >= 'PET'
and searchName < 'PEU', determine which rows actually matched the
whole pattern, and access the heap pages only for those rows which
matched the pattern.  (In this case, 298 rows.)  As long as only
columns from the index were returned, there were only 298 access to
the heap.  Now, if you added a column which was not in the index, it
went to the heap before checking the full pattern, so it went to the
heap 87,632 times for the above criteria, and returned the same 298
rows.  Since the primary key columns were in all indexes (to allow use
of READ UNCOMMITTED :-/ ), we could select those columns without
driving it to the heap, so we used the first table reference just
for selecting the rows, and joined back to the same table on primary
key to get the values to return.
We could not convince Sybase that they should fix that issue.
-Kevin


Re: Predicate migration on complex self joins

From
decibel
Date:
On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote:
> Not just because of this but I wonder if we might benefit from an
> optimizer setting specifically aimed at the foolishnesses of
> automatically generated SQL.


+1. And it's not just ORMs that do stupid things, I've seen crap like  
this come out of users too (not this exact case, but similar).

Perhaps what we really want is an "optimization level" GUC so that  
users can tell the backend how much overhead they want the optimizer  
to spend on trying to work around stupidity... :)
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828




Re: Predicate migration on complex self joins

From
Jaime Casanova
Date:
On Mon, Jul 13, 2009 at 3:48 PM, decibel<decibel@decibel.org> wrote:
> On Jul 13, 2009, at 1:06 PM, Simon Riggs wrote:
>>
>> Not just because of this but I wonder if we might benefit from an
>> optimizer setting specifically aimed at the foolishnesses of
>> automatically generated SQL.
>
>
> +1. And it's not just ORMs that do stupid things, I've seen crap like this
> come out of users too (not this exact case, but similar).
>

i've this come from users most of the time...
the few systems i have seen that generate sql, try to avoid using
complex queries and make simple ones and the JOIN them at the app
level...

> Perhaps what we really want is an "optimization level" GUC so that users can
> tell the backend how much overhead they want the optimizer to spend on
> trying to work around stupidity... :)

i wonder what the levels have to be:
- smart_sql
- application_generated_sql
- normal_user_sql
- xtremely_stupid_sql
- what_the_hell_sql

;)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


Re: Predicate migration on complex self joins

From
Sam Mason
Date:
On Mon, Jul 13, 2009 at 07:06:40PM +0100, Simon Riggs wrote:
> On Mon, 2009-07-13 at 13:33 -0400, Tom Lane wrote:
> > It's hard to believe
> > this sort of case comes up often enough to justify the cycles that would
> > be expended (on *every* join query) to try to recognize it.
> 
> Yeh, damn ORMs seem to spring up faster than vines.
> 
> Not just because of this but I wonder if we might benefit from an
> optimizer setting specifically aimed at the foolishnesses of
> automatically generated SQL.

The best suggestion I heard was to carry on optimizing until the plan
looked cheap enough or all the options had been exhausted.

In practical terms; I think this means doing the planning in two stages,
try with all the "simple" optimizations and see if it results in less
than "n" page accesses, if it's above this level then try again with all
the optimizations turned on.

--  Sam  http://samason.me.uk/