Re: Okay to tighten definition of oprcanhash? - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Okay to tighten definition of oprcanhash?
Date
Msg-id 87of7gbcqd.fsf@stark.dyndns.tv
Whole thread Raw
In response to Okay to tighten definition of oprcanhash?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Okay to tighten definition of oprcanhash?
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> I have been looking into the possibility of using a hashtable to speed
> up "x IN (SELECT y FROM ...)" operations.  

That's certainly one of the join types that Oracle can perform, and it's
frequently by far the fastest.

I'm not sure but I think the way Oracle optimizes subselects is by
transforming them into the equivalent join. Or rather, probably by
transforming both joins and subselects into an equivalent internal
representation.

This applies equally to things likeWHERE x=(select..)
as well as things likeWHERE x IN (select...)

The former is exactly equivalent to a join with an assertion of uniqueness.
The latter is a little different but still equivalent to a join with special
behaviour in case of duplicates. In many cases the database will have a
constraint telling it that no duplicates will appear in which cases it should
be able to optimize out any extra work anyways.

I know when I guided less experienced SQL programmers I urged them not to
worry about which form would perform better, only which form more clearly
expressed the result set they wanted. I promised them the database should
produce the same query plan for equivalent queries regardless of the form
chosen to express that in.

That was almost always true for Oracle. I generally found it impossible to
optimize a query merely by changing it from one equivalent form to another.
Oracle nearly always produced exactly the same plan. I always had to either
find a non-equivalent form that I knew would produce the same results only
because of extra information I had about the data, or add optimizer hints.

So is there some more general internal representation that can represent all
three of these cases in a consistent manner? It seems more powerful to
implement hash joins in a way that helps normal join queries as well as
particular subselect forms of queries.

For what it's worth my limited experience so far with postgres is that what
you're talking about is sorely needed. I'm having trouble getting queries that
I would write without thinking twice about on Oracle to perform reasonably on
postgres even after trying all kinds of contortions. And they're precisely the
types of queries that would be helped by hash joins.

-- 
greg



pgsql-hackers by date:

Previous
From: "Diego T."
Date:
Subject: ...
Next
From: Tom Lane
Date:
Subject: Re: Okay to tighten definition of oprcanhash?