Thread: Lifting WHERE conditions out of inner select

Lifting WHERE conditions out of inner select

From

"John D. Burger"

Date:

08 October 2007, 19:35:44

Hi -

A colleague presented the following very slow query to me:

SELECT DISTINCT lemma FROM word
    JOIN sense USING (wordid)
    JOIN synset USING (synsetid)
   WHERE sense.synsetid
     IN (SELECT synset2id FROM semlinkref
          WHERE synset1id
            IN (SELECT synsetid FROM sense
                 WHERE wordid = (SELECT wordid FROM word WHERE
lemma='scramble'))
          AND linkid=1
          AND synset.pos='v')
   ORDER BY lemma;

I realized that the last constraint, synset.pos='v', actually applies
to one of the tables in the main join, and could be lifted out of the
double IN clause.  Doing so sped the query up by a factor of 10,000.

My question is, should the planner have figured this out, and we're
just losing out because we're stuck in 7.4?  Or is there some subtle
difference in semantics I'm missing?  The select results were the
same in both cases, but I'm willing to believe that's an accident of
our data.

(Sorry if no one can answer my question without the table
definitions, etc. - it seemed worthwhile trying to get away without
that for now.)

Thanks.

- John D. Burger
   MITRE

Re: Lifting WHERE conditions out of inner select

From

Richard Broersma Jr

Date:

08 October 2007, 20:09:35

--- "John D. Burger" <john@mitre.org> wrote:
> My question is, should the planner have figured this out, and we're
> just losing out because we're stuck in 7.4?  Or is there some subtle
> difference in semantics I'm missing?  The select results were the
> same in both cases, but I'm willing to believe that's an accident of
> our data.

I don't know if this helps.

    SELECT lemma
      FROM word AS W
INNER JOIN sense AS S
        ON W.wordid = S.wordid
INNER JOIN semlinkref AS R
        ON R.synset1id = S.synsetid
       AND R.synset2id = S.synsetid
INNER JOIN synset AS T
        ON S1.synsetid = T.synsetid
     WHERE W.lemma = 'scramble'
       AND R.linked = 1
       AND R.pos='v'
  ORDER BY lemma;

Regards,
Richard Broersma Jr.

Re: Lifting WHERE conditions out of inner select

From

Richard Broersma Jr

Date:

08 October 2007, 20:16:59

OOPs!
--- Richard Broersma Jr <rabroersma@yahoo.com> wrote:

> --- "John D. Burger" <john@mitre.org> wrote:
> > My question is, should the planner have figured this out, and we're
> > just losing out because we're stuck in 7.4?  Or is there some subtle
> > difference in semantics I'm missing?  The select results were the
> > same in both cases, but I'm willing to believe that's an accident of
> > our data.
>
> I don't know if this helps.
>
>     SELECT lemma
>       FROM word AS W
> INNER JOIN sense AS S
>         ON W.wordid = S.wordid
> INNER JOIN semlinkref AS R
>         ON R.synset1id = S.synsetid
>        AND R.synset2id = S.synsetid
> INNER JOIN synset AS T
>         ON S.synsetid = T.synsetid
>      WHERE W.lemma = 'scramble'
>        AND R.linked = 1
>        AND T.pos='v'
>   ORDER BY lemma;
>
> Regards,
> Richard Broersma Jr.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>

Re: Lifting WHERE conditions out of inner select

From

Tom Lane

Date:

08 October 2007, 23:07:03

"John D. Burger" <john@mitre.org> writes:
> ... WHERE sense.synsetid
>      IN (SELECT synset2id FROM semlinkref
>           WHERE synset1id
>             IN (SELECT synsetid FROM sense
>                  WHERE wordid = (SELECT wordid FROM word WHERE
> lemma='scramble'))
>           AND linkid=1
>           AND synset.pos='v')

> I realized that the last constraint, synset.pos='v', actually applies
> to one of the tables in the main join, and could be lifted out of the
> double IN clause.  Doing so sped the query up by a factor of 10,000.

> My question is, should the planner have figured this out, and we're
> just losing out because we're stuck in 7.4?  Or is there some subtle
> difference in semantics I'm missing?

As long as the condition involving the upper variable is STABLE
(including IMMUTABLE), I think your analysis is correct --- at least for
upper references that're within top-level WHERE clauses of a simple
subquery.  An example where it would not work is

    foo IN (SELECT COUNT(*) FROM sometable
        WHERE something-involving-upper-variable)

Here the upper condition doesn't simply filter out all the rows of the
subquery but actually changes the value of the (one) returned row.

The planner does not look for this type of situation though, and after
some study I think it'd be more trouble than it was worth.  It'd be
less than trivial to determine whether the upper references occurred
only in places where it was safe to pull them up, and the actual pulling
would take some code that doesn't exist now, too,

            regards, tom lane

Re: Lifting WHERE conditions out of inner select

From

"John D. Burger"

Date:

09 October 2007, 12:05:18

Tom Lane wrote:

> The planner does not look for this type of situation though, and after
> some study I think it'd be more trouble than it was worth.  It'd be
> less than trivial to determine whether the upper references occurred
> only in places where it was safe to pull them up, and the actual
> pulling
> would take some code that doesn't exist now, too,

Okay, good to know.  The situation is obviously easy to avoid, I just
found the contrast surprising.

Thanks.

- John D. Burger
   MITRE