Re: Two "equivalent" WITH RECURSIVE queries, one of them slow. - Mailing list pgsql-performance

From Robert Haas
Subject Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.
Date
Msg-id AANLkTinE_BZ4QSo-uZcQBKzPTSPcYKQ0wOS81v0Jrkwd@mail.gmail.com
Whole thread Raw
In response to Two "equivalent" WITH RECURSIVE queries, one of them slow.  ("Octavio Alvarez" <alvarezp@alvarezp.ods.org>)
List pgsql-performance
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez
<alvarezp@alvarezp.ods.org> wrote:
> Hello.
>
> I have a tree-like table with a three-field PK (name, date, id) and one
> parent field.
> It has 5k to 6k records as of now, but it will hold about 1 million records.
>
> I am trying the following WITH RECURSIVE query:
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 2547.503 ms
>
> However, if I try the same query but adding the same WHERE clause to the
> non-recursive term, I get much better results.
>
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19'
> AND par.id = '28340'
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 0.221 ms
>
> I am being forced to use the slow query because I want to define it as a
> view, leaving the WHERE clause to the application.

I think this is just a limitation of the optimizer.  Recursive queries
are a relatively new feature and the optimizer doesn't know a whole
lot about how to deal with them.  That may get improved at some point,
but the optimization you're hoping for here is pretty tricky.  In
order to push the WHERE clauses down into the non-recursive term, the
optimizer would need to prove that this doesn't change the final
results.  I think that's possible here because it so happens that your
recursive term only generates results that have the same name, date,
and tid as some existing result, but with a slightly different
recursive query that wouldn't be true, so you'd need to make the code
pretty smart to work this one out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Highly Efficient Custom Sorting
Next
From: Robert Haas
Date:
Subject: Re: Question about partitioned query behavior