Thread: Two "equivalent" WITH RECURSIVE queries, one of them slow.

Two "equivalent" WITH RECURSIVE queries, one of them slow.

From
"Octavio Alvarez"
Date:
Hello.

I have a tree-like table with a three-field PK (name, date, id) and one
parent field.
It has 5k to 6k records as of now, but it will hold about 1 million
records.

I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
                  SELECT par.id AS tid, par.name, par.date, par.id,
par.text, par.h_title, par.h_name, par.parent
                    FROM _books.par
         UNION
                  SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
p.h_title, p.h_name, p.parent
                    FROM t, _books.par p
                   WHERE p.name = t.name AND p.date = t.date AND t.id =
p.parent
         )
  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
    FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms

However, if I try the same query but adding the same WHERE clause to the
non-recursive term, I get much better results.


WITH RECURSIVE t AS (
                  SELECT par.id AS tid, par.name, par.date, par.id,
par.text, par.h_title, par.h_name, par.parent
                    FROM _books.par WHERE name = 'cfx' AND date =
'2009-08-19' AND par.id = '28340'
         UNION
                  SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
p.h_title, p.h_name, p.parent
                    FROM t, _books.par p
                   WHERE p.name = t.name AND p.date = t.date AND t.id =
p.parent
         )
  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
    FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms

I am being forced to use the slow query because I want to define it as a
view, leaving the WHERE clause to the application.

I fail to see where the two queries might be different, or, what cases the
slow one considers that the fast one doesn't, as to get a clue on how to
workaround this.

I have taken the EXPLAIN ANALYZE output for both queries. It looks like
the slow one is processing all records (read: not adding the WHERE clause
to the non-recursive term).


                                                                   QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------------
  CTE Scan on t  (cost=96653.20..96820.57 rows=1 width=144) (actual
time=32.931..2541.792 rows=1 loops=1)
    Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)
AND (tid = 28340))
    CTE t
      ->  Recursive Union  (cost=0.00..96653.20 rows=6086 width=212)
(actual time=0.017..2442.655 rows=33191 loops=1)
            ->  Seq Scan on par  (cost=0.00..237.96 rows=5996 width=208)
(actual time=0.011..5.591 rows=5996 loops=1)
            ->  Merge Join  (cost=8909.74..9629.35 rows=9 width=212)
(actual time=225.979..254.727 rows=3022 loops=9)
                  Merge Cond: (((t.name)::text = (p.name)::text) AND
(t.date = p.date) AND (t.id = p.parent))
                  ->  Sort  (cost=7700.54..7850.44 rows=59960 width=44)
(actual time=58.163..59.596 rows=3685 loops=9)
                        Sort name: t.name, t.date, t.id
                        Sort Method:  quicksort  Memory: 17kB
                        ->  WorkTable Scan on t  (cost=0.00..1199.20
rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9)
                  ->  Materialize  (cost=1209.20..1284.15 rows=5996
width=208) (actual time=163.062..177.415 rows=5810 loops=9)
                        ->  Sort  (cost=1209.20..1224.19 rows=5996
width=208) (actual time=163.054..172.543 rows=5810 loops=9)
                              Sort name: p.name, p.date, p.parent
                              Sort Method:  external merge  Disk: 1304kB
                              ->  Seq Scan on par p  (cost=0.00..237.96
rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9)
  Total runtime: 2547.503 ms
(17 rows)



                                                                     QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
  CTE Scan on t  (cost=927.80..928.10 rows=1 width=144) (actual
time=0.036..0.132 rows=1 loops=1)
    Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)
AND (tid = 28340))
    CTE t
      ->  Recursive Union  (cost=0.00..927.80 rows=11 width=212) (actual
time=0.030..0.124 rows=1 loops=1)
            ->  Index Scan using par_id on par  (cost=0.00..8.27 rows=1
width=208) (actual time=0.024..0.026 rows=1 loops=1)
                  Index Cond: (id = 28340)
                  Filter: (((name)::text = 'cfx'::text) AND (date =
'2009-08-19'::date))
            ->  Nested Loop  (cost=0.00..91.93 rows=1 width=212) (actual
time=0.091..0.091 rows=0 loops=1)
                  Join Filter: (((t.name)::text = (p.name)::text) AND
(t.date = p.date))
                  ->  WorkTable Scan on t  (cost=0.00..0.20 rows=10
width=44) (actual time=0.001..0.001 rows=1 loops=1)
                  ->  Index Scan using par_parent on par p
(cost=0.00..9.07 rows=6 width=208) (actual time=0.085..0.085 rows=0
loops=1)
                        Index Cond: (p.parent = t.id)
  Total runtime: 0.221 ms
(13 rows)



books=# \d _books.par
            Table "_books.par"
     Column    |       Type        | Modifiers
--------------+-------------------+-----------
  name         | character varying | not null
  date         | date              | not null
  id           | integer           | not null
  text         | character varying |
  h_title      | character varying |
  h_name       | character varying |
  parent       | integer           |
Indexes:
     "par_pkey" PRIMARY KEY, btree (name, date, id)
     "par_name" btree (name)
     "par_name_fpub_parent" btree (name, date, parent)
     "par_id" btree (id)
     "par_parent" btree (parent)



$ psql --version
psql (PostgreSQL) 8.4.4
contains support for command-line editing




--
Octavio.

Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.

From
Robert Haas
Date:
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez
<alvarezp@alvarezp.ods.org> wrote:
> Hello.
>
> I have a tree-like table with a three-field PK (name, date, id) and one
> parent field.
> It has 5k to 6k records as of now, but it will hold about 1 million records.
>
> I am trying the following WITH RECURSIVE query:
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 2547.503 ms
>
> However, if I try the same query but adding the same WHERE clause to the
> non-recursive term, I get much better results.
>
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19'
> AND par.id = '28340'
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 0.221 ms
>
> I am being forced to use the slow query because I want to define it as a
> view, leaving the WHERE clause to the application.

I think this is just a limitation of the optimizer.  Recursive queries
are a relatively new feature and the optimizer doesn't know a whole
lot about how to deal with them.  That may get improved at some point,
but the optimization you're hoping for here is pretty tricky.  In
order to push the WHERE clauses down into the non-recursive term, the
optimizer would need to prove that this doesn't change the final
results.  I think that's possible here because it so happens that your
recursive term only generates results that have the same name, date,
and tid as some existing result, but with a slightly different
recursive query that wouldn't be true, so you'd need to make the code
pretty smart to work this one out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.

From
Merlin Moncure
Date:
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez
<alvarezp@alvarezp.ods.org> wrote:
> Hello.
>
> I have a tree-like table with a three-field PK (name, date, id) and one
> parent field.
> It has 5k to 6k records as of now, but it will hold about 1 million records.
>
> I am trying the following WITH RECURSIVE query:
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 2547.503 ms
>
> However, if I try the same query but adding the same WHERE clause to the
> non-recursive term, I get much better results.
>
>
> WITH RECURSIVE t AS (
>                 SELECT par.id AS tid, par.name, par.date, par.id, par.text,
> par.h_title, par.h_name, par.parent
>                   FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19'
> AND par.id = '28340'
>        UNION
>                 SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
> p.h_title, p.h_name, p.parent
>                   FROM t, _books.par p
>                  WHERE p.name = t.name AND p.date = t.date AND t.id =
> p.parent
>        )
>  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
>   FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
>
> ... which takes 0.221 ms

If you want the fast plan, you might want to consider reworking your
query into a set returning function.  It's pretty easy to do:


create or replace function f(arg int) returns setof something as
$$
  with recursive foo as
  (
    select * from bar where id = $1
      union all
    [...]
  )
  select * from foo
$$ language sql;

Obviously, a pure view approach would be nicer but it just isn't going
to hapen at present.  CTE are currently problematic generally when you
need quals in the 'with' term, especially in the case of recursive
CTE.

merlin