Two "equivalent" WITH RECURSIVE queries, one of them slow. - Mailing list pgsql-performance
| From | Octavio Alvarez | 
|---|---|
| Subject | Two "equivalent" WITH RECURSIVE queries, one of them slow. | 
| Date | |
| Msg-id | op.vfcwmiow4oyyg1@localhost.localdomain Whole thread Raw | 
| Responses | Re: Two "equivalent" WITH RECURSIVE queries, one of them 
	slow. Re: Two "equivalent" WITH RECURSIVE queries, one of them slow. | 
| List | pgsql-performance | 
Hello.
I have a tree-like table with a three-field PK (name, date, id) and one
parent field.
It has 5k to 6k records as of now, but it will hold about 1 million
records.
I am trying the following WITH RECURSIVE query:
WITH RECURSIVE t AS (
                  SELECT par.id AS tid, par.name, par.date, par.id,
par.text, par.h_title, par.h_name, par.parent
                    FROM _books.par
         UNION
                  SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
p.h_title, p.h_name, p.parent
                    FROM t, _books.par p
                   WHERE p.name = t.name AND p.date = t.date AND t.id =
p.parent
         )
  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
    FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
... which takes 2547.503 ms
However, if I try the same query but adding the same WHERE clause to the
non-recursive term, I get much better results.
WITH RECURSIVE t AS (
                  SELECT par.id AS tid, par.name, par.date, par.id,
par.text, par.h_title, par.h_name, par.parent
                    FROM _books.par WHERE name = 'cfx' AND date =
'2009-08-19' AND par.id = '28340'
         UNION
                  SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
p.h_title, p.h_name, p.parent
                    FROM t, _books.par p
                   WHERE p.name = t.name AND p.date = t.date AND t.id =
p.parent
         )
  SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
    FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';
... which takes 0.221 ms
I am being forced to use the slow query because I want to define it as a
view, leaving the WHERE clause to the application.
I fail to see where the two queries might be different, or, what cases the
slow one considers that the fast one doesn't, as to get a clue on how to
workaround this.
I have taken the EXPLAIN ANALYZE output for both queries. It looks like
the slow one is processing all records (read: not adding the WHERE clause
to the non-recursive term).
                                                                   QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
  CTE Scan on t  (cost=96653.20..96820.57 rows=1 width=144) (actual
time=32.931..2541.792 rows=1 loops=1)
    Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)
AND (tid = 28340))
    CTE t
      ->  Recursive Union  (cost=0.00..96653.20 rows=6086 width=212)
(actual time=0.017..2442.655 rows=33191 loops=1)
            ->  Seq Scan on par  (cost=0.00..237.96 rows=5996 width=208)
(actual time=0.011..5.591 rows=5996 loops=1)
            ->  Merge Join  (cost=8909.74..9629.35 rows=9 width=212)
(actual time=225.979..254.727 rows=3022 loops=9)
                  Merge Cond: (((t.name)::text = (p.name)::text) AND
(t.date = p.date) AND (t.id = p.parent))
                  ->  Sort  (cost=7700.54..7850.44 rows=59960 width=44)
(actual time=58.163..59.596 rows=3685 loops=9)
                        Sort name: t.name, t.date, t.id
                        Sort Method:  quicksort  Memory: 17kB
                        ->  WorkTable Scan on t  (cost=0.00..1199.20
rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9)
                  ->  Materialize  (cost=1209.20..1284.15 rows=5996
width=208) (actual time=163.062..177.415 rows=5810 loops=9)
                        ->  Sort  (cost=1209.20..1224.19 rows=5996
width=208) (actual time=163.054..172.543 rows=5810 loops=9)
                              Sort name: p.name, p.date, p.parent
                              Sort Method:  external merge  Disk: 1304kB
                              ->  Seq Scan on par p  (cost=0.00..237.96
rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9)
  Total runtime: 2547.503 ms
(17 rows)
                                                                     QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
  CTE Scan on t  (cost=927.80..928.10 rows=1 width=144) (actual
time=0.036..0.132 rows=1 loops=1)
    Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)
AND (tid = 28340))
    CTE t
      ->  Recursive Union  (cost=0.00..927.80 rows=11 width=212) (actual
time=0.030..0.124 rows=1 loops=1)
            ->  Index Scan using par_id on par  (cost=0.00..8.27 rows=1
width=208) (actual time=0.024..0.026 rows=1 loops=1)
                  Index Cond: (id = 28340)
                  Filter: (((name)::text = 'cfx'::text) AND (date =
'2009-08-19'::date))
            ->  Nested Loop  (cost=0.00..91.93 rows=1 width=212) (actual
time=0.091..0.091 rows=0 loops=1)
                  Join Filter: (((t.name)::text = (p.name)::text) AND
(t.date = p.date))
                  ->  WorkTable Scan on t  (cost=0.00..0.20 rows=10
width=44) (actual time=0.001..0.001 rows=1 loops=1)
                  ->  Index Scan using par_parent on par p
(cost=0.00..9.07 rows=6 width=208) (actual time=0.085..0.085 rows=0
loops=1)
                        Index Cond: (p.parent = t.id)
  Total runtime: 0.221 ms
(13 rows)
books=# \d _books.par
            Table "_books.par"
     Column    |       Type        | Modifiers
--------------+-------------------+-----------
  name         | character varying | not null
  date         | date              | not null
  id           | integer           | not null
  text         | character varying |
  h_title      | character varying |
  h_name       | character varying |
  parent       | integer           |
Indexes:
     "par_pkey" PRIMARY KEY, btree (name, date, id)
     "par_name" btree (name)
     "par_name_fpub_parent" btree (name, date, parent)
     "par_id" btree (id)
     "par_parent" btree (parent)
$ psql --version
psql (PostgreSQL) 8.4.4
contains support for command-line editing
--
Octavio.
		
	pgsql-performance by date: