Two "equivalent" WITH RECURSIVE queries, one of them slow. - Mailing list pgsql-performance
From | Octavio Alvarez |
---|---|
Subject | Two "equivalent" WITH RECURSIVE queries, one of them slow. |
Date | |
Msg-id | op.vfcwmiow4oyyg1@localhost.localdomain Whole thread Raw |
Responses |
Re: Two "equivalent" WITH RECURSIVE queries, one of them
slow.
Re: Two "equivalent" WITH RECURSIVE queries, one of them slow. |
List | pgsql-performance |
Hello. I have a tree-like table with a three-field PK (name, date, id) and one parent field. It has 5k to 6k records as of now, but it will hold about 1 million records. I am trying the following WITH RECURSIVE query: WITH RECURSIVE t AS ( SELECT par.id AS tid, par.name, par.date, par.id, par.text, par.h_title, par.h_name, par.parent FROM _books.par UNION SELECT t.tid AS pid, p.name, p.date, p.id, p.text, p.h_title, p.h_name, p.parent FROM t, _books.par p WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent ) SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340'; ... which takes 2547.503 ms However, if I try the same query but adding the same WHERE clause to the non-recursive term, I get much better results. WITH RECURSIVE t AS ( SELECT par.id AS tid, par.name, par.date, par.id, par.text, par.h_title, par.h_name, par.parent FROM _books.par WHERE name = 'cfx' AND date = '2009-08-19' AND par.id = '28340' UNION SELECT t.tid AS pid, p.name, p.date, p.id, p.text, p.h_title, p.h_name, p.parent FROM t, _books.par p WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent ) SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340'; ... which takes 0.221 ms I am being forced to use the slow query because I want to define it as a view, leaving the WHERE clause to the application. I fail to see where the two queries might be different, or, what cases the slow one considers that the fast one doesn't, as to get a clue on how to workaround this. I have taken the EXPLAIN ANALYZE output for both queries. It looks like the slow one is processing all records (read: not adding the WHERE clause to the non-recursive term). QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on t (cost=96653.20..96820.57 rows=1 width=144) (actual time=32.931..2541.792 rows=1 loops=1) Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date) AND (tid = 28340)) CTE t -> Recursive Union (cost=0.00..96653.20 rows=6086 width=212) (actual time=0.017..2442.655 rows=33191 loops=1) -> Seq Scan on par (cost=0.00..237.96 rows=5996 width=208) (actual time=0.011..5.591 rows=5996 loops=1) -> Merge Join (cost=8909.74..9629.35 rows=9 width=212) (actual time=225.979..254.727 rows=3022 loops=9) Merge Cond: (((t.name)::text = (p.name)::text) AND (t.date = p.date) AND (t.id = p.parent)) -> Sort (cost=7700.54..7850.44 rows=59960 width=44) (actual time=58.163..59.596 rows=3685 loops=9) Sort name: t.name, t.date, t.id Sort Method: quicksort Memory: 17kB -> WorkTable Scan on t (cost=0.00..1199.20 rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9) -> Materialize (cost=1209.20..1284.15 rows=5996 width=208) (actual time=163.062..177.415 rows=5810 loops=9) -> Sort (cost=1209.20..1224.19 rows=5996 width=208) (actual time=163.054..172.543 rows=5810 loops=9) Sort name: p.name, p.date, p.parent Sort Method: external merge Disk: 1304kB -> Seq Scan on par p (cost=0.00..237.96 rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9) Total runtime: 2547.503 ms (17 rows) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on t (cost=927.80..928.10 rows=1 width=144) (actual time=0.036..0.132 rows=1 loops=1) Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date) AND (tid = 28340)) CTE t -> Recursive Union (cost=0.00..927.80 rows=11 width=212) (actual time=0.030..0.124 rows=1 loops=1) -> Index Scan using par_id on par (cost=0.00..8.27 rows=1 width=208) (actual time=0.024..0.026 rows=1 loops=1) Index Cond: (id = 28340) Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)) -> Nested Loop (cost=0.00..91.93 rows=1 width=212) (actual time=0.091..0.091 rows=0 loops=1) Join Filter: (((t.name)::text = (p.name)::text) AND (t.date = p.date)) -> WorkTable Scan on t (cost=0.00..0.20 rows=10 width=44) (actual time=0.001..0.001 rows=1 loops=1) -> Index Scan using par_parent on par p (cost=0.00..9.07 rows=6 width=208) (actual time=0.085..0.085 rows=0 loops=1) Index Cond: (p.parent = t.id) Total runtime: 0.221 ms (13 rows) books=# \d _books.par Table "_books.par" Column | Type | Modifiers --------------+-------------------+----------- name | character varying | not null date | date | not null id | integer | not null text | character varying | h_title | character varying | h_name | character varying | parent | integer | Indexes: "par_pkey" PRIMARY KEY, btree (name, date, id) "par_name" btree (name) "par_name_fpub_parent" btree (name, date, parent) "par_id" btree (id) "par_parent" btree (parent) $ psql --version psql (PostgreSQL) 8.4.4 contains support for command-line editing -- Octavio.
pgsql-performance by date: