I'm having a problem with a simple query, that finds children of a node,
using a materialized path to the node. The query:
select n1.id
from nodes n1, nodes n2
where n1.path like n2.path || '%'
and n2.id = 14;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..120256.56 rows=17517 width=4) (actual
time=0.901..953.485 rows=7 loops=1)
Join Filter: (("inner".path)::text ~~ (("outer".path)::text ||
'%'::text))
-> Index Scan using nodes_id on nodes n2 (cost=0.00..35.08 rows=11
width=34) (actual time=0.050..0.059 rows=1 loops=1)
Index Cond: (id = 14)
-> Seq Scan on nodes n1 (cost=0.00..6151.89 rows=318489 width=38)
(actual time=0.010..479.479 rows=318489 loops=1)
Total runtime: 953.551 ms
(6 rows)
I've tried re-writing the query, which results in a different plan:
select id
from nodes
where path like (
select path
from nodes
where id = 14
limit 1
) || '%';
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on nodes (cost=3.19..7747.52 rows=1592 width=4) (actual
time=0.230..226.311 rows=7 loops=1)
Filter: ((path)::text ~~ (($0)::text || '%'::text))
InitPlan
-> Limit (cost=0.00..3.19 rows=1 width=34) (actual
time=0.018..0.019 rows=1 loops=1)
-> Index Scan using nodes_id on nodes (cost=0.00..35.08
rows=11 width=34) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (id = 14)
Total runtime: 226.381 ms
(7 rows)
While the plan looks a little better, the estimated rows are woefully
inaccurate for some reason, resulting in a seq scan on nodes.
If I perform the nested select in the second query separately, then use
the result in the outer select, it's extremely fast:
test=# select path from nodes where id = 14;
path
--------
/3/13/
(1 row)
Time: 0.555 ms
test=# select id from nodes where path like '/3/13/%';
id
---------
14
169012
15
16
17
169219
169220
(7 rows)
Time: 1.062 ms
I've vacuum full analyzed. PG version is 8.1.4
The nodes table is as follows:
test=# \d nodes
Table "public.nodes"
Column | Type | Modifiers
--------+-------------------------+-----------
id | integer | not null
path | character varying(2000) | not null
depth | integer | not null
Indexes:
"nodes_pkey" PRIMARY KEY, btree (id, path)
"nodes_id" btree (id)
"nodes_path" btree (path)
test# select count(*) from nodes;
count
--------
318489
Is there a way to perform this efficiently in one query ?