Thread: LIKE query problem

LIKE query problem

From
Marc McIntyre
Date:
I'm having a problem with a simple query, that finds children of a node,
using a materialized path to the node. The query:

select n1.id
from nodes n1, nodes n2
where n1.path like n2.path || '%'
and n2.id = 14;

                                                                QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..120256.56 rows=17517 width=4) (actual
time=0.901..953.485 rows=7 loops=1)
   Join Filter: (("inner".path)::text ~~ (("outer".path)::text ||
'%'::text))
   ->  Index Scan using nodes_id on nodes n2  (cost=0.00..35.08 rows=11
width=34) (actual time=0.050..0.059 rows=1 loops=1)
         Index Cond: (id = 14)
   ->  Seq Scan on nodes n1  (cost=0.00..6151.89 rows=318489 width=38)
(actual time=0.010..479.479 rows=318489 loops=1)
 Total runtime: 953.551 ms
(6 rows)

I've tried re-writing the query, which results in a different plan:

select id
from nodes
where path like (
    select path
    from nodes
    where id = 14
    limit 1
) || '%';

                                                                   QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on nodes  (cost=3.19..7747.52 rows=1592 width=4) (actual
time=0.230..226.311 rows=7 loops=1)
   Filter: ((path)::text ~~ (($0)::text || '%'::text))
   InitPlan
     ->  Limit  (cost=0.00..3.19 rows=1 width=34) (actual
time=0.018..0.019 rows=1 loops=1)
           ->  Index Scan using nodes_id on nodes  (cost=0.00..35.08
rows=11 width=34) (actual time=0.016..0.016 rows=1 loops=1)
                 Index Cond: (id = 14)
 Total runtime: 226.381 ms
(7 rows)

While the plan looks a little better, the estimated rows are woefully
inaccurate for some reason, resulting in a seq scan on nodes.
If I perform the nested select in the second query separately, then use
the result in the outer select, it's extremely fast:

test=# select path from nodes where id = 14;
  path
--------
 /3/13/
(1 row)

Time: 0.555 ms

test=# select id from nodes where path like '/3/13/%';
id
---------
      14
  169012
      15
      16
      17
  169219
  169220
(7 rows)

Time: 1.062 ms

I've vacuum full analyzed. PG version is 8.1.4

The nodes table is as follows:

test=# \d nodes
           Table "public.nodes"
 Column |          Type           | Modifiers
--------+-------------------------+-----------
 id     | integer                 | not null
 path   | character varying(2000) | not null
 depth  | integer                 | not null
Indexes:
    "nodes_pkey" PRIMARY KEY, btree (id, path)
    "nodes_id" btree (id)
    "nodes_path" btree (path)

test# select count(*) from nodes;
 count
--------
 318489

Is there a way to perform this efficiently in one query ?

Re: LIKE query problem

From
Tom Lane
Date:
Marc McIntyre <mmcintyre@squiz.net> writes:
> ... Is there a way to perform this efficiently in one query ?

No, because you're hoping for an indexscan optimization of a LIKE
query, and that can only happen if the pattern is a plan-time constant.

            regards, tom lane

Re: LIKE query problem

From
Marc McIntyre
Date:
Thanks Tom,

Is that documented somewhere? I can't seem to see any mention of it in
the docs.

Tom Lane wrote:
> Marc McIntyre <mmcintyre@squiz.net> writes:
>
>> ... Is there a way to perform this efficiently in one query ?
>>
>
> No, because you're hoping for an indexscan optimization of a LIKE
> query, and that can only happen if the pattern is a plan-time constant.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>