How to get around LIKE inefficiencies? - Mailing list pgsql-hackers

From The Hermit Hacker
Subject How to get around LIKE inefficiencies?
Date
Msg-id Pine.BSF.4.21.0011052045440.494-100000@thelab.hub.org
Whole thread Raw
Responses Re: How to get around LIKE inefficiencies?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: How to get around LIKE inefficiencies?  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
I'm tryin to figure out how to speed up udmsearch when run under
postgresql, and am being hit by atrocious performance when using a LIKE
query ... the query looks like:

SELECT ndict.url_id,ndict.intag  FROM ndict,url WHERE ndict.word_id=1971739852   AND url.rec_id=ndict.url_id    AND
(url.urlLIKE 'http://www.postgresql.org/%');
 

Take off the AND ( LIKE ) part of the query, finishes almost as soon as
you hit return.  Put it back in, and you can go for coffee before it
finishes ...
If I do 'SELECT url_id FROM ndict WHERE word_id=1971739852', there
are 153 records returned ... is there some way, that I'm not thinking, of
re-writing the above so that it 'resolves' the equality before the LIKE in
order to reduce the number of tuples that it has to do the LIKE on?  Is
there some way of writing the above so that it doesn't take forever to
execute?
I'm running this on a Dual-PIII 450 Server, 512Meg of RAM, zero
swap space being used ... the database has its indices on one hard drive,
the tables themselves are on a second one ... its PgSQL 7.0.2 (Tom,
anything in v7.0.3 that might improve this?) and startup is as:

#!/bin/tcsh
setenv PORT 5432
setenv POSTMASTER /pgsql/bin/postmaster
unlimit
${POSTMASTER} -B 384 -N 192 \             -o "-F -S 32768" \             -i -p ${PORT} -D/pgsql/data >&
/pgsql/logs/postmaster.${PORT}.$$ &
So its not like I'm not throwing alot of resources at this ...
Is there anything that we can do to improve this?  I was trying to
think of some way to use a subselect to narrow the search results, or
something ...
Oh, the above explains down to:

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1195.14 rows=1 width=10) ->  Index Scan using url_url on url  (cost=0.00..2.73 rows=1 width=4)
-> Index Scan using n_word on ndict  (cost=0.00..1187.99 rows=353 width=6)
 

EXPLAIN
ndict: 663018 tuples  url:  29276 tuples

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Unicode conversion (Re: [COMMITTERS] pgsql (configure.in))
Next
From: Tom Lane
Date:
Subject: Re: How to get around LIKE inefficiencies?