Re: t1.col like '%t2.col%' - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: t1.col like '%t2.col%'
Date	February 29, 2008 22:10:55
Msg-id	21195.1204337447@sss.pgh.pa.us
In response to	Re: t1.col like '%t2.col%' ("Dan Kaplan" <dkaplan@citizenhawk.com>)
List	pgsql-performance

"Dan Kaplan" <dkaplan@citizenhawk.com> writes:
> I learned a little about pg_trgm here:
> http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm

There's also real documentation in the 8.3 release:
http://www.postgresql.org/docs/8.3/static/pgtrgm.html
AFAIK pg_trgm hasn't changed much lately, so you should be able to
rely on that documentation for other recent release branches too.

> But this seems like it's for finding similarities, not substrings.  How can
> I use it to speed up t1.col like '%t2.col%'?

The idea is to use it as a lossy index.  You make a trigram index on
t1.col and then do something like

    ... where t1.col % t2.col and t1.col like ('%'||t2.col||'%');

The index gets you the %-matches and then you filter for the exact
matches with LIKE.
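
For concreteness, the setup described above might look roughly like
this, assuming the pg_trgm contrib module is already installed in the
database. It is only a sketch: the index name t1_col_trgm_idx and the
SELECT list are made up for illustration, and GiST is just one of the
index types pg_trgm provides an operator class for.

```sql
-- Assumes the pg_trgm contrib module is already loaded into this database.
-- Trigram index on the column being searched (index name is hypothetical).
CREATE INDEX t1_col_trgm_idx ON t1 USING gist (col gist_trgm_ops);

-- The % operator can use the index to pre-filter candidate rows (lossy);
-- LIKE then rechecks each candidate for an exact substring match.
SELECT t1.col, t2.col
FROM t1, t2
WHERE t1.col % t2.col
  AND t1.col LIKE ('%' || t2.col || '%');
```

Running the query under EXPLAIN should show whether the planner
actually picks the trigram index for the % condition.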

The similarity threshold (set_limit()) has to be set low enough that you
don't lose any desired matches, but not so low that you get everything
in the table back.  Not sure how delicate that will be.  It might be
unworkable, but surely it's worth a try.
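
One way to get a feel for the threshold is to inspect it and compare it
against the similarity of a pair you know should match. The value 0.1
and the sample strings below are arbitrary placeholders, not
recommendations.

```sql
-- Show the current similarity threshold used by the % operator.
SELECT show_limit();

-- Lower the threshold for this session; 0.1 is an arbitrary illustrative value.
SELECT set_limit(0.1);

-- Check how similar a known desired match actually is, to see whether the
-- threshold would keep it. Both strings here are invented sample data.
SELECT similarity('the quick brown fox', 'brown');
```

If the similarity reported for a pair you want to keep is below the
threshold, the % condition will drop that pair before LIKE ever sees it.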

            regards, tom lane
