Re: Relative performance of prefix and suffix string matching - Mailing list pgsql-general

From Stéphane A. Schildknecht
Subject Re: Relative performance of prefix and suffix string matching
Date
Msg-id 4E7C6543.5040501@postgresql.fr
Whole thread Raw
In response to Re: Relative performance of prefix and suffix string matching  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
Le 23/09/2011 12:30, Alban Hertroys a écrit :
> On 23 September 2011 11:47, Andrew Rose <andrew.rose@metaswitch.com
> <mailto:andrew.rose@metaswitch.com>> wrote:
>
>     Basic Question: In text fields, is prefix matching significantly faster
>     than suffix matching?
>
>
> It does depend on what type of index you use. BTrees split off text strings,
> from left to right, halving the number of records you need to scan at every
> branch. For a suffix match, that's exactly the wrong way around.
> Hash indexes probably don't fare any better.
> I don't know enough about GIST or GIN indexes to comment on their suitability
> for suffix matches, but presumably they're better at those.
>
> I recall doing suffix matches used to be a problem in at least earlier versions
> of Postgres, but it's quite possible that the query planner is smart enough to
> do the reverse match by itself nowadays (I doubt it, seeing that it would also
> need to reverse the way the index is organised).
>
>
>     2. Alternatively, I could store column 'rev_str' as a reversed version of
>     column 'str' and have the client produce a reversed version of x on each
>     query (call it r).  Then the client would issue...
>
>     SELECT * FROM tbl WHERE str LIKE 'x%' OR rev_str LIKE 'r%'
>
>     ...which would use prefix matches only instead of requiring suffix matches.
>      Since I've seen this form used by others, I was wondering if it's
>     necessary - i.e. if databases really do perform prefix matching faster?
>
>     3. Is there a solution I'm unaware of with even better performance?
>
>
> You can create a functional index on the reverse of the string, that way
> omitting the need for an extra column (that needs updating as well).
>
> CREATE INDEX tbl_str_rev ON tbl (reverse(str));
> SELECT * FROM tbl WHERE str LIKE 'x%' OR reverse(str) LIKE 'x%';
>
> See: http://www.postgresql.org/docs/9.0/static/indexes-expressional.html

You can use the pg_trgm extension which lets you create gin or gist indexes on
your text field.

There is also wildspeed (see http://www.sai.msu.su/~megera/wiki/wildspeed).

Didn't try the latter solution, but the first one gives really great result for
searching partial strings.

a propos, there's one thing I'd like to know, is how to set the similarity
limit within pg_trgm on a server side (I'd like to have it settled to 0.2 for
every new session, for instance).

Regards,
--
Stéphane Schildknecht
http://www.loxodata.com
Contact régional PostgreSQL


pgsql-general by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re: Speed of lo_unlink vs. DELETE on BYTEA
Next
From: Thomas Kellerer
Date:
Subject: Re: Problem with pg_upgrade 9.0 -> 9.1