Le 23/09/2011 12:30, Alban Hertroys a écrit :
> On 23 September 2011 11:47, Andrew Rose <andrew.rose@metaswitch.com
> <mailto:andrew.rose@metaswitch.com>> wrote:
>
> Basic Question: In text fields, is prefix matching significantly faster
> than suffix matching?
>
>
> It does depend on what type of index you use. BTrees split off text strings,
> from left to right, halving the number of records you need to scan at every
> branch. For a suffix match, that's exactly the wrong way around.
> Hash indexes probably don't fare any better.
> I don't know enough about GIST or GIN indexes to comment on their suitability
> for suffix matches, but presumably they're better at those.
>
> I recall doing suffix matches used to be a problem in at least earlier versions
> of Postgres, but it's quite possible that the query planner is smart enough to
> do the reverse match by itself nowadays (I doubt it, seeing that it would also
> need to reverse the way the index is organised).
>
>
> 2. Alternatively, I could store column 'rev_str' as a reversed version of
> column 'str' and have the client produce a reversed version of x on each
> query (call it r). Then the client would issue...
>
> SELECT * FROM tbl WHERE str LIKE 'x%' OR rev_str LIKE 'r%'
>
> ...which would use prefix matches only instead of requiring suffix matches.
> Since I've seen this form used by others, I was wondering if it's
> necessary - i.e. if databases really do perform prefix matching faster?
>
> 3. Is there a solution I'm unaware of with even better performance?
>
>
> You can create a functional index on the reverse of the string, that way
> omitting the need for an extra column (that needs updating as well).
>
> CREATE INDEX tbl_str_rev ON tbl (reverse(str));
> SELECT * FROM tbl WHERE str LIKE 'x%' OR reverse(str) LIKE 'x%';
>
> See: http://www.postgresql.org/docs/9.0/static/indexes-expressional.html
You can use the pg_trgm extension which lets you create gin or gist indexes on
your text field.
There is also wildspeed (see http://www.sai.msu.su/~megera/wiki/wildspeed).
Didn't try the latter solution, but the first one gives really great result for
searching partial strings.
a propos, there's one thing I'd like to know, is how to set the similarity
limit within pg_trgm on a server side (I'd like to have it settled to 0.2 for
every new session, for instance).
Regards,
--
Stéphane Schildknecht
http://www.loxodata.com
Contact régional PostgreSQL