>How can we boost performance of queries containing pattern matching
>characters? In my case, we're using a percent sign (%) that matches any
string of zero or more characters.
>
> QUERY: DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
>
> EMAIL column is VARCHAR(256).
>
> As it is clear from the above query, email is matched "partially and
case-insensitively", which my application requirement demands.
>
> In case, if it were a full match, I could easily define a functional
> INDEX on EMAIL column (lower(EMAIL)) and I could rewrite my DELETE where
criteria like lower(EMAIL) = 'someemail@domain.com'.
>
> MYTABLE currently contains 2 million records and grows consistently.
I had almost the same problem.
To resolve it, I created my own text search parser (myftscfg) which divides
text in column into three letters parts, for example:
someemail@domain.com is divided to som, ome,mee,eem,ema,mai,ail,il@,
l@d,@do,dom,oma,mai,ain,in.,n.c,.co,com
There should be also index on email column:
CREATE INDEX "email _fts" on mytable using gin
(to_tsvector('myftscfg'::regconfig, email))
Every query like email ilike '%domain.com%' should be rewrited to:
WHERE
to_tsvector('myftscfg',email) @@ to_tsquery('dom') AND
to_tsvector('myftscfg',email) @@ to_tsquery('oma') AND
to_tsvector('myftscfg',email) @@ to_tsquery('mai') AND
to_tsvector('myftscfg',email) @@ to_tsquery('ain') AND
to_tsvector('myftscfg',email) @@ to_tsquery('in.') AND
to_tsvector('myftscfg',email) @@ to_tsquery('n.c') AND
to_tsvector('myftscfg',email) @@ to_tsquery('.co') AND
to_tsvector('myftscfg',email) @@ to_tsquery('com') AND email ILIKE
'%domain.com%';
Index is reducing number of records and clause email ILIKE '%domain.com%' is
selecting only valid records.
I didn't found better solution.
-------------------------------------------
Artur Zajac