On Thu, Feb 21, 2013 at 6:06 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 21/02/2013 12:52, Ivan Voras wrote:
>
>> I'd like to use pg_trgm for matching substrings case-insensitively, but
>> it doesn't seem to use the index:
>
>
> As a sort-of followup, the '%' operator kind of works but takes
> incredibly long time, and the selectivity estimates are completely wrong:
>
> nn=> vacuum analyze documents;
> VACUUM
> nn=> explain select id,title from documents where raw_data % 'zagreb';
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------
> Bitmap Heap Scan on documents (cost=128.42..330.87 rows=54 width=108)
> Recheck Cond: ((raw_data)::text % 'zagreb'::text)
> -> Bitmap Index Scan on documents_raw_data_trgm (cost=0.00..128.40
> rows=54 width=0)
> Index Cond: ((raw_data)::text % 'zagreb'::text)
> (4 rows)
>
> nn=> explain analyze select id,title from documents where raw_data %
> 'zagreb';
> QUERY
> PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on documents (cost=128.42..330.87 rows=54 width=108)
> (actual time=98750.283..98750.283 rows=0 loops=1)
> Recheck Cond: ((raw_data)::text % 'zagreb'::text)
> -> Bitmap Index Scan on documents_raw_data_trgm (cost=0.00..128.40
> rows=54 width=0) (actual time=26.748..26.748 rows=51874 loops=1)
> Index Cond: ((raw_data)::text % 'zagreb'::text)
> Total runtime: 98750.623 ms
> (5 rows)
>
>
> There is no IO load during this query.
pg_trgm is not really designed for indexing large documents, but for
fuzzy simple string (company name, address, etc) matching. probably
better off with full text search.
merlin