Re: Trigram (pg_trgm) GIN index not used - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Trigram (pg_trgm) GIN index not used
Date
Msg-id CAHyXU0xZWHhHQjPjXKKsbS2PSX+O=+Wa-p-pw49varaOoWCo_A@mail.gmail.com
In response to Re: Trigram (pg_trgm) GIN index not used  (Ivan Voras <ivoras@freebsd.org>)
List pgsql-general
On Thu, Feb 21, 2013 at 6:06 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> On 21/02/2013 12:52, Ivan Voras wrote:
>
>> I'd like to use pg_trgm for matching substrings case-insensitively, but
>> it doesn't seem to use the index:
>
>
> As a sort-of followup, the '%' operator kind of works but takes an
> incredibly long time, and the selectivity estimates are completely wrong:
>
> nn=> vacuum analyze documents;
> VACUUM
> nn=> explain select id,title from documents where raw_data % 'zagreb';
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108)
>    Recheck Cond: ((raw_data)::text % 'zagreb'::text)
>    ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40 rows=54 width=0)
>          Index Cond: ((raw_data)::text % 'zagreb'::text)
> (4 rows)
>
> nn=> explain analyze select id,title from documents where raw_data % 'zagreb';
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108) (actual time=98750.283..98750.283 rows=0 loops=1)
>    Recheck Cond: ((raw_data)::text % 'zagreb'::text)
>    ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40 rows=54 width=0) (actual time=26.748..26.748 rows=51874 loops=1)
>          Index Cond: ((raw_data)::text % 'zagreb'::text)
>  Total runtime: 98750.623 ms
> (5 rows)
>
>
> There is no IO load during this query.

pg_trgm is not really designed for indexing large documents, but for
fuzzy matching of short, simple strings (company names, addresses, etc.).
You are probably better off with full text search.
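
To illustrate why document size matters, here is a rough Python sketch of
trigram similarity. It only approximates the rules in the pg_trgm
documentation (lowercase the text, pad each word with two leading spaces and
one trailing space, divide shared trigrams by total distinct trigrams); it is
not the extension's actual code. The function names and sample strings are
made up for illustration; 0.3 is the documented default threshold for the
'%' operator.

```python
import re

THRESHOLD = 0.3  # documented pg_trgm default for the % operator


def trigrams(text):
    """Approximate pg_trgm's trigram set for a string (simplified)."""
    result = set()
    # Lowercase, keep only alphanumeric words, pad each word as pg_trgm
    # documents: two spaces in front, one space behind.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical sample data: a short name and a longer piece of text.
short_name = "Zagreb Ltd"
long_document = (
    "The annual conference takes place in Zagreb this year, "
    "with speakers from many countries"
)

for label, value in (("short", short_name), ("long", long_document)):
    score = similarity("zagreb", value)
    print(label, round(score, 3), score >= THRESHOLD)
```

Both strings contain every trigram of 'zagreb', so a trigram index would
presumably offer both as candidates. Only the short string clears the
threshold, because the long text adds many unrelated trigrams to the
denominator. That would be consistent with the plan above, where the index
scan returns 51874 rows and the recheck keeps none of them.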

merlin