[GENERAL] tgrm index for word_similarity - Mailing list pgsql-general

From Igal @ Lucee.org
Subject [GENERAL] tgrm index for word_similarity
Date
Msg-id dc21b306-70fa-7473-b375-cc20b77a2cb6@lucee.org
Whole thread Raw
Responses Re: [GENERAL] tgrm index for word_similarity  ("Igal @ Lucee.org" <igal@lucee.org>)
Re: [GENERAL] tgrm index for word_similarity  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-general

Hello,

I want to use Postgres for a fuzzy auto-suggest search field.  As the user will be typing their search phrase, Postgres will show a list of items that fuzzy-matches what they typed so far, ordered by popularity (ntile(20)) and distance, i.e. 1 - word_similarity().

I created a Materialized View with two columns: name text, popularity int.

My query at the moment is:

    SELECT name, popularity
    FROM   temp.items3_v
          ,(values ('some phrase'::text)) consts(input)
    WHERE  true
        and word_similarity(input, name) > 0.01  -- be lenient as some names are 75 characters long and we want to match even on a few characters of input
    ORDER BY 2, input <<-> name

Which seems to yield pretty good results, but takes over 40+ ms on a table that's not that large.

So I tried to add a GIN trgm index on `name`:

    CREATE INDEX temp_items3_v_tgrm_item_name ON temp.items3_v USING GIN(name gin_trgm_ops);

But it is not used:

QUERY PLAN                                                                                                       |
-----------------------------------------------------------------------------------------------------------------|
Sort  (cost=264.42..269.91 rows=2198 width=43) (actual time=41.060..41.117 rows=1044 loops=1)                    |
  Sort Key: items3_v.popularity, (('kandels'::text <<-> items3_v.name))                                          |
  Sort Method: quicksort  Memory: 149kB                                                                          |
  ->  Seq Scan on items3_v  (cost=0.00..142.41 rows=2198 width=43) (actual time=0.217..40.471 rows=1044 loops=1) |
        Filter: (word_similarity('kandels'::text, name) > '0.01'::double precision)                              |
        Rows Removed by Filter: 5550                                                                             |
Planning time: 0.149 ms                                                                                          |
Execution time: 41.308 ms                                                                                        |

What index would be good for that kind of query?

Thanks,

Igal Sapir
Lucee Core Developer
Lucee.org

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: [GENERAL] Alternative to pgAdmin Postgres Manager that supportpgagent (jobs)
Next
From: rakeshkumar464
Date:
Subject: Re: [GENERAL] Is it OK to create a directory in PGDATA dir