Re: How to Optimize pg_trgm Performance - Mailing list pgsql-general

From Ivan E. Panchenko
Subject Re: How to Optimize pg_trgm Performance
Date
Msg-id bb31c511-5518-b405-f083-27e394358ad8@postgrespro.ru
Whole thread Raw
In response to How to Optimize pg_trgm Performance  ("Igal @ Lucee.org" <igal@lucee.org>)
List pgsql-general

Hi Igal, 29.01.2018 02:42, Igal @ Lucee.org пишет:

I want to use pg_trgm for auto-suggest functionality.  I created a Materialized View with the information that I need, with the relevant columns being (keywords text, rank int).  keywords is the column from which I build the tri-grams, and rank is some popularity factor so that popular results will show up higher than less popular results given the same tri-gram distance.

I want to return results in the order of [distance], [distance_word], [rank].  The input comes from the user and is not known in advance.  My query is as follows:

    SELECT title
        ,id
        ,(input <-> keywords) AS distance
        ,(input <<-> keywords) AS distance_word
        ,rank
    FROM  (VALUES (cast('red pill' AS text))) consts(input)
        ,mv_autosuggest
    ORDER BY 3, 4, 5
    LIMIT 20;

This gives me pretty good results, but it takes too long and is not likely to scale well.

I have created two indexes but neither seem to be used:

CREATE INDEX mv_autosuggest_keywords_tgrm_gist ON staging.mv_autosuggest USING gist (keywords gist_trgm_ops);

CREATE INDEX mv_autosuggest_keywords_tgrm_gin ON staging.mv_autosuggest USING gin (keywords gin_trgm_ops);

This is the result of explain analyze:

QUERY PLAN                                                                                                                                 |
-------------------------------------------------------------------------------------------------------------------------------------------|
Limit  (cost=356.41..356.46 rows=20 width=51) (actual time=163.132..163.135 rows=20 loops=1)                                               |
  ->  Sort  (cost=356.41..372.96 rows=6619 width=51) (actual time=163.130..163.131 rows=20 loops=1)                                        |
        Sort Key: (('red pill'::text <-> mv_autosuggest.keywords)), (('red pill'::text <<-> mv_autosuggest.keywords)), mv_autosuggest.rank |
        Sort Method: top-N heapsort  Memory: 28kB                                                                                          |
        ->  Seq Scan on mv_autosuggest  (cost=0.00..180.29 rows=6619 width=51) (actual time=0.263..161.289 rows=6619 loops=1)              |
Planning time: 0.139 ms                                                                                                                    |
Execution time: 163.174 ms                                                                                                                 |

How can I improve the performance here?

This can be improved if you use sort only by distances
(try ORDER BY 3,4
or ORDER BY 3
or ORDER BY 4

Than you should get  plan  like

Index Scan using mv_autosuggest_keywords_tgrm_gist on mv_autosuggest 
         Order By: ((keywords <-> 'red pill'::text) AND (keywords <->> 'red pill'::text))

Which means that KNN  index search is enabled : http://www.sai.msu.su/~megera/postgres/talks/pgcon-2010-1.pdf

If you want to sort it also by rank, you can make a two-level construction like:

SELECT * FROM (
   SELECT .... ORDER BY 3,4 LIMIT ....  /* make some empirical redundant limit here */
) foo ORDER BY 5 LIMIT ....


Regards,
Ivan Panchenko




Thank you,

Igal Sapir
Lucee Core Developer
Lucee.org


pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Working with JSONB data having node lists
Next
From: Peter Eisentraut
Date:
Subject: Re: pg10 logical replication set schema