plan using BTree VS GIN - Mailing list pgsql-general

From Nicolas Seinlet
Subject plan using BTree VS GIN
Date
Msg-id qfCKqMBYu-8Hl-6wrncTpAHmoK_mkhvIz9fpDbyDR6iH1FpZViaR6yqE30B_gcPAdkQzrSWeifsppv5o5lG-Pic1sS-oneYGxt7z44fwVaQ=@seinlet.com
Whole thread Raw
Responses Re: plan using BTree VS GIN
List pgsql-general
Hello, we faced an issue with a select query on a relatively large table on our database. The query involves one single table. The table has more than 10 million records. It's mainly composed of varchar fields, have a primary key (id) of type serial, and when records of this table are shown to users, they are sorted users 2 fields, display_name (varchar) and id (the primary key). Because this table is heavily used in various contexts in our application, we have multiple indexes on it. Among other index, we have gin index on some fields of the table. Among other things, we're using unaccent. We are aware the unaccent function is mutable, but we have an immutable version of unaccent. the table is similar to (I can give you all the fields of the table if needed):                                                     Table "public.res_partner"               Column               |            Type             | Collation | Nullable |                 Default                 -----------------------------------+-----------------------------+-----------+----------+-----------------------------------------  id                                | integer                     |           | not null | nextval('res_partner_id_seq'::regclass)  active                            | boolean                     |           |          |  name                              | character varying           |           |          |  display_name                      | character varying           |           |          |  ref                               | character varying           |           |          |  email                             | character varying           |           |          |  vat                               | character varying           |           |          |  type                              | character varying           |           |          |  company_registry                  | character varying           |           |          | Gin Index: "res_partner_unaccent_tgm_ref" gin (unaccent(ref::text) gin_trgm_ops) WHERE ref IS NOT NULL "res_partner_unaccent_tgm_vat" gin (unaccent(vat::text) gin_trgm_ops) WHERE vat IS NOT NULL "res_partner_unaccent_tgm_idx_gin2" gin (unaccent(name::text) gin_trgm_ops, unaccent(display_name::text) gin_trgm_ops, unaccent(ref::text) gin_trgm_ops, unaccent(email::text) gin_trgm_ops, unaccent(vat::text) gin_trgm_ops) "res_partner_name_tgm_idx_gin" gin (name gin_trgm_ops, display_name gin_trgm_ops, ref gin_trgm_ops, email gin_trgm_ops, vat gin_trgm_ops) "res_partner_unaccent_tgm_display_namee" gin (unaccent(display_name::text) gin_trgm_ops) "res_partner_unaccent_tgm_email" gin (unaccent(email::text) gin_trgm_ops) WHERE email IS NOT NULL "res_partner_comp_reg_idx3" gin (unaccent(company_registry::text) gin_trgm_ops) WHERE company_registry IS NOT NULL BTree index: "res_partner_displayname_id_idx" btree (display_name, id) WHERE active "res_partner_comp_reg_idx2" btree (unaccent(company_registry::text)) WHERE company_registry IS NOT NULL The btree index res_partner_displayname_id_idx have been added lately and perfectly match a criteria (where active) and sorting (display_name, id) we have in quite all our queries on this table. The query that cause the issue is this one: SELECT "res_partner"."id"   FROM "res_partner"  WHERE (("res_partner"."active" = true) AND          (          (            (              ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))            OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))         OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))      OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))    OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))  AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL ) ORDER BY "res_partner"."display_name" ,"res_partner"."id"      LIMIT 100 We have the common criteria  (active=true), the common sorting, a limit, and a search on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.  Here is the query plan:  Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)    Output: id, display_name    Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247    ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)          Output: id, display_name          Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~ * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)))          Rows Removed by Filter: 6226870          Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247  Planning Time: 0.891 ms  Execution Time: 86509.070 ms (10 rows) It's not using our gin index at all, but the btree one. If we change some things, like:    remove criteria on company_registry field or limit=10k or longer string than 3 characters, like '%full name of partner%' or sometimes, but not every time and for a limited period of time, analyze the table or set enable_indexscan=FALSE the plan switch to:  Limit  (cost=3061.03..3061.28 rows=100 width=25) (actual time=496.092..496.095 rows=1 loops=1)    Output: id, display_name    Buffers: shared hit=4527 read=2791    ->  Sort  (cost=3061.03..3065.70 rows=1868 width=25) (actual time=496.091..496.094 rows=1 loops=1)          Output: id, display_name          Sort Key: res_partner.display_name, res_partner.id          Sort Method: quicksort  Memory: 25kB          Buffers: shared hit=4527 read=2791          ->  Bitmap Heap Scan on public.res_partner  (cost=958.14..2989.64 rows=1868 width=25) (actual time=496.050..496.053 rows=1 loops=1)                Output: id, display_name                Recheck Cond: ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text))                Filter: (res_partner.active AND (((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)))                Heap Blocks: exact=1                Buffers: shared hit=4521 read=2791                ->  BitmapOr  (cost=958.14..958.14 rows=1960 width=0) (actual time=496.020..496.022 rows=0 loops=1)                      Buffers: shared hit=4520 read=2791                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..230.83 rows=645 width=0) (actual time=228.725..228.725 rows=1 loops=1)                            Index Cond: (unaccent((res_partner.display_name)::text) ~~* '%nse%'::text)                            Buffers: shared hit=2169 read=1374                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..230.81 rows=642 width=0) (actual time=256.083..256.083 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.email)::text) ~~* '%nse%'::text)                            Buffers: shared hit=1906 read=1348                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..226.00 rows=1 width=0) (actual time=2.693..2.693 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.ref)::text) ~~* '%nse%'::text)                            Buffers: shared hit=178 read=7                      ->  Bitmap Index Scan on res_partner_unaccent_tgm_idx_gin2  (cost=0.00..226.14 rows=19 width=0) (actual time=8.414..8.415 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.vat)::text) ~~* '%nse%'::text)                            Buffers: shared hit=225 read=62                      ->  Bitmap Index Scan on res_partner_comp_reg_idx3  (cost=0.00..42.01 rows=654 width=0) (actual time=0.099..0.099 rows=0 loops=1)                            Index Cond: (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)                            Buffers: shared hit=42  Planning Time: 20.921 ms  Execution Time: 496.296 ms (33 rows) How can we have PostgreSQL choose the second query plan? If we look at the expected VS actual rows, statistics look quite accurate. Thanks for reading, Nicolas.
Attachment

pgsql-general by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: move databases from a MySQL server to Postgresql.
Next
From: Laurenz Albe
Date:
Subject: Re: plan using BTree VS GIN