Re: plan using BTree VS GIN - Mailing list pgsql-general
From | Nicolas Seinlet |
---|---|
Subject | Re: plan using BTree VS GIN |
Date | |
Msg-id | 3Q63H826f-HLI1cWWe5giRlNyIE0OkUVVa4yKYJPVYnByDgyLO_K09Y1mHqOlT2_-_AiZ4u_Uu2JqXodrUF4d-HvLXLqJEsbyBBty4oWZBc=@seinlet.com Whole thread Raw |
In response to | Re: plan using BTree VS GIN (Laurenz Albe <laurenz.albe@cybertec.at>) |
List | pgsql-general |
On Friday, June 23rd, 2023 at 2:52 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > > On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote: > > > we faced an issue with a select query on a relatively large table on our database. > > The query involves one single table. The table has more than 10 million records. > > It's mainly composed of varchar fields, have a primary key (id) of type serial, > > and when records of this table are shown to users, they are sorted users 2 fields, > > display_name (varchar) and id (the primary key). Because this table is heavily used > > in various contexts in our application, we have multiple indexes on it. Among other > > index, we have gin index on some fields of the table. > > > > The btree index res_partner_displayname_id_idx have been added lately and perfectly > > match a criteria (where active) and sorting (display_name, id) we have in quite all > > our queries on this table. > > > > The query that cause the issue is this one: > > SELECT "res_partner"."id" > > FROM "res_partner" > > WHERE (("res_partner"."active" = true) AND > > ( > > ( > > ( > > ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%')) > > OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%'))) > > OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%'))) > > OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%'))) > > OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%')))) > > > > AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL ) > > > > ORDER BY "res_partner"."display_name" ,"res_partner"."id" > > LIMIT 100 > > > > We have the common criteria (active=true), the common sorting, a limit, and a search > > on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed. > > > > Here is the query plan: > > Limit (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1) > > Output: id, display_name > > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247 > > -> Index Scan using res_partner_displayname_id_idx on public.res_partner (cost=0.56..1200212.37 rows=11214 width=25)(actual time=56383.793..86509.022 rows=1 loops=1) > > Output: id, display_name > > Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL))AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR > > (unaccent((res_partner.email)::text) ~~ > > * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text)OR (unaccent((res_partner.company_registry)::text) ~~* > > '%nse%'::text))) > > Rows Removed by Filter: 6226870 > > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247 > > Planning Time: 0.891 ms > > Execution Time: 86509.070 ms > > (10 rows) > > > > It's not using our gin index at all, but the btree one. > > > The problem is that PostgreSQL estimates that the index scan will return 11214 > rows, when it is actually one. This makes the plan to scan the table using > an index that matches the ORDER BY clause appealing: we might find 100 rows > quickly and avoid a sort. > > You can try to improve the estimates with more detailed statistics, > but if that doesn't do the job, you can modify the ORDER BY clause so > that it cannot use the bad index: > > ORDER BY res_partner.display_name ,res_partner.id + 0 > > Yours, > Laurenz Albe Hello, First of all, thanks, this solves the issue for the given query. Some more questions then, > we might find 100 rows quickly The cost estimate for 11214 rows is 1200212.37 If I look at the other plan, none of the estimated cost reach such levels (~2k for indexes + 1k for the BitmapOr, 3k forBitmap Heap Scan, and finally 1k for sort and limit), roughly 7k And that's part of what I didn't understand. How is the first cost estimated? If we divide by 110 the cost to go from 11krecords to 100, it's still ~10k, more than the other plan. Thanks again, Nicolas.
Attachment
pgsql-general by date: