Re: plan using BTree VS GIN - Mailing list pgsql-general

From Nicolas Seinlet
Subject Re: plan using BTree VS GIN
Date
Msg-id 3Q63H826f-HLI1cWWe5giRlNyIE0OkUVVa4yKYJPVYnByDgyLO_K09Y1mHqOlT2_-_AiZ4u_Uu2JqXodrUF4d-HvLXLqJEsbyBBty4oWZBc=@seinlet.com
Whole thread Raw
In response to Re: plan using BTree VS GIN  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-general
On Friday, June 23rd, 2023 at 2:52 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:


>

>

> On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
>

> > we faced an issue with a select query on a relatively large table on our database.
> > The query involves one single table. The table has more than 10 million records.
> > It's mainly composed of varchar fields, have a primary key (id) of type serial,
> > and when records of this table are shown to users, they are sorted users 2 fields,
> > display_name (varchar) and id (the primary key). Because this table is heavily used
> > in various contexts in our application, we have multiple indexes on it. Among other
> > index, we have gin index on some fields of the table.
> >

> > The btree index res_partner_displayname_id_idx have been added lately and perfectly
> > match a criteria (where active) and sorting (display_name, id) we have in quite all
> > our queries on this table.
> >

> > The query that cause the issue is this one:
> > SELECT "res_partner"."id"
> > FROM "res_partner"
> > WHERE (("res_partner"."active" = true) AND
> > (
> > (
> > (
> > ((unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
> > OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
> > OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse)%')))
> > OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
> > OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse)%'))))
> >

> > AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL) OR "res_partner"."type" IS NULL )
> >

> > ORDER BY "res_partner"."display_name" ,"res_partner"."id"
> > LIMIT 100
> >

> > We have the common criteria (active=true), the common sorting, a limit, and a search
> > on various fields. The fields on which we're searching with criteria like '% whatever%' are gin indexed.
> >

> > Here is the query plan:
> > Limit (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
> > Output: id, display_name
> > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > -> Index Scan using res_partner_displayname_id_idx on public.res_partner (cost=0.56..1200212.37 rows=11214
width=25)(actual time=56383.793..86509.022 rows=1 loops=1) 
> > Output: id, display_name
> > Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS
NULL))AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR 
> > (unaccent((res_partner.email)::text) ~~
> > * '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~*
'%nse%'::text)OR (unaccent((res_partner.company_registry)::text) ~~* 
> > '%nse%'::text)))
> > Rows Removed by Filter: 6226870
> > Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > Planning Time: 0.891 ms
> > Execution Time: 86509.070 ms
> > (10 rows)
> >

> > It's not using our gin index at all, but the btree one.
>

>

> The problem is that PostgreSQL estimates that the index scan will return 11214
> rows, when it is actually one. This makes the plan to scan the table using
> an index that matches the ORDER BY clause appealing: we might find 100 rows
> quickly and avoid a sort.
>

> You can try to improve the estimates with more detailed statistics,
> but if that doesn't do the job, you can modify the ORDER BY clause so
> that it cannot use the bad index:
>

> ORDER BY res_partner.display_name ,res_partner.id + 0
>

> Yours,
> Laurenz Albe

Hello,

First of all, thanks, this solves the issue for the given query.

Some more questions then,
> we might find 100 rows quickly
The cost estimate for 11214 rows is 1200212.37
If I look at the other plan, none of the estimated cost reach such levels (~2k for indexes + 1k for the BitmapOr, 3k
forBitmap Heap Scan, and finally 1k for sort and limit), roughly 7k 

And that's part of what I didn't understand. How is the first cost estimated? If we divide by 110 the cost to go from
11krecords to 100, it's still ~10k, more than the other plan. 

Thanks again,

Nicolas.


Attachment

pgsql-general by date:

Previous
From: Avin Kavish
Date:
Subject: Re: bug or lacking doc hint
Next
From: 陈锡汉
Date:
Subject: How to show current schema of running queries in postgresql 13