Re: Question on execution plan and suitable index - Mailing list pgsql-general

From: Laurenz Albe
Subject: Re: Question on execution plan and suitable index
Date:
Msg-id: 2fa62200cf92dc03dff20f8a42e45bb30fc40f35.camel@cybertec.at
In response to: Re: Question on execution plan and suitable index (yudhi s <learnerdatabase99@gmail.com>)
List: pgsql-general
On Mon, 2026-02-16 at 14:43 +0530, yudhi s wrote:
> On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:
> > > It's postgres version 17. We are having a critical UI query which runs for ~7 seconds+. The requirement is to
> > > bring down the response time to within ~1 sec. Now in this plan, if I read this correctly, the below section is
> > > consuming a significant amount of resources and should be addressed, i.e. the full scan of table "orders" and
> > > the nested loop with the event_audit_log table.
> > >
> > > Below is the query and its complete plan:- 
> > > https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9
> > >
> > > I am a bit new to the indexing strategy in postgres. My question is, what suitable index should we create to
> > > cater to the above?
> > >
> > > 1) For table event_audit_log:- Should we create a composite index on columns
> > > (request_id, created_at, event_comment_text), or should we create the covering index, i.e. just on the two
> > > columns (request_id, created_at) with an "include" clause for "event_comment_text"? How and when should
> > > covering indexes be used here in postgres? Want to understand from experts.
> > > 2) Similarly for table orders:- Should we create a covering index on columns (entity_id, due_date, order_type)
> > > with an include clause (firm_dspt_case_id), or just a composite index (entity_id, due_date, order_type)?
> > > 3) Should the column used with a range operator (here created_at or due_date) be the leading column in the
> > > composite index, or is it fine to keep it as a non-leading column?
> > >
> > > ->  Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
> > >   Buffers: shared hit=10014901
> > >   ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
> > >  Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
> > >  Buffers: shared hit=755352
> > >  ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
> > >   Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
> > >   Rows Removed by Filter: 6572678
> > >   Buffers: shared hit=755208
> >
> > You are selecting a lot of rows, so the query will never be really cheap.
> > But I agree that an index scan should be a win.
> >
> > If the condition on "order_type" is always the same, a partial index is ideal:
> >
> >    CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');
> >
> > Otherwise, I'd create two indexes: one on "order_type" and one on "due_date".
>
> Version is 17.7. Below are the table definitions as I pulled them from the DBeaver tool:-
>
> https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e 
>
> The order_type will be TYPE_A or TYPE_B in most cases, and below is the distribution.
> So, it looks like an index on this column will not help much. Correct me if I'm wrong.
>
> TYPE_A  25 Million
> TYPE_B  2 Million
> TYPE_C  700K
> TYPE_D  200K
> TYPE_E  6k

No, you are right about that.
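
(As an aside, the planner already sees that skew in "pg_stats"; if you want to look at what it is
working with, something along these lines shows the most common values and their frequencies for
that column:

    SELECT most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'order_type';

With TYPE_A covering the bulk of the table, an index on "order_type" alone cannot narrow the scan
down much.)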

> I am wondering why the already existing index on the column "due_date" of table "order" is not
> getting used by the optimizer. Should we also add the column "entity_id" to the index?

Seeing that your execution plan is incomplete, it is hard to say anything about that.
The scans of "entities" are missing, as is the UNION.
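
If you want to check whether the planner considers the existing "due_date" index at all, one way
(for diagnosis in a test session only, not for production) is to disable sequential scans and
compare the plans:

    SET enable_seqscan = off;
    EXPLAIN (ANALYZE, BUFFERS) <the query from your gist>;
    RESET enable_seqscan;

If the index scan then comes out slower, the sequential scan was the cheaper choice for a range
that matches millions of rows; if it comes out faster, the cost estimates are off.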

> And yes, there are differences in the data types of the "entity_id" columns of tables "order"
> and "entity". We need to fix that after analyzing the data.
>
> Also, the highlighted nested loop above shows ~10M shared hits (which will be ~70GB+ if we
> consider one hit as an 8K block). So does that mean that, apart from the full scan on the "order"
> table, the main resource-consuming factor here is the scanning of "event_audit_log"?

Correct.
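
To cut those lookups down, an index along the lines of your question 1 is the natural candidate,
for example (column names taken from your description, so verify them against your table):

    CREATE INDEX ON event_audit_log (request_id, created_at)
       INCLUDE (event_comment_text);

The INCLUDE only pays off if you actually get index-only scans, which requires the visibility map
to be reasonably up to date (regular VACUUM); otherwise a plain index on (request_id, created_at)
is just as good.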

Yours,
Laurenz Albe


