Home > mailing lists

Re: Top -N Query performance issue and high CPU usage - Mailing list pgsql-general

From	Ron Johnson
Subject	Re: Top -N Query performance issue and high CPU usage
Date	February 3 18:59:51
Msg-id	CANzqJaAE6bevjj43W-94Bx1xGB9OPbUtOyXV2kaGpju1wt3k+A@mail.gmail.com Whole thread
In response to	Re: Top -N Query performance issue and high CPU usage (yudhi s <learnerdatabase99@gmail.com>)
Responses	Re: Top -N Query performance issue and high CPU usage
List	pgsql-general

Tree view

On Tue, Feb 3, 2026 at 4:26 AM yudhi s <learnerdatabase99@gmail.com> wrote:

On Tue, Feb 3, 2026 at 4:50 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 3:43 PM yudhi s <learnerdatabase99@gmail.com> wrote:

On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Mon, Feb 2, 2026 at 1:39 PM yudhi s <learnerdatabase99@gmail.com> wrote:
On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:

My apologies if i misunderstand the plan, But If I see, it's spending ~140ms(140ms-6ms) i.e. almost all the time now, in performing the below nested loop join. So my question was , is there any possibility to reduce the resource consumption or response time further here? Hope my understanding is correct here.

-> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual time=6.009..147.695 rows=1049 loops=1)
Join Filter: ((df.ent_id)::numeric = m.ent_id)
Rows Removed by Join Filter: 513436
Buffers: shared hit=1939

I don't see m.ent_id in the actual query. Did you only paste a portion of the query?

Also, casting in a JOIN typically brutalizes the ability to use an index.

Thank you.
Actually i tried executing the first two CTE where the query was spending most of the time and teh alias has changed.

We need to see everything, not just what you think is relevant.

Also here i have changed the real table names before putting it here, hope that is fine.
However , i verified the data type of the ent_id column in "ent" its "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this difference in the data type is causing this high response time during the nested loop join? My understanding was it will be internally castable without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric" its still reulting into same plan and response time.

If you'd shown the "\d" table definitions like Adrian asked two days ago, we'd know what indexes are on each table, and not have to beg you to dispense dribs and drabs of information.

I am unable to run "\d" from the dbeaver sql worksheet. However, I have fetched the DDL for the three tables and their selected columns, used in the smaller version of the query and its plan , which I recently updated.

https://gist.github.com/databasetech0073/e4290b085f8f974e315fb41bdc47a1f3

https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492

Lines 30-32 are where most of the time and effort are taken.

I can't be certain, but changing APP_schema.ent.ent_id from NUMERIC to int8 (with a CHECK constraint to, well, constrain it to 12 digits, if really necessary) is something I'd test.

--

Thank you so much.

After making the data types equal on both tables for the column ent_id the plan now looks as below. The costing function sinow removed. So it must be helping reduce CPU cycle consumption to some extent, But, I still see ~100ms is spent in this step. Is there anything we can do to further drop the response time here? Or it's the best time we can get here.

-> Nested Loop (cost=262.77..1342550.91 rows=579149 width=20) (actual time=6.406..107.946 rows=1049 loops=1)
Join Filter: (df.ent_id = m.ent_id)
Rows Removed by Join Filter: 514648
Buffers: shared hit=1972

Hmm. What does pg_stat_user_tables say about when you last analyzed and vacuumed APP_schema.txn_tbl and APP_schema.ent?

Beyond "aggressively keep those two tables analyzed, via reducing autovacuum_analyze_scale_factor to something like 0.05, and adding 'vacuumdb -d mumble -j2 --analyze-only -t APP_schema.txn_tbl -t APP_schema.ent' to crontab", I'm out of ideas. An 85% speed improvement is nothing to sneeze at, though.

Also I do see in some other steps in the plan , the casting function is getting used. For example in the below filter. Here txn_tbl_type_nm is defined as Varchar(25) and still it's trying to cast it to Text. Can we do anything to avoid these force casts as these must consume the CPU cycles?

AND txn_tbl_dcsn.txn_tbl_txn_sts_tx NOT IN ('STATUS_A','STATUS_B')
WHERE txn_tbl.txn_tbl_type_nm IN ('TYPE1','TYPE2','TYPE3')

-> Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df (cost=0.43..115879.87 rows=1419195 width=20) (actual time=0.019..20.377 rows=43727 loops=1)
Filter: ((txn_tbl_type_nm)::text = ANY ('{TYPE1,TYPE2,TYPE3}'::text[]))
Rows Removed by Filter: 17
Buffers: shared hit=1839

There is no VARCHAR or CHAR; there is only TEXT. Thus, this is 100% expected and normal.

Death to <Redacted>, and butter sauce.

Don't boil me, I'm still alive.

<Redacted> lobster!

pgsql-general by date:

From: yudhi s
Date: 03 February, 12:26:14
Subject: Re: Top -N Query performance issue and high CPU usage

From: Adrian Klaver
Date: 03 February, 19:07:39
Subject: Re: Top -N Query performance issue and high CPU usage

Re: Top -N Query performance issue and high CPU usage - Mailing list pgsql-general

Previous

Next