Re: Query takes around 15 to 20 min over 20Lakh rows - Mailing list pgsql-general

From Michael Lewis
Subject Re: Query takes around 15 to 20 min over 20Lakh rows
Date
Msg-id CAHOFxGpLa+BnSrfCb1xA9xKubu213W-r2bBBfszKAAH8kgZHzA@mail.gmail.com
Whole thread Raw
In response to Re: Query takes around 15 to 20 min over 20Lakh rows  (Shubham Mittal <mittalshubham30@gmail.com>)
Responses Re: Query takes around 15 to 20 min over 20Lakh rows  (Shubham Mittal <mittalshubham30@gmail.com>)
List pgsql-general
Have you ever used this site to visualize the explain plan and spot bad estimates and slow nodes? https://explain.depesz.com/s/WE1R

This stands out to me-
Subquery Scan on npiactionjoin (cost=10,165,289.40..10,167,192.01 rows=293 width=16) (actual time=118,413.432..118,806.684 rows=446,782 loops=1)
Filter: (npiactionjoin.rn = 1)

It seems that estimate is pretty far off and this node and the final node above this are the biggest slowdowns. If you filtered down to the record you want from task_history BEFORE the join, then maybe you would have quicker results. I might try a materialized CTE or even an analyzed temp table if that option is available to you, so the planner makes informed decisions.

By the way, the order by on that row_number seems like you are getting the OLDEST activity related to the task which could maybe be cached rather than re-calculated daily as this query runs.


Michael Lewis  |  Database Engineer
Entrata

pgsql-general by date:

Previous
From: Raul Kaubi
Date:
Subject: Re: How to log bind values for statements that produce errors
Next
From: Ninad Shah
Date:
Subject: Re: prevent WAL replication to fill filesystem