Re: Query takes around 15 to 20 min over 20Lakh rows - Mailing list pgsql-general

From Shubham Mittal
Subject Re: Query takes around 15 to 20 min over 20Lakh rows
Date
Msg-id CA+ERcR_krjkGUHU0WRNZm06N28kZBuauo2B-ymz6A=tHAUGDcQ@mail.gmail.com
In response to Re: Query takes around 15 to 20 min over 20Lakh rows  (Michael Lewis <mlewis@entrata.com>)
List pgsql-general
Hi Tom/David,

Could you please help me get started on optimising this query?

Thanks & Regards,
Shubham Mittal



On Tue, Sep 7, 2021, 8:57 PM Michael Lewis <mlewis@entrata.com> wrote:
Have you ever used this site to visualize the explain plan and spot bad estimates and slow nodes? https://explain.depesz.com/s/WE1R
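For anyone following along: explain.depesz.com takes the raw text output of an `EXPLAIN ANALYZE` run and highlights the nodes where estimates diverge from actuals. A minimal sketch of how to produce that output (the query itself is whatever you are tuning):

```sql
-- Run the query once with timing and buffer statistics collected,
-- then paste the resulting plan text into https://explain.depesz.com/
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- your slow query here
```

Note that `ANALYZE` actually executes the query, so for a 15–20 minute query this run will take just as long.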

This stands out to me:
Subquery Scan on npiactionjoin (cost=10,165,289.40..10,167,192.01 rows=293 width=16) (actual time=118,413.432..118,806.684 rows=446,782 loops=1)
Filter: (npiactionjoin.rn = 1)

It seems that estimate is pretty far off (293 estimated vs. 446,782 actual rows), and this node plus the final node above it are the biggest slowdowns. If you filtered task_history down to the records you want BEFORE the join, you would likely get quicker results. I might try a materialized CTE, or even a temp table that you run ANALYZE on if that option is available to you, so the planner can make informed decisions.
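A minimal sketch of the two approaches suggested above. The table and column names (task_history, task_id, created_at, and the WHERE predicate) are hypothetical stand-ins, since the original query is not shown here:

```sql
-- Option 1: materialized CTE. AS MATERIALIZED (PostgreSQL 12+) forces the
-- filter to run first instead of being inlined into the join.
WITH filtered_history AS MATERIALIZED (
    SELECT task_id, action, created_at
    FROM task_history
    WHERE created_at >= now() - interval '1 day'  -- whatever predicate applies
)
SELECT t.*, fh.action
FROM tasks t
JOIN filtered_history fh ON fh.task_id = t.id;

-- Option 2: analyzed temp table. ANALYZE gives the planner real row counts
-- and statistics for the filtered set, so the join above it is planned well.
CREATE TEMP TABLE filtered_history AS
    SELECT task_id, action, created_at
    FROM task_history
    WHERE created_at >= now() - interval '1 day';
ANALYZE filtered_history;
```

The temp-table variant costs an extra write, but it is the only one of the two that gets the planner accurate statistics rather than a generic CTE estimate.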

By the way, the ORDER BY inside that row_number() window suggests you are fetching the OLDEST activity related to each task, which could perhaps be cached rather than re-calculated every time this daily query runs.
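The rn = 1 filter in the plan looks like the usual row_number() pattern for "first row per group". In PostgreSQL, `DISTINCT ON` often does the same job more cheaply. A sketch with the same hypothetical columns as above:

```sql
-- row_number() pattern implied by the plan's "Filter: (npiactionjoin.rn = 1)":
SELECT *
FROM (
    SELECT th.*,
           row_number() OVER (PARTITION BY task_id
                              ORDER BY created_at ASC) AS rn  -- ASC = oldest first
    FROM task_history th
) sub
WHERE rn = 1;

-- Equivalent PostgreSQL idiom, usually with a simpler plan:
SELECT DISTINCT ON (task_id) *
FROM task_history
ORDER BY task_id, created_at ASC;  -- keeps the oldest row per task
```

Either way, if the oldest activity per task never changes once written, it is a good candidate for caching in its own table instead of being recomputed on every run.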


Michael Lewis  |  Database Engineer
Entrata
