Re: performance of analytical query - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: performance of analytical query
Date
Msg-id 20211112024151.GF17618@telsasoft.com
Whole thread Raw
In response to performance of analytical query  (Jiří Fejfar <jurafejfar@gmail.com>)
Responses Re: performance of analytical query  (Jiří Fejfar <jurafejfar@gmail.com>)
List pgsql-performance
On Thu, Nov 11, 2021 at 08:20:57PM +0100, Jiří Fejfar wrote:
> Hi folks,
> 
> we have found that (probably after VACUUM ANALYZE) one analytical query
> starts to be slow on our production DB. Moreover, more or less the same
> plan is used on our testing data (how to restore our testing data is
> described at the end of this email), or better to say the same problem
> exists in both (production vs testing data) scenarios: nested loop scanning
> CTE several thousand times is used due to the bad estimates:
> https://explain.dalibo.com/plan/sER#plan/node/87 (query is included on
> dalibo).

> Do you have any idea how to get HASH JOINS in the CTE w_1p_data instead of
> NESTED LOOPs?
> * Add some statistics to not get bad estimates on "lower-level" CTEs?

Do you know why the estimates are bad ?

Index Scan using t_map_plot_cell__cell_gid__idx on cm_plot2cell_mapping cm_plot2cell_mapping (cost=0.29..18.59 rows=381
width=12)(actual time=0.015..2.373 rows=3,898 loops=1)
 
    Index Cond: (cm_plot2cell_mapping.estimation_cell = f_a_cell.estimation_cell)
    Buffers: shared hit=110

I don't know, but is the estimate for this portion of the plan improved by doing:
| ALTER TABLE f_a_cell ALTER estimation_cell SET STATISTICS 500; ANALYZE f_a_cell;

> * In a slightly more complicated function I used temporary tables to be
> able to narrow statistics [2] but I am afraid of system table bloating
> because of the huge amount of usage of this function on the production
> (hundred thousand of calls by day when data are to be analyzed).

I would try this for sure - I think hundreds of calls per day would be no
problem.  If you're concerned, you could add manual calls to do (for example)
VACUUM pg_attribute; after dropping the temp tables.

BTW, we disable nested loops for the our analytic report queries.  I have never
been able to avoid pathological plans any other way.



pgsql-performance by date:

Previous
From: Jiří Fejfar
Date:
Subject: performance of analytical query
Next
From: Ashkil Dighin
Date:
Subject: Re: Lock contention high