On Thu, Nov 11, 2021 at 08:20:57PM +0100, Jiří Fejfar wrote:
> Hi folks,
>
> we have found that (probably after a VACUUM ANALYZE) one analytical query
> became slow on our production DB. Moreover, more or less the same plan is
> used on our testing data (how to restore the testing data is described at
> the end of this email), so the same problem exists in both scenarios
> (production and testing data): a nested loop scanning a CTE several
> thousand times is chosen due to bad estimates:
> https://explain.dalibo.com/plan/sER#plan/node/87 (query is included on
> dalibo).
> Do you have any idea how to get HASH JOINS in the CTE w_1p_data instead of
> NESTED LOOPs?
> * Add some statistics to avoid bad estimates on "lower-level" CTEs?
Do you know why the estimates are bad?
Index Scan using t_map_plot_cell__cell_gid__idx on cm_plot2cell_mapping cm_plot2cell_mapping
    (cost=0.29..18.59 rows=381 width=12) (actual time=0.015..2.373 rows=3,898 loops=1)
  Index Cond: (cm_plot2cell_mapping.estimation_cell = f_a_cell.estimation_cell)
  Buffers: shared hit=110
I don't know, but is the estimate for this portion of the plan improved by doing:
| ALTER TABLE f_a_cell ALTER estimation_cell SET STATISTICS 500; ANALYZE f_a_cell;
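To spell that out, a sketch of what I mean (the statistics target of 500 is
just a starting point to experiment with; the second statement uses an
illustrative column name, since I don't know which columns are correlated in
your schema):

```sql
-- Raise the per-column statistics target so ANALYZE samples more
-- MCVs/histogram buckets for the join column:
ALTER TABLE f_a_cell ALTER COLUMN estimation_cell SET STATISTICS 500;
ANALYZE f_a_cell;

-- If the misestimate comes from correlated columns rather than a single
-- column, extended statistics may help (estimation_period is hypothetical):
-- CREATE STATISTICS f_a_cell_stx (dependencies, ndistinct)
--     ON estimation_cell, estimation_period FROM f_a_cell;
-- ANALYZE f_a_cell;
```

Then re-run EXPLAIN ANALYZE and see whether the row estimate for that index
scan moves closer to the actual 3,898.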
> * In a slightly more complicated function I used temporary tables to be
> able to narrow the statistics [2], but I am afraid of system-table bloat
> because of the heavy use of this function in production (hundreds of
> thousands of calls per day when data are analyzed).
I would try this for sure - I think hundreds of thousands of calls per day
would be no problem. If you're concerned, you could add a manual call to
(for example) VACUUM pg_attribute; after dropping the temp tables.
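The shape I have in mind (all object names here are illustrative stand-ins
for your CTE, not taken from your query):

```sql
-- Materialize the intermediate result so the planner gets real statistics
-- for it, instead of a default guess for the CTE:
CREATE TEMPORARY TABLE w_1p_data_tmp AS
SELECT *
FROM w_1p_data_source;     -- hypothetical; stands in for the CTE body

ANALYZE w_1p_data_tmp;     -- gives the planner row counts and distributions

-- ... run the rest of the query against w_1p_data_tmp ...

DROP TABLE w_1p_data_tmp;

-- Occasional maintenance if heavy temp-table churn bloats the catalogs:
VACUUM pg_attribute;
```

With accurate row counts on the materialized table, the planner is much more
likely to pick a hash join on its own.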
BTW, we disable nested loops for our analytic report queries. I have never
been able to avoid pathological plans any other way.
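Scoped per transaction it looks like this, so the rest of the workload keeps
the default planner behavior:

```sql
BEGIN;
-- Planner hint for this transaction only: strongly discourage
-- nested-loop joins so hash/merge joins are chosen instead.
SET LOCAL enable_nestloop = off;

-- ... run the analytic report query here ...

COMMIT;   -- the setting reverts automatically at transaction end
```

Note that enable_nestloop = off doesn't forbid nested loops outright; it
just makes the planner cost them prohibitively, so they're only used when no
other join method is possible.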