Home > mailing lists

Re: Fwd: temp_file_limit? - Mailing list pgsql-performance

From	Frits Jalvingh
Subject	Re: Fwd: temp_file_limit?
Date	December 23, 2022 08:12:08
Msg-id	CAKhTGFU4rMhDgWewvOyTQCJfJ4Esqxm3Y93TA6d1tNz9mSY3PQ@mail.gmail.com Whole thread
In response to	Re: Fwd: temp_file_limit? (Justin Pryzby <pryzby@telsasoft.com>)
List	pgsql-performance

Tree view

Hi Justin,

As our queries are generated I decided to create a peephole optimizer kind of thing to scan the generated SQL AST to find multiple conditions on the same table reference. I can then use our metadata to see if these references are expected to be correlated. This creates about 20 statistics sets, including the one you have indicated. This at least makes the problematic query have a stable and very fast plan (so far). I had hoped for some more improvement with other queries but that has not yet been evident ;)

Thanks a lot for the tips and your help!

Cordially,

Frits

On Tue, Dec 20, 2022 at 10:11 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Dec 19, 2022 at 09:10:27PM +0100, Frits Jalvingh wrote:
> @justin
>
> I tried the create statistics variant and that definitely improves the
> estimate, and with that one of the "bad" cases (the one with the 82 minute
> plan) now creates a good plan using only a few seconds.
> That is a worthwhile path to follow. A bit hard to do, because those
> conditions can be anything, but I can probably calculate the ones used per
> customer and create those correlation statistics from that... It is
> definitely better than tweaking the "poor man's query hints" enable_xxxx
> 8-/ which is really not helping with plan stability either.
>
> That will be a lot of work, but I'll let you know the results ;)

Yeah, if the conditions are arbitrary, then it's going to be more
difficult. Hopefully you don't have too many columns. :)

I suggest enabling autoexplain and monitoring for queries which were
slow, and retroactively adding statistics to those columns which are
most-commonly queried, and which have correlations (which the planner
doesn't otherwise know about).

You won't want to have more than a handful of columns in a stats object
(since it requires factorial(N) complexity), but you can have multiple
stats objects with different combinations of columns (and, in v14,
expressions). You can also set a lower stats target to make the cost a
bit lower.

You could try to check which columns are correlated, either by running:
| SELECT COUNT(1),col1,col2 FROM tbl GROUP BY 2,3 ORDER BY 1;
for different combinations of columns.

Or by creating a tentative/experimental stats object on a handful of
columns at a time for which you have an intuition about their
correlation, and then checking the calculated dependencies FROM
pg_stats_ext. You may need to to something clever to use that for
arbitrarily columns. Maybe this is a start.
| SELECT dep.value::float, tablename, attnames, dep.key, exprs FROM (SELECT (json_each_text(dependencies::text::json)).* AS dep, * FROM pg_stats_ext)dep WHERE dependencies IS NOT NULL ORDER BY 1 DESC ; -- AND regexp_count(key, ',') < 2

--
Justin

pgsql-performance by date:

From: Justin Pryzby
Date: 20 December 2022, 21:11:09
Subject: Re: Fwd: temp_file_limit?

From: Gunther Schadow
Date: 28 December 2022, 15:39:14
Subject: When you really want to force a certain join type?

Re: Fwd: temp_file_limit? - Mailing list pgsql-performance

Previous

Next