Home > mailing lists

Re: Showing applied extended statistics in explain - Mailing list pgsql-hackers

From	Dmitry Dolgov
Subject	Re: Showing applied extended statistics in explain
Date	July 27, 2021 10:21:12
Msg-id	20210727102112.mzlilrx4w5ojw74o@localhost Whole thread Raw
In response to	Showing applied extended statistics in explain (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: Showing applied extended statistics in explain
List	pgsql-hackers

Tree view

> On Sat, Mar 27, 2021 at 01:50:54AM +0100, Tomas Vondra wrote:
> Hi,
>
> With extended statistics it may not be immediately obvious if they were
> applied and to which clauses. If you have multiple extended statistics,
> we may also apply them in different order, etc. And with expressions,
> there's also the question of matching expressions to the statistics.
>
> So it seems useful to include this into in the explain plan - show which
> statistics were applied, in which order. Attached is an early PoC patch
> doing that in VERBOSE mode. I'll add it to the next CF.

Hi,

sounds like a useful improvement indeed, thanks for the patch. Do you
plan to invest more time in it?

> 1) The information is stashed in multiple lists added to a Plan. Maybe
> there's a better place, and maybe we need to invent a better way to
> track the info (a new node stashed in a single List).
>
> ...
>
> 3) It does not work for functional dependencies, because we effectively
> "merge" all functional dependencies and apply the entries. Not sure how
> to display this, but I think it should show the individual dependencies
> actually applied.
>
> ...
>
> 5) It includes just statistics name + clauses, but maybe we should
> include additional info (e.g estimate for that combination of clauses).

Yes, a new node would be nice to have. The other questions above are
somewhat related to what it should contain, and I guess it depends on
the use case this patch targets. E.g. for the case "help to figure out
if an extended statistics was applied" even "merged" functional
dependencies would be fine I believe. More advanced plan troubleshooting
may benefit from estimates. What exactly use case do you have in mind?

> 4) The info is collected always, but I guess we should do that only when
> in explain mode. Not sure how expensive it is.

Maybe it's in fact not that expensive to always collect the info? Adding
it as it is now do not increase number of cache lines Plan structure
occupies (although it comes close to the boundary), and a couple of
simple tests with CPU bounded load of various types doesn't show
significant slowdown. I haven't tried the worst case scenario with a lot
of extended stats involved, but in such situations I can imagine the
overhead also could be rather small in comparison with other expenses.

I've got few more questions after reading the patch:

* Is there any particular reason behind choosing only some scan nodes to
  display extended stats? E.g. BitmapHeapScan is missing.

* StatisticExtInfo should have a _copy etc. node functionalty now,
  right? I've hit "unrecognized node type" with a prepared statement
  because it's missing.

pgsql-hackers by date:

From: "Daniel Westermann (DWE)"
Date: 27 July 2021, 10:04:36
Subject: Small typo in variable.c

From: Greg Nancarrow
Date: 27 July 2021, 10:30:29
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety

Re: Showing applied extended statistics in explain - Mailing list pgsql-hackers

Previous

Next