Re: Showing applied extended statistics in explain - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Showing applied extended statistics in explain
Date
Msg-id 20210727102112.mzlilrx4w5ojw74o@localhost
Whole thread Raw
In response to Showing applied extended statistics in explain  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Showing applied extended statistics in explain  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
> On Sat, Mar 27, 2021 at 01:50:54AM +0100, Tomas Vondra wrote:
> Hi,
>
> With extended statistics it may not be immediately obvious if they were
> applied and to which clauses. If you have multiple extended statistics,
> we may also apply them in different order, etc. And with expressions,
> there's also the question of matching expressions to the statistics.
>
> So it seems useful to include this into in the explain plan - show which
> statistics were applied, in which order. Attached is an early PoC patch
> doing that in VERBOSE mode. I'll add it to the next CF.

Hi,

sounds like a useful improvement indeed, thanks for the patch. Do you
plan to invest more time in it?

> 1) The information is stashed in multiple lists added to a Plan. Maybe
> there's a better place, and maybe we need to invent a better way to
> track the info (a new node stashed in a single List).
>
> ...
>
> 3) It does not work for functional dependencies, because we effectively
> "merge" all functional dependencies and apply the entries. Not sure how
> to display this, but I think it should show the individual dependencies
> actually applied.
>
> ...
>
> 5) It includes just statistics name + clauses, but maybe we should
> include additional info (e.g estimate for that combination of clauses).

Yes, a new node would be nice to have. The other questions above are
somewhat related to what it should contain, and I guess it depends on
the use case this patch targets. E.g. for the case "help to figure out
if an extended statistics was applied" even "merged" functional
dependencies would be fine I believe. More advanced plan troubleshooting
may benefit from estimates. What exactly use case do you have in mind?

> 4) The info is collected always, but I guess we should do that only when
> in explain mode. Not sure how expensive it is.

Maybe it's in fact not that expensive to always collect the info? Adding
it as it is now do not increase number of cache lines Plan structure
occupies (although it comes close to the boundary), and a couple of
simple tests with CPU bounded load of various types doesn't show
significant slowdown. I haven't tried the worst case scenario with a lot
of extended stats involved, but in such situations I can imagine the
overhead also could be rather small in comparison with other expenses.

I've got few more questions after reading the patch:

* Is there any particular reason behind choosing only some scan nodes to
  display extended stats? E.g. BitmapHeapScan is missing.

* StatisticExtInfo should have a _copy etc. node functionalty now,
  right? I've hit "unrecognized node type" with a prepared statement
  because it's missing.



pgsql-hackers by date:

Previous
From: "Daniel Westermann (DWE)"
Date:
Subject: Small typo in variable.c
Next
From: Greg Nancarrow
Date:
Subject: Re: [bug?] Missed parallel safety checks, and wrong parallel safety