Re: Showing applied extended statistics in explain - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: Showing applied extended statistics in explain
Date
Msg-id 2382297.VW1VTv4sZ3@aivenronan
Whole thread Raw
In response to Showing applied extended statistics in explain  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Le samedi 27 mars 2021, 01:50:54 CEST Tomas Vondra a écrit :
> The current implementation is a bit ugly PoC, with a couple annoying
> issues that need to be solved:
>
Hello Thomas,

I haven't looked at the implementation at all but I think it's an interesting
idea.


> 1) The information is stashed in multiple lists added to a Plan. Maybe
> there's a better place, and maybe we need to invent a better way to
> track the info (a new node stashed in a single List).

Yes this would probably be cleaner.

>
> 2) The deparsing is modeled (i.e. copied) from how we deal with index
> quals, but it's having issues with nested OR clauses, because there are
> nested RestrictInfo nodes and the deparsing does not expect that.
>
> 3) It does not work for functional dependencies, because we effectively
> "merge" all functional dependencies and apply the entries. Not sure how
> to display this, but I think it should show the individual dependencies
> actually applied.

Yes that would be useful when trying to understand where an estimation comes
from.

>
> 4) The info is collected always, but I guess we should do that only when
> in explain mode. Not sure how expensive it is.

That would probably be better yes.

>
> 5) It includes just statistics name + clauses, but maybe we should
> include additional info (e.g estimate for that combination of clauses).

I'm not sure the estimate for the combination is that useful, as you have an
associated estimated number of rows attached to the node.  I think to be able
to really make sense of it, a GUC disabling the extended statistics could be
useful for the curious DBA to compare the selectivity estimation for a plan
with the statistics and a plan without.

>
> 6) The clauses in the grouping query are transformed to AND list, which
> is wrong. This is easy to fix, I was lazy to do that in a PoC patch.
>
> 7) It does not show statistics for individual expressions. I suppose
> examine_variable could add it to the rel somehow, and maybe we could do
> that with index expressions too?

Yes this would be useful for single-expression extended statistics as well as
statistics collected from a functional index.

I don't know if it's doable, but if we want to provide insights into how
statistics are used, it could be nice to also display the statistics target
used. Since the values at the time of the last analyze and the current value
might be different, it could be nice to store it along with the stats. I
remember having to troubleshoot queries where the problem was an ALTER <TABLE/
INDEX> ... SET STATISTICS had not been run as expected, and having that
information available in the plan for a complex query might help in diagnosing
the problem quicker.

Regards,

--
Ronan Dunklau





pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Have I found an interval arithmetic bug?
Next
From: Pavel Stehule
Date:
Subject: Re: badly calculated width of emoji in psql