Re: Showing applied extended statistics in explain - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Showing applied extended statistics in explain
Date
Msg-id 81125fcd-4d73-f0ec-5b2b-825a9c241926@enterprisedb.com
Whole thread Raw
In response to Re: Showing applied extended statistics in explain  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Showing applied extended statistics in explain  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Showing applied extended statistics in explain  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
Hi,

On 7/27/21 12:21 PM, Dmitry Dolgov wrote:
>> On Sat, Mar 27, 2021 at 01:50:54AM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> With extended statistics it may not be immediately obvious if they were
>> applied and to which clauses. If you have multiple extended statistics,
>> we may also apply them in different order, etc. And with expressions,
>> there's also the question of matching expressions to the statistics.
>>
>> So it seems useful to include this into in the explain plan - show which
>> statistics were applied, in which order. Attached is an early PoC patch
>> doing that in VERBOSE mode. I'll add it to the next CF.
> 
> Hi,
> 
> sounds like a useful improvement indeed, thanks for the patch. Do you
> plan to invest more time in it?
> 

Yes. I think providing more insight into which statistics were applied,
in which order and to which clauses, is quite desirable.

>> 1) The information is stashed in multiple lists added to a Plan. Maybe
>> there's a better place, and maybe we need to invent a better way to
>> track the info (a new node stashed in a single List).
>>
>> ...
>>
>> 3) It does not work for functional dependencies, because we effectively
>> "merge" all functional dependencies and apply the entries. Not sure how
>> to display this, but I think it should show the individual dependencies
>> actually applied.
>>
>> ...
>>
>> 5) It includes just statistics name + clauses, but maybe we should
>> include additional info (e.g estimate for that combination of clauses).
> 
> Yes, a new node would be nice to have. The other questions above are
> somewhat related to what it should contain, and I guess it depends on
> the use case this patch targets. E.g. for the case "help to figure out
> if an extended statistics was applied" even "merged" functional
> dependencies would be fine I believe.

What do you mean by "merged" functional dependencies? I guess we could
say "these clauses were estimated using dependencies" without listing
which exact dependencies were applied.

> More advanced plan troubleshooting may benefit from estimates.

I'm sorry, I don't know what you mean by this. Can you elaborate?

> What exactly use case do you have in mind?
Well, my goal was to help users investigating the plan/estimates,
because once you create multiple "candidate" statistics objects it may
get quite tricky to determine which of them were applied, in what order,
etc. It's not all that straightforward, given the various heuristics we
use to pick the "best" statistics, apply dependencies last, etc. And I
don't quite want to teach the users those rules, because I consider them
mostly implementation details that wee may want to tweak in the future.

>> 4) The info is collected always, but I guess we should do that only when
>> in explain mode. Not sure how expensive it is.
> 
> Maybe it's in fact not that expensive to always collect the info? Adding
> it as it is now do not increase number of cache lines Plan structure
> occupies (although it comes close to the boundary), and a couple of
> simple tests with CPU bounded load of various types doesn't show
> significant slowdown. I haven't tried the worst case scenario with a lot
> of extended stats involved, but in such situations I can imagine the
> overhead also could be rather small in comparison with other expenses.
> 

Yeah, once there are many statistics it's probably not an issue - the
overhead from processing them is likely way higher than copying this
extra info. Plus people tend to create statistics when there are issues
with planning the queries.

> I've got few more questions after reading the patch:
> 
> * Is there any particular reason behind choosing only some scan nodes to
>   display extended stats? E.g. BitmapHeapScan is missing.
> 

All nodes that may apply extended stats to estimate quals should include
this info. I guess BitmapHeapScan may do that for the filter, right?

> * StatisticExtInfo should have a _copy etc. node functionalty now,
>   right? I've hit "unrecognized node type" with a prepared statement
>   because it's missing.
> 

Yeah, probably.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_settings.pending_restart not set when line removed
Next
From: Tom Lane
Date:
Subject: Re: Showing applied extended statistics in explain