Expose custom planning data in EXPLAIN - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Expose custom planning data in EXPLAIN
Date
Msg-id e7cb9db0-afb6-4ba5-b86f-307683133387@gmail.com
Whole thread Raw
List pgsql-hackers
Hi,

Background and motivation
-------------------------

This feature is inspired by the pg_overexplain, new EXPLAIN-related 
hooks and recent change of the Memoize node representation (4bc62b86849).

Based on user reports from the production system, we typically receive 
no more than an 'explain verbose analyse' output to identify the issue's 
origin. It is obvious to me that the detailisation of the EXPLAIN format 
will never be enough. Even more, each new parameter, adding more 
information, also complicates life for some people, flooding the screen 
with unnecessary (for them) data.

With extensible EXPLAIN options and per-node/summary hooks, we can add 
as many details as needed using modules. However, there is one 
limitation: we can only explore Plan and PlanState data. If something 
isn't transferred from the planning stage to the plan, we won't have a 
way to expose this data.
For example, it is sometimes unclear why the optimiser chose 
IncrementalSort or [did not choose] HashJoin, as we don't know the 
ngroups estimation used at this specific point of the plan.

Design Overview
---------------

It appears that the only two changes required to enable the feature are 
a hook and a field in the Plan node. In this patch, I have chosen to add 
the hook to the copy_generic_path_info routine to limit its usage for 
tracking purposes only. Also, I extended its interface with the 
PlannerInfo pointer, which may be helpful in many cases. The new extlist 
field in the Plan structure should contain (by convention) extensible 
nodes only to let modules correctly pick their data. Also, it simplifies 
the logic of the node serialisation.

An additional motivation for choosing Extensible Node is its lack of 
core usage, which makes it seem unpolished and requires copying a 
significant amount of code to use. This patch highlights this imperfection.

Tests
-----

To demonstrate its real-life application, I added an example to 
pg_overexplain. Here, a ngroups value is computed, stored in the Plan 
node, and exposed in explain. Also, it is a test for the ExtensionNode 
machinery.

Downsides
-----------

1. Growth of the plan node
2. Read/write extensible node - what if reading worker (or backend ?) 
doesn't have the module installed?
3. The point for the hook call.

The first issue is quite limited because the only version of the plan 
exists, in contrast to the multiple paths.
The second issue is a little more complicated. However, I believe the 
issue could be resolved by allowing extensions to determine the logic 
for serialising their ExtensibleNode.
The selection of the point for the hook appears to be quite strict. It 
does not permit any extensions to alter the final plan or disrupt its 
consistency, except for some cost data. However, it does allow for 
tracking the decisions made during the planning phase.

See the patch attached.

-- 
regards, Andrei Lepikhov

Attachment

pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Improve LWLock tranche name visibility across backends
Next
From: Peter Eisentraut
Date:
Subject: Re: [PATCH] Proposal: Improvements to PDF stylesheet and table column widths