pgsql: Rework output format of pg_ndistinct - Mailing list pgsql-committers

From Michael Paquier
Subject pgsql: Rework output format of pg_ndistinct
Msg-id E1vKnU3-006y4F-0K@gemulon.postgresql.org
List pgsql-committers
Rework output format of pg_ndistinct

The existing format of pg_ndistinct uses a single-object JSON structure
where each key is itself a comma-separated list of attnums, like:
{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}

While this is a very compact format, it is confusing to read and it is
difficult to manipulate the values within the object.
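As a rough illustration of that difficulty, here is a minimal Python sketch of what a client has to do to recover the attribute numbers from the old format. The helper name parse_old_ndistinct is hypothetical and not part of PostgreSQL; the input is the example object shown above.

```python
import json

# Old-style pg_ndistinct text, copied from the example above.
old_text = '{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}'


def parse_old_ndistinct(text):
    """Map each attnum combination (a tuple of ints) to its ndistinct value.

    Hypothetical helper: every key has to be split on commas and each piece
    converted by hand, because the attnums are packed into one JSON string.
    """
    raw = json.loads(text)
    return {
        tuple(int(part) for part in key.split(",")): value
        for key, value in raw.items()
    }


parsed = parse_old_ndistinct(old_text)
print(parsed[(3, 4, 6)])
```

The JSON parser alone is not enough here: the structure inside each key has to be decoded in a second, hand-written step.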

The new output format introduced in this commit is an array of objects,
with:
- A key named "attributes", that contains an array of attribute numbers.
- A key named "ndistinct", represented as an integer.

The underlying type of the values is unchanged; only the printed output
format differs, and it now looks as follows:
[{"ndistinct": 11, "attributes": [3,4]},
 {"ndistinct": 11, "attributes": [3,6]},
 {"ndistinct": 11, "attributes": [4,6]},
 {"ndistinct": 11, "attributes": [3,4,6]}]
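To make the relationship between the two formats concrete, the following Python sketch converts an old-style value into the new array-of-objects layout and then filters it. The function name old_to_new is hypothetical and only for illustration; the key names "attributes" and "ndistinct" are the ones described in this commit message.

```python
import json


def old_to_new(old_text):
    """Convert an old-style pg_ndistinct object into the new list layout.

    Hypothetical helper for illustration; this is not how the backend
    produces its output.
    """
    raw = json.loads(old_text)
    return [
        {
            "ndistinct": value,
            "attributes": [int(part) for part in key.split(",")],
        }
        for key, value in raw.items()
    ]


new_items = old_to_new('{"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11}')

# With the new layout, selecting entries needs no string handling:
# keep every combination that involves attribute number 6.
involving_6 = [item for item in new_items if 6 in item["attributes"]]

print(json.dumps(new_items))
print(len(involving_6))
```

Once the attribute numbers are a real JSON array, ordinary list operations are sufficient to inspect or filter the entries.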

This new format will come in handy for a follow-up set of changes that
make it possible to inject extended statistics rather than requiring an
ANALYZE, for example in a dump/restore sequence or after pg_upgrade on a
new cluster.

This format was suggested by Tomas Vondra.  The key names are defined in
a new header, to ease the integration of frontend-specific changes that
are still under discussion.  (Personal note: I am not specifically
wedded to these key names; if there are better name suggestions for this
release, feel free to propose them.)

The bulk of the changes come from the regression tests, where
jsonb_pretty() is now used to make the outputs generated easier to
parse.

Author: Corey Huinker <corey.huinker@gmail.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1f927cce44983ed59a3c1eccc95ad2946ac13b42

Modified Files
--------------
doc/src/sgml/perform.sgml                  |  38 ++++++-
src/backend/utils/adt/pg_ndistinct.c       |  22 ++--
src/include/statistics/statistics_format.h |  32 ++++++
src/test/regress/expected/stats_ext.out    | 156 +++++++++++++++++++++++++----
src/test/regress/sql/stats_ext.sql         |  12 +--
5 files changed, 223 insertions(+), 37 deletions(-)

