Re: Add expressions to pg_restore_extended_stats() - Mailing list pgsql-hackers
| From | Corey Huinker |
|---|---|
| Subject | Re: Add expressions to pg_restore_extended_stats() |
| Date | |
| Msg-id | CADkLM=fZci8mBEoKX9Gk-+0jXrVmQBuTErP-0JYgE1iUL+EX2A@mail.gmail.com |
| In response to | Re: Add expressions to pg_restore_extended_stats() (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |
On Fri, Jan 30, 2026 at 12:55 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Jan 30, 2026 at 12:08:49AM -0500, Corey Huinker wrote:
> 3. Keeping the 2-D text array in #1, but each "row" is a list of
> kwargs-like pairs like the arguments used in pg_restore_attribute_stats
> (i.e. ARRAY['null_frac','0.5','avg_width','1.0','most_common_values',...])
>
> 4. JSON. The outer structure would be an array of objects, each object
> would be a key-value.
I'd still favor 4 on the grounds that it's easier to edit and read,
which would be more in line with the MCV, dependencies, ndistinct and
att/rel statistics. The format proposed in the attached patch is hard
to work with, anyway. Now, I do take your point that composite
record values cast into a single text value could be confusing
(double-quote issues, I guess?), so perhaps a text[] as in 3 would be
better suited for readability.
Hmm, maybe it isn't so bad:
SELECT '{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[];
text
---------------------------------------------------
{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}
(1 row)
SELECT array_to_json('{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[]);
array_to_json
---------------------------------------------------
["{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"]
(1 row)
SELECT '{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[];
text
---------------------------------------------------
{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}
(1 row)
SELECT array_to_json('{"{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"}'::text[]);
array_to_json
---------------------------------------------------
["{\"{1,1}\",\"{2,1}\",\"{3,-1}\",\"{NULL,0}\"}"]
(1 row)
Mind you, this is an ANYARRAY first cast to text. If we cast pg_stats_ext_exprs.most_common_values directly to JSON, it will drill down into the innards of the composite values, because it can see the local datatypes, and that breaks our ability to use the regular input functions. I learned that the hard way when using JSON to serialize attribute stats back when this effort first began.
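To make that drill-down concrete, here is a minimal sketch using a throwaway composite type in place of the real stats values (the pair type and the values are illustrative, not actual catalog contents):

CREATE TYPE pair AS (val int, freq int);

-- Cast to text: each composite stays one opaque literal that the
-- type's input function can reparse later.
SELECT ARRAY[(1,1)::pair, (2,1)::pair]::text AS as_text;
      as_text
-------------------
 {"(1,1)","(2,1)"}
(1 row)

-- Cast to JSON: the composites are decomposed into objects, and the
-- round trip through the regular input functions is lost.
SELECT array_to_json(ARRAY[(1,1)::pair, (2,1)::pair]) AS as_json;
                 as_json
------------------------------------------
 [{"val":1,"freq":1},{"val":2,"freq":1}]
(1 row)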
Before seeing that, I wanted to try option 3 first, as it brings clarity with no real increase in tooling beyond checking for repeated keywords; but if you're hyped for JSON then I'll try that instead.
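For reference, a rough sketch of how one expression's stats might be serialized under each option (the keyword set is illustrative, borrowed from pg_restore_attribute_stats, not a final list):

-- Option 3: one flattened text[] of keyword/value pairs per expression
ARRAY['null_frac', '0.5', 'avg_width', '4', 'n_distinct', '-1']

-- Option 4: JSON, one object per expression in an outer array
[{"null_frac": 0.5, "avg_width": 4, "n_distinct": -1}]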
We could also force some checks based on
the order of the arguments in the input array, so that duplicates are
not an issue, I guess?
If we're doing a kwargs-thing then I may as well just track which keywords were already used. We already bail out on the whole expressions array at the first sign of inconsistency, so it's not like we have to decide which of the duplicates to keep.
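For illustration, here is that repeated-keyword check expressed in SQL terms, assuming the flattened pair layout from option 3 (odd ordinals are keywords); the patch itself would of course just track the seen keywords in C:

SELECT kw
FROM unnest(ARRAY['null_frac','0.5','null_frac','0.6'])
     WITH ORDINALITY AS t(kw, ord)
WHERE ord % 2 = 1
GROUP BY kw
HAVING count(*) > 1;
    kw
-----------
 null_frac
(1 row)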
Structurally, I feel that import_expressions() is overcomplicated, and
with the correct structure tracking the state of each field, I would
try to reduce the duplication that the patch presents a bit, aiming at
fewer callers of statatt_build_stavalues() and statatt_set_slot(),
perhaps.
I don't think we can get around those. It's a limitation of how the sta(kindN/opN/collN/numbersN/valuesN) system in pg_statistic works: we want to fill in each stakind as we find it, and we don't know in advance how many we've already filled out. An array of records would have been better, but we've got five parallel arrays of scalars and we have to live with it.
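For anyone following along, these are the parallel slots in question; each (stakindN, staopN, stacollN, stanumbersN, stavaluesN) group is one slot, and any statistic kind can land in any free slot:

SELECT staattnum, stakind1, stakind2, stakind3, stakind4, stakind5
FROM pg_statistic
LIMIT 1;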