Re: Add expressions to pg_restore_extended_stats() - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add expressions to pg_restore_extended_stats()
Msg-id aYWUKkN0Ye_Poo7q@paquier.xyz
In response to Re: Add expressions to pg_restore_extended_stats()  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Add expressions to pg_restore_extended_stats()
List pgsql-hackers
On Thu, Feb 05, 2026 at 11:50:20PM -0500, Corey Huinker wrote:
> For pg_restore_extended_stats(), we've learned that we *must* have the
> right number of elements in the pg_statistic array, so the notion of
> replace-ability is severely if not fatally weakened. However, there are
> errors that we might want an individual pg_statistic to recover from, a
> good example being keys that somehow get removed in a future version. So
> while I've modified import_pg_statistic to return a null datum on any
> inconsistency, that might not be the case in the future, and
> import_expressions() should check to see if it happens. Similarly,
> import_expressions() could have multiple pg_statistic rows, and if one of
> them has an inconsistency, I'd still like the others to make it in.
> That's implemented by checking a count of pg_statistic rows that imported
> OK against the total number of exprs; if they match, then
> import_expressions() was "perfect".
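The recovery behaviour described above can be sketched roughly as follows. This is a Python analogue, not the actual C implementation; the function names are borrowed from the mail, and the required-key set is purely illustrative:

```python
# Sketch of the described logic: import each per-expression pg_statistic
# blob, tolerate individual failures, and report "perfect" only when
# every row imported cleanly.

def import_pg_statistic(stats):
    """Hypothetical stand-in: returns None (a 'null datum') on any
    inconsistency, e.g. a missing required key."""
    required = {"null_frac", "avg_width", "n_distinct"}  # illustrative subset
    if stats is None or not required.issubset(stats):
        return None
    return stats

def import_expressions(exprs):
    imported = [import_pg_statistic(e) for e in exprs]
    ok = sum(1 for row in imported if row is not None)
    # "Perfect" means every pg_statistic row made it in; a partial
    # success still keeps the rows that were consistent.
    return imported, ok == len(exprs)

rows, perfect = import_expressions([
    {"null_frac": 0.0, "avg_width": 4, "n_distinct": -1},
    {"avg_width": 4},  # missing keys: this row is dropped, others survive
])
```

The point of the counter is that a single inconsistent row downgrades the result from "perfect" without discarding the rows that did import.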

I am not sure I get the point about keys being removed in the future.
First, that sounds very unlikely.  Even if it were the case, isn't
pg_dump going to filter them out anyway?  Enforcing a stricter check in
the restore function, requiring a set of keys to exist, still sounds
like a better option, as we only have backward-compatibility
requirements in pg_dump, not in the backend restore function at a
fixed version.

>> Also, I think that the format of the dump should be better when it
>> comes to a set of expressions where some of them have invalid data.
>> Even if the stats of an expression are invalid, pg_dump builds a JSON
>> blob for the element of the expression with all the keys and their
>> values set to NULL.  I'd suggest to just publish a NULL for the
>> element where invalid stats are found.  That makes the dumps shorter
>> as well.
>
> Stats can't be invalid on the way out, only on the way in. I'm assuming
> that you mean null/pointless data. I used jsonb_strip_nulls() to remove
> keys where the value is null, and nullif() to map empty objects to null,
> and I think that's much tidier.

Nice trick with the LATERAL join in pg_dump.  That makes the dump of
the expression data behave in a more consistent way now.
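For reference, the jsonb_strip_nulls() + nullif() combination can be mimicked in Python for a flat object (the real transformation is SQL inside pg_dump, and jsonb_strip_nulls() also recurses into nested objects; this is only a sketch of the semantics):

```python
# Mimic the described dump-side cleanup: drop object keys whose value
# is null, then map a now-empty object to NULL.

def strip_nulls(obj):
    """Analogue of jsonb_strip_nulls() for a flat object: remove keys
    whose value is null."""
    return {k: v for k, v in obj.items() if v is not None}

def nullif_empty(obj):
    """Analogue of nullif(obj, '{}'::jsonb): an empty object becomes NULL."""
    return obj if obj else None

# An expression whose stats are all null collapses to a plain NULL element.
expr_stats = {"null_frac": None, "avg_width": None, "n_distinct": None}
print(nullif_empty(strip_nulls(expr_stats)))  # prints: None
```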

> We could still get a situation where all
> the exprs are empty (i.e. [null,null,null]). There is no simple test to map
> that to just a plain null, but even if there were, the SQL is already
> drifting into "clever" territory, and I know that pg_dump likes to keep
> things very simple.

With a set of three expressions, [null,null,null] is IMO a valid thing:
we have no valid data.  Another case is when we have a non-NULL
element with all the keys present but all their values NULL.  If my
testing of the patch is right, the restore function accepts this case
and decides to set the following fields:
        "avg_width": "0",
        "null_frac": "0",
        "n_distinct": "0"

jbv_string_get_cstr() is IMO incorrect in deciding that, no?  It
sounds to me like some specific NULL checks would be in order?
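The pitfall can be illustrated with a small Python analogue (the actual code is C built around jbv_string_get_cstr(); this only sketches why an explicit NULL check matters before coercing a JSON scalar):

```python
import json

# An element where every key is present but every value is null, as in
# the case described above.
elem = json.loads('{"null_frac": null, "avg_width": null, "n_distinct": null}')

def get_float_buggy(elem, key):
    # No null check: a JSON null quietly becomes 0.0, manufacturing
    # avg_width/null_frac/n_distinct values of 0 out of nothing.
    return float(elem.get(key) or 0)

def get_float_checked(elem, key):
    # With an explicit null check, the caller can reject the element
    # instead of silently accepting zeros.
    v = elem.get(key)
    return None if v is None else float(v)
```

With the buggy getter, the all-NULL element above is accepted and produces zeros; the checked variant lets the restore path treat it as invalid input.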
--
Michael

Attachment
