Re: Add expressions to pg_restore_extended_stats() - Mailing list pgsql-hackers
| From | Corey Huinker |
|---|---|
| Subject | Re: Add expressions to pg_restore_extended_stats() |
| Date | |
| Msg-id | CADkLM=fSvftbJ7fcGmokGGi_w2CbyvwDNg-3shtiZOsaqLamGA@mail.gmail.com Whole thread Raw |
| In response to | Re: Add expressions to pg_restore_extended_stats() (Michael Paquier <michael@paquier.xyz>) |
| Responses |
Re: Add expressions to pg_restore_extended_stats()
|
| List | pgsql-hackers |
v6-0001 has less fuzz, thanks for cleaning up the whole. I am looking
at the patch, and immediately noted two concepts that bump into my eyes
and look rather non-reliable.
+ if (!sta_ok)
+ *exprs_is_perfect = false;
+ isnull = false;
This bit looks incorrect to me? If !sta_ok (or !row_is_perfect in
import_pg_statistic()), then why is a non-NULL value inserted into the
resulting array? If we fail to parse even one field or miss one key,
we should force NULL for this single expression in the worst case and
attempt to move on with the rest. But it does not matter anyway
because import_expressions() would fail the import. Why don't we just
skip the rest of the expressions then? We know that the resulting
array will go to the garbage bit and that the restore failed, issuing
a WARNING for the statext object.
This is where the metaphor between pg_statistic as attribute statistic and pg_statistic as an array element in stxdexpr breaks down a bit.
For pg_restore_attribute_stats(), our goal was to create a new stat if none exists, and if one exists then replace only those elements that are 1) specified in the call and 2) successfully import. This means that the function can update a pg_statistic row, but return false because not all attributes specified were actually updates. i.e. it wasn't "perfect".
I think that import_expressions() has its logic going backwards, by
this I mean that intializing exprs_is_perfect to true could be risky.
It seems to me that we should do the exact opposite: initialize it at
false, and switch to true *if and only if* all the correct conditions
we want are reached. I'd suggest set of gotos and a single exit path
at the end of import_expressions() where exprs_is_perfect is set to
true to let the caller know that the expression can be safely used.
Remember import_mcv(), as one example.
+1
Similarly, the same concept should be applied to import_pg_statistic().
row_is_perfect should be false to start, and switched to true once we
are sure that everything we want is valid.
+1
Also, I think that the format of the dump should be better when it
comes to a set of expressions where some of them have invalid data.
Even if the stats of an expression are invalid, pg_dump builds a JSON
blob for the element of the expression with all the keys and their
values set to NULL. I'd suggest to just publish a NULL for the
element where invalid stats are found. That makes the dumps shorter
as well.
Stats can't be invalid on the way out, only on the way in. I'm assuming that you mean null/pointless data. I used jsonb_strip_nulls() to remove keys where the value is null, and nullif() to map empty objects to null, and I think that's much more tidy. We still could get a situation where all the exprs are empty (i.e. [null,null,null]). There is no simple test to map that to just a plain null, but even if there were the SQL is already drifting into "clever" territory and I know that pg_dump likes to keep things very simple.
Attachment
pgsql-hackers by date: