On Tue, Feb 25, 2025 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes: > My solution so far is to take allo the v11+ (SELECT array_agg...) functions > and put them into a LATERAL, two of them filtered by attstattarget > 0 and > a new one aggregating attnames with no filter.
> An alternative would be a new subselect for array_agg(attname) WHERE > in.indexprs IS NOT NULL, thus removing the extra compute for the indexes > that lack an index expression (i.e. most of them), and thus lack settable > stats (at least for now) and wouldn't be affected by the name-jitter issue > anyway.
Yeah, I've been thinking about that. I think that the idea of the current design is that relatively few indexes will have explicit stats targets set on them, so most of the time the sub-SELECTs produce no data. (Which is not to say that they're cheap to execute.) If we pull all the column names for all indexes then we'll likely bloat pg_dump's working storage quite a bit. Pulling them only for indexes with expression columns should fix that, and as you say we don't need the names otherwise.
I still fear that those sub-selects are pretty expensive in aggregate -- they are basically forcing a nestloop join -- and maybe we need to rethink that whole idea.
BTW, just as a point of order: it is not the case that non-expression indexes are free of name-jitter problems. That's because we don't bother to rename index columns when the underlying table column is renamed, thus:
Ouch.
After dump-n-reload, this index's column will be named "xx". That's not relevant to our current problem as long as we don't store stats on such index columns, but it's plenty relevant to the ALTER INDEX ... SET STATISTICS code.
The only way I can imagine those columns getting their own stats is if we start adding stats for columns of partial indexes, in which case we'd just bump the predicate to WHERE (i.indexprs IS NOT NULL OR i.indpred IS NOT NULL)
Just to confirm, we ARE able to assume dense packing of attributes in an index, and thus we can infer the attnum from the position of the attname in the aggregated array, and there's no need to do a parallel array_agg of attnums, yes?
> I'm on the fence about how to handle pg_clear_attribute_stats(), leaning > toward overloaded functions.
I kinda felt that we didn't need to bother with an attnum-based variant of pg_clear_attribute_stats(), since pg_dump has no use for that. I won't stand in the way if you're desperate to do it, though.
I'm not desperate to slow this thread down, no. We'll stick with attname-only.