Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=cCJFH7DE1W+nT4uAxgEL-gOBoKMQORJTnzmoXTFw231w@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Statistics Import and Export
List pgsql-hackers
On Tue, Feb 25, 2025 at 11:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
> My solution so far is to take allo the v11+ (SELECT array_agg...) functions
> and put them into a LATERAL, two of them filtered by attstattarget > 0 and
> a new one aggregating attnames with no filter.

> An alternative would be a new subselect for array_agg(attname) WHERE
> in.indexprs IS NOT NULL, thus removing the extra compute for the indexes
> that lack an index expression (i.e. most of them), and thus lack settable
> stats (at least for now) and wouldn't be affected by the name-jitter issue
> anyway.

Yeah, I've been thinking about that.  I think that the idea of the
current design is that relatively few indexes will have explicit stats
targets set on them, so most of the time the sub-SELECTs produce no
data.  (Which is not to say that they're cheap to execute.)  If we
pull all the column names for all indexes then we'll likely bloat
pg_dump's working storage quite a bit.  Pulling them only for indexes
with expression columns should fix that, and as you say we don't need
the names otherwise.

I still fear that those sub-selects are pretty expensive in aggregate
-- they are basically forcing a nestloop join -- and maybe we need to
rethink that whole idea.

BTW, just as a point of order: it is not the case that non-expression
indexes are free of name-jitter problems.  That's because we don't
bother to rename index columns when the underlying table column is
renamed, thus:

Ouch.
 
After dump-n-reload, this index's column will be named "xx".
That's not relevant to our current problem as long as we
don't store stats on such index columns, but it's plenty
relevant to the ALTER INDEX ... SET STATISTICS code.

The only way I can imagine those columns getting their own stats is if we start adding stats for columns of partial indexes, in which case we'd just bump the predicate to WHERE (i.indexprs IS NOT NULL OR i.indpred IS NOT NULL)

Just to confirm, we ARE able to assume dense packing of attributes in an index, and thus we can infer the attnum from the position of the attname in the aggregated array, and there's no need to do a parallel array_agg of attnums, yes?
 

> I'm on the fence about how to handle pg_clear_attribute_stats(), leaning
> toward overloaded functions.

I kinda felt that we didn't need to bother with an attnum-based
variant of pg_clear_attribute_stats(), since pg_dump has no
use for that.  I won't stand in the way if you're desperate to
do it, though.

I'm not desperate to slow this thread down, no. We'll stick with attname-only.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Spinlock can be released twice in procsignal.c
Next
From: Tom Lane
Date:
Subject: Re: Statistics Import and Export