Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id fc7ec493-c7cc-1db0-5e1d-9251c5e51387@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On 3/25/21 1:05 AM, Tomas Vondra wrote:
> ...
>
> 0002 is an attempt to fix an issue I noticed today - we need to handle
> type changes. Until now we did not have problems with that, because we
> only had attnums - so we just reset the statistics (with the exception
> of functional dependencies, on the assumption that those remain valid).
> 
> With expressions it's a bit more complicated, though.
> 
> 1) we need to transform the expressions so that the Vars contain the
> right type info etc. Otherwise an analyze with the old pg_node_tree crashes
> 
> 2) we need to reset the pg_statistic[] data too, which however makes
> keeping the functional dependencies a bit less useful, because those
> rely on the expression stats :-(
> 
> So I'm wondering what to do about this. I looked into how ALTER TABLE
> handles indexes, and 0003 is a PoC to do the same thing for statistics.
> Of couse, this is a bit unfortunate because it recreates the statistics
> (so we don't keep anything, not even functional dependencies).
> 
> I think we have two options:
> 
> a) Make UpdateStatisticsForTypeChange smarter to also transform and
> update the expression string, and reset pg_statistics[] data.
> 
> b) Just recreate the statistics, just like we do for indexes. Currently
> this does not force analyze, so it just resets all the stats. Maybe it
> should do analyze, though.
> 
> Any opinions? I need to think about this a bit more, but maybe (b) with
> the analyze is the right thing to do. Keeping just some of the stats
> always seemed a bit weird. (This is why the 0002 patch breaks one of the
> regression tests.)
> 

After thinking about this a bit more I think (b) is the right choice,
and the analyze is not necessary. The reason is fairly simple - we drop
the per-column statistics, because ATExecAlterColumnType does

    RemoveStatistics(RelationGetRelid(rel), attnum);

so the user has to run analyze anyway, to get any reasonable estimates
(we keep the functional dependencies, but those still rely on per-column
statistics quite a bit). And we'll have to do the same thing with
per-expression stats too. It was a nice idea to keep at least the stats
that are not outright broken, but unfortunately it's not a very useful
optimization. It increases the instability of the system, because now we
have estimates with all statistics, no statistics, and something in
between after the partial reset. Not nice.

So my plan is to get rid of UpdateStatisticsForTypeChange, and just do
mostly what we do for indexes. It's not perfect (as demonstrated in last
message), but that'd apply even to option (a).

Any better ideas?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Amul Sul
Date:
Subject: Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Next
From: Andres Freund
Date:
Subject: Re: wal stats questions