On 3/16/19 11:55 AM, Dean Rasheed wrote:
> On Fri, 15 Mar 2019 at 00:06, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I've noticed an annoying thing when modifying type of column not
>> included in a statistics...
>>
>> That is, we don't remove the statistics, but the estimate still changes.
>> But that's because the ALTER TABLE also resets reltuples/relpages:
>>
>> That's a bit unfortunate, and it kinda makes the whole effort to not
>> drop the statistics unnecessarily kinda pointless :-(
>>
>
> Well not entirely. Repeating that test with 100,000 rows, I get an
> initial estimate of 9850 (actual 10,000), which then drops to 2451
> after altering the column. But if you drop the dependency statistics,
> the estimate drops to 241, so clearly there is some benefit in keeping
> them in that case.
>
Sure. What I meant is that to correct the relpages/reltuples estimates
you need to do ANALYZE, which rebuilds the statistics anyway. Although
VACUUM also fixes the estimates, without the stats rebuild.
> Besides, I thought there was no extra effort in keeping the extended
> statistics in this case -- isn't it just using the column
> dependencies, so in this case UpdateStatisticsForTypeChange() never
> gets called anyway?
>
Yes, it does not get called at all. My point was that I was a little bit
confused because the test says "check change of unrelated column type
does not reset the MCV statistics" yet the estimates do actually change.
I wonder why we reset the relpages/reltuples to 0, instead of retaining
the original values, though. That would likely give us better density
estimates in estimate_rel_size, I think.
So I've tried doing that, and I've included it as 0001 into the patch
series. It seems to work, but I suppose the reset is there for a reason.
In any case, this is a preexisting issue, independent of what this patch
does or changes.
I've discovered another issue, though. Currently, clauselist_selectivity
has this as the very beginning:
/*
* If there's exactly one clause, just go directly to
* clause_selectivity(). None of what we might do below is relevant.
*/
if (list_length(clauses) == 1)
return clause_selectivity(root, (Node *) linitial(clauses),
varRelid, jointype, sjinfo);
Which however fails with queries like this:
WHERE (a = 1 OR b = 1)
because clauselist_selectivity sees it as a single clause, passes it to
clause_selectivity and the OR-clause handling simply relies on
(s1 + s2 - s1 * s2)
which entirely ignores the multivariate stats. The other similar places
in clause_selectivity() simply call clauselist_selectivity() so that's
OK, but OR-clauses don't do that.
For functional dependencies this is not a huge issue because those apply
only to AND-clauses. But there were proposals to maybe apply them to
other types of clauses, in which case it might become issue.
I think the best fix is moving the optimization after the multivariate
stats are applied. The only alternative I can think of is modifying
clauselist_selectivity so that it can be executed on OR-clauses. But
that seems much more complicated than the former option for almost no
other advantages.
I've also changed how statext_is_compatible_clause_internal() handles
the attnums bitmapset - you were right in your 3/10 message that we can
just pass the value, without creating a local bitmapset. So I've just
done that.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services