Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id 66f652d0-030a-2f53-df85-effb272a5919@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On 03/26/2018 06:21 PM, Dean Rasheed wrote:
> On 26 March 2018 at 14:08, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> On 03/26/2018 12:31 PM, Dean Rasheed wrote:
>>> A wider concern I have is that I think this function is trying to be
>>> too clever by only resetting selected stats. IMO it should just reset
>>> all stats unconditionally when the column type changes, which would
>>> be consistent with what we do for regular stats.
>>>
>> The argument a year ago was that it's more plausible that the semantics
>> remains the same. I think the question is how the type change affects
>> precision - had the type change in the opposite direction (int to real)
>> there would be no problem, because both ndistinct and dependencies would
>> produce the same statistics.
>>
>> In my experience people are far more likely to change data types in a
>> way that preserves precision, so I think the current behavior is OK.
> 
> Hmm, I don't really buy that argument. Altering a column's type
> allows the data in it to be rewritten in arbitrary ways, and I don't
> think we should presume that the statistics will still be valid just
> because the user *probably* won't do something that changes the data
> much.
> 

Maybe, I can only really speak about my experience, and in those cases
it's usually "the column is an INT and I need a FLOAT". But you're right
it's not guaranteed to be like that, perhaps the right thing to do is
resetting the stats.

Another reason to do that might be consistency - resetting just some of
the stats might be surprising for users. And we're are already resetting
per-column stats on that column, so the users running ANALYZE anyway.

BTW in my response I claimed this:

>
> The other reason is that when reducing precision, it generally
> enforces the dependency (you can't violate functional dependencies or
> break grouping by merging values). So you will have stale stats with
> weaker dependencies, but it's still better than not having any.>

That's actually bogus. For example for functional dependencies, it's
important on which side of the dependency we reduce precision. With
(a->b) dependency, reducing precision of "b" does indeed strengthen it,
but reducing precision of "a" does weaken it. So I take that back.

So, I'm not particularly opposed to just resetting extended stats
referencing the altered column.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Proposal: http2 wire format
Next
From: Simon Riggs
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11