Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id a7d17aec-234d-c7f2-bfdc-99c87767a507@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers

On 3/24/21 5:28 PM, Dean Rasheed wrote:
> On Wed, 24 Mar 2021 at 14:48, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> AFAIK the primary issue here is that the two places disagree. While
>> estimate_num_groups does this
>>
>>     varnos = pull_varnos(root, (Node *) varshere);
>>     if (bms_membership(varnos) == BMS_SINGLETON)
>>     { ... }
>>
>> the add_unique_group_expr does this
>>
>>     varnos = pull_varnos(root, (Node *) groupexpr);
>>
>> That is, one looks at the group expression, while the other look at vars
>> extracted from it by pull_var_clause(). Apparently for PlaceHolderVar
>> this can differ, causing the crash.
>>
>> So we need to change one of those places - my fix tweaked the second
>> place to also look at the vars, but maybe we should change the other
>> place? Or maybe it's not the right fix for PlaceHolderVars ...
>>
> 
> I think that it doesn't make any difference which place is changed.
> 
> This is a case of an expression with no stats. With your change,
> you'll get a single GroupExprInfo containing a list of
> VariableStatData's for each of it's Var's, whereas if you changed it
> the other way, you'd get a separate GroupExprInfo for each Var. But I
> think they'd both end up being treated the same by
> estimate_multivariate_ndistinct(), since there wouldn't be any stats
> matching the expression, only the individual Var's. Maybe changing the
> first place would be the more bulletproof fix though.
> 

Yeah, I think that's true. I'll do a bit more research / experiments.

As for the changes proposed in the create_statistics, do we really want
to use univariate / multivariate there? Yes, the terms are correct, but
I'm not sure how many people looking at CREATE STATISTICS will
understand them.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Jacob Champion
Date:
Subject: Re: Proposal: Save user's original authenticated identity for logging
Next
From: Stephen Frost
Date:
Subject: Re: Support for NSS as a libpq TLS backend