Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id 6bc0ab0c-a807-2b7f-7df8-ac90c7839013@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: PoC/WIP: Extended statistics on expressions
Re: PoC/WIP: Extended statistics on expressions
List pgsql-hackers
On 8/24/21 3:13 PM, Justin Pryzby wrote:
> On Mon, Aug 16, 2021 at 05:41:57PM +0200, Tomas Vondra wrote:
>>> This still seems odd:
>>>
>>> postgres=# CREATE STATISTICS asf ON i FROM t;
>>> ERROR:  extended statistics require at least 2 columns
>>> postgres=# CREATE STATISTICS asf ON (i) FROM t;
>>> CREATE STATISTICS
>>>
>>> It seems wrong that the command works with added parens, but builds expression
>>> stats on a simple column (which is redundant with what analyze does without
>>> extended stats).
>>
>> Well, yeah. But I think this is a behavior that was discussed somewhere in
>> this thread, and the agreement was that it's not worth the complexity, as
>> this comment explains
>>
>>    * XXX We do only the bare minimum to separate simple attribute and
>>    * complex expressions - for example "(a)" will be treated as a complex
>>    * expression. No matter how elaborate the check is, there'll always be
>>    * a way around it, if the user is determined (consider e.g. "(a+0)"),
>>    * so it's not worth protecting against it.
>>
>> Patch 0001 fixes the "double parens" issue discussed elsewhere in this
>> thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple
>> column reference.
> 
> 0002 refuses to create expressional stats on a simple column reference like
> (a), which I think is helps to avoid a user accidentally creating useless ext
> stats objects (which are redundant with the table's column stats).
> 
> 0002 does not attempt to refuse cases like (a+0), which I think is fine:
> we don't try to reject useless cases if someone insists on it.
> See 240971675, 701fd0bbc.
> 
> So I am +1 to apply both patches.
> 
> I added this as an Opened Item for increased visibility.
> 

I've pushed both fixes, so the open item should be resolved.

However while polishing the second patch, I realized we're allowing 
statistics on expressions referencing system attributes. So this fails;

CREATE STATISTICS s ON ctid, x FROM t;

but this passes:

CREATE STATISTICS s ON (ctid::text), x FROM t;

IMO we should reject such expressions, just like we reject direct 
references to system attributes - patch attached.

Furthermore, I wonder if we should reject expressions without any Vars? 
This works now:

CREATE STATISTICS s ON (11:text) FROM t;

but it seems rather silly / useless, so maybe we should reject it.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Is it safe to use the extended protocol with COPY?
Next
From: Jaime Casanova
Date:
Subject: Re: 2021-09 Commitfest