Re: pg_stats and range statistics - Mailing list pgsql-hackers

From Egor Rogov
Subject Re: pg_stats and range statistics
Date
Msg-id 718126ae-c6f5-39ea-0c56-6089a2e9e54d@postgrespro.ru
Whole thread Raw
In response to Re: pg_stats and range statistics  (Egor Rogov <e.rogov@postgrespro.ru>)
List pgsql-hackers
On 23.01.2023 13:01, Egor Rogov wrote:

> On 23.01.2023 02:21, Tomas Vondra wrote:
>> On 1/22/23 22:33, Justin Pryzby wrote:
>>> On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote:
>>>> On 1/21/23 19:53, Egor Rogov wrote:
>>>>> Hi Tomas,
>>>>> On 21.01.2023 00:50, Tomas Vondra wrote:
>>>>>> This simply adds two functions, accepting/producing anyarray - 
>>>>>> one for
>>>>>> lower bounds, one for upper bounds. I don't think it can be done 
>>>>>> with a
>>>>>> plain subquery (or at least I don't know how).
>>>>> Anyarray is an alien to SQL, so functions are well justified here. 
>>>>> What
>>>>> makes me a bit uneasy is two almost identical functions. Should we
>>>>> consider other options like a function with an additional 
>>>>> parameter or a
>>>>> function returning an array of bounds arrays (which is somewhat
>>>>> wasteful, but probably it doesn't matter much here)?
>>>>>
>>>> I thought about that, but I think the alternatives (e.g. a single
>>>> function with a parameter determining which boundary to return). But I
>>>> don't think it's better.
>>> What about a common function, maybe called like:
>>>
>>> ranges_upper_bounds(PG_FUNCTION_ARGS)
>>> {
>>>      AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0);
>>>      Oid         element_type = AARR_ELEMTYPE(array);
>>>      TypeCacheEntry *typentry;
>>>
>>>      /* Get information about range type; note column might be a 
>>> domain */
>>>      typentry = range_get_typcache(fcinfo, getBaseType(element_type));
>>>
>>>      return ranges_bounds_common(typentry, array, false);
>>> }
>>>
>>> That saves 40 LOC.
>>>
>> Thanks, that's better. But I'm still not sure it's a good idea to add
>> function with anyarray argument, when we need it to be an array of
>> ranges ...
>>
>> I wonder if we have other functions doing something similar, i.e.
>> accepting a polymorphic type and then imposing additional restrictions
>> on it.
>
>
> I couldn't find such examples, but adding an adhoc polymorphic type 
> just doesn't look right for me. Besides, you'll end up adding not just 
> anyrangearray type, but also anymultirangearray, 
> anycompatiblerangearray, anycompatiblemultirangearray, and maybe their 
> "non"-counterparts like anynonrangearray, and all of these are not of 
> much use. And one day you may need an array of arrays or something...
>
> I wonder if it's possible to teach SQL to work with anyarray type - at 
> runtime the actual type of anyarray elements is known, right? In fact, 
> unnest() alone is enough to eliminate the need of C functions altogether.


When started to look at how we deal with anyarray columns, I came across 
the following comment in parse_coerce.c for 
enforce_generic_type_consistency():

* A special case is that we could see ANYARRAY as an actual_arg_type even
  * when allow_poly is false (this is possible only because pg_statistic has
  * columns shown as anyarray in the catalogs).

It makes me realize how anyarray as-a-real-type is specific to 
pg_statistic. Even if it's possible to somehow postpone type inference 
for this case from parse time to execute time, it clearly doesn't worth 
the effort.

So, I am for the simplest possible approach, that is, the two proposed 
functions ranges_upper(anyarray) and ranges_lower(anyarray). I am not 
even sure if it's worth documenting them, as they are very 
pg_statistic-specific and likely won't be useful for end users.


>
>
>>> Shouldn't this add some sql tests ?
>>>
>> Yeah, I guess we should have a couple tests calling these functions on
>> different range arrays.
>>
>> This reminds me lower()/upper() have some extra rules about handling
>> empty ranges / infinite boundaries etc. These functions should behave
>> consistently (as if we called lower() in a loop) and I'm pretty sure
>> that's not the current state.
>
>
> I can try to tidy things up, but first we need to decide on the 
> general approach.
>
>



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)