Re: pg_stats and range statistics - Mailing list pgsql-hackers
From | Egor Rogov |
---|---|
Subject | Re: pg_stats and range statistics |
Date | |
Msg-id | 718126ae-c6f5-39ea-0c56-6089a2e9e54d@postgrespro.ru Whole thread Raw |
In response to | Re: pg_stats and range statistics (Egor Rogov <e.rogov@postgrespro.ru>) |
List | pgsql-hackers |
On 23.01.2023 13:01, Egor Rogov wrote: > On 23.01.2023 02:21, Tomas Vondra wrote: >> On 1/22/23 22:33, Justin Pryzby wrote: >>> On Sun, Jan 22, 2023 at 07:19:41PM +0100, Tomas Vondra wrote: >>>> On 1/21/23 19:53, Egor Rogov wrote: >>>>> Hi Tomas, >>>>> On 21.01.2023 00:50, Tomas Vondra wrote: >>>>>> This simply adds two functions, accepting/producing anyarray - >>>>>> one for >>>>>> lower bounds, one for upper bounds. I don't think it can be done >>>>>> with a >>>>>> plain subquery (or at least I don't know how). >>>>> Anyarray is an alien to SQL, so functions are well justified here. >>>>> What >>>>> makes me a bit uneasy is two almost identical functions. Should we >>>>> consider other options like a function with an additional >>>>> parameter or a >>>>> function returning an array of bounds arrays (which is somewhat >>>>> wasteful, but probably it doesn't matter much here)? >>>>> >>>> I thought about that, but I think the alternatives (e.g. a single >>>> function with a parameter determining which boundary to return). But I >>>> don't think it's better. >>> What about a common function, maybe called like: >>> >>> ranges_upper_bounds(PG_FUNCTION_ARGS) >>> { >>> AnyArrayType *array = PG_GETARG_ANY_ARRAY_P(0); >>> Oid element_type = AARR_ELEMTYPE(array); >>> TypeCacheEntry *typentry; >>> >>> /* Get information about range type; note column might be a >>> domain */ >>> typentry = range_get_typcache(fcinfo, getBaseType(element_type)); >>> >>> return ranges_bounds_common(typentry, array, false); >>> } >>> >>> That saves 40 LOC. >>> >> Thanks, that's better. But I'm still not sure it's a good idea to add >> function with anyarray argument, when we need it to be an array of >> ranges ... >> >> I wonder if we have other functions doing something similar, i.e. >> accepting a polymorphic type and then imposing additional restrictions >> on it. > > > I couldn't find such examples, but adding an adhoc polymorphic type > just doesn't look right for me. Besides, you'll end up adding not just > anyrangearray type, but also anymultirangearray, > anycompatiblerangearray, anycompatiblemultirangearray, and maybe their > "non"-counterparts like anynonrangearray, and all of these are not of > much use. And one day you may need an array of arrays or something... > > I wonder if it's possible to teach SQL to work with anyarray type - at > runtime the actual type of anyarray elements is known, right? In fact, > unnest() alone is enough to eliminate the need of C functions altogether. When started to look at how we deal with anyarray columns, I came across the following comment in parse_coerce.c for enforce_generic_type_consistency(): * A special case is that we could see ANYARRAY as an actual_arg_type even * when allow_poly is false (this is possible only because pg_statistic has * columns shown as anyarray in the catalogs). It makes me realize how anyarray as-a-real-type is specific to pg_statistic. Even if it's possible to somehow postpone type inference for this case from parse time to execute time, it clearly doesn't worth the effort. So, I am for the simplest possible approach, that is, the two proposed functions ranges_upper(anyarray) and ranges_lower(anyarray). I am not even sure if it's worth documenting them, as they are very pg_statistic-specific and likely won't be useful for end users. > > >>> Shouldn't this add some sql tests ? >>> >> Yeah, I guess we should have a couple tests calling these functions on >> different range arrays. >> >> This reminds me lower()/upper() have some extra rules about handling >> empty ranges / infinite boundaries etc. These functions should behave >> consistently (as if we called lower() in a loop) and I'm pretty sure >> that's not the current state. > > > I can try to tidy things up, but first we need to decide on the > general approach. > >
pgsql-hackers by date: