Home > mailing lists

Re: pg_stats and range statistics - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: pg_stats and range statistics
Date	January 22, 2023 18:19:41
Msg-id	d6280c94-febf-c17d-f18b-773260089ebf@enterprisedb.com Whole thread Raw
In response to	Re: pg_stats and range statistics (Egor Rogov <e.rogov@postgrespro.ru>)
Responses	Re: pg_stats and range statistics
List	pgsql-hackers

Tree view

On 1/21/23 19:53, Egor Rogov wrote:
> Hi Tomas,
> 
> On 21.01.2023 00:50, Tomas Vondra wrote:
>> Hi Egor,
>>
>> While reviewing a patch improving join estimates for ranges [1] I
>> realized we don't show stats for ranges in pg_stats, and I recalled we
>> had this patch.
>>
>> I rebased the v2, and I decided to took a stab at showing separate
>> histograms for lower/upper histogram bounds. I believe it makes it way
>> more readable, which is what pg_stats is about IMHO.
> 
> 
> Thanks for looking into this.
> 
> I have to admit it looks much better this way, so +1.
> 

OK, good to hear.

> 
>> This simply adds two functions, accepting/producing anyarray - one for
>> lower bounds, one for upper bounds. I don't think it can be done with a
>> plain subquery (or at least I don't know how).
> 
> 
> Anyarray is an alien to SQL, so functions are well justified here. What
> makes me a bit uneasy is two almost identical functions. Should we
> consider other options like a function with an additional parameter or a
> function returning an array of bounds arrays (which is somewhat
> wasteful, but probably it doesn't matter much here)?
> 

I thought about that, but I think the alternatives (e.g. a single
function with a parameter determining which boundary to return). But I
don't think it's better.

Moreover, I think this is pretty similar to lower/upper, which already
work on range values. So if we have separate functions for that, we
should do the same thing here.

I renamed the functions to ranges_lower/ranges_upper, but maybe why not
to even call the functions lower/upper too?

The main trouble with the function I can think of is that we only have
anyarray type, not anyrangearray. So the functions will get called for
arbitrary array, and the check that it's array of ranges happens inside.
I'm not sure if that's a good or bad idea, or what would it take to add
a new polymorphic type ...

For now I at least kept "ranges_" to make it less likely.

> 
>> Finally, it renames the empty_range_frac to start with range_, per the
>> earlier discussion. I wonder if the new column names for lower/upper
>> bounds (range_lower_bounds_histograms/range_upper_bounds_histograms) are
>> too long ...
> 
> 
> It seems so. The ending -s should be left out since it's a single
> histogram now. And I think that
> range_lower_histogram/range_upper_histogram are descriptive enough.
> 
> I'm adding one more patch to shorten the column names, refresh the docs,
> and make 'make check' happy (unfortunately, we have to edit
> src/regress/expected/rules.out every time pg_stats definition changes).
> 

Thanks. I noticed the docs were added to pg_user_mapping by mistake, not
to pg_stats. So I fixed that, and I also added the new functions.

Finally, I reordered the fields a bit - moved range_empty_frac to keep
the histogram fields together.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

pgsql-hackers by date:

From: Jelte Fennema
Date: 22 January 2023, 18:14:24
Subject: Re: run pgindent on a regular basis / scripted manner

From: Pavel Stehule
Date: 22 January 2023, 18:47:07
Subject: Re: Schema variables - new implementation for Postgres 15 (typo)

Re: pg_stats and range statistics - Mailing list pgsql-hackers

Attachment

Previous

Next