Re: Split index and table statistics into different types of stats - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Split index and table statistics into different types of stats
Date
Msg-id 06517d27-7a81-3b60-60c4-9a92c6498981@gmail.com
Whole thread Raw
In response to Re: Split index and table statistics into different types of stats  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Split index and table statistics into different types of stats
List pgsql-hackers
Hi,

On 11/4/22 9:51 PM, Melanie Plageman wrote:
> Hi Bertrand,
> 
> I'm glad you are working on this.
> 
> I had a few thoughts/ideas
> 

Thanks for looking at it!

> It seems better to have all of the counts in the various stats structs
> not be prefixed with n_, i_, t_
> 
> typedef struct PgStat_StatDBEntry
> {
> ...
>      PgStat_Counter n_blocks_fetched;
>      PgStat_Counter n_blocks_hit;
>      PgStat_Counter n_tuples_returned;
>      PgStat_Counter n_tuples_fetched;
> ...
> 
> I've attached a patch (0002) to change this in case you are interested
> in making such a change

I did not renamed initially the fields/columns to ease the review.

Indeed, I think we should go further than removing the n_, i_ and t_ 
prefixes so that the fields actually match the view's columns.

For example, currently pg_stat_all_indexes.idx_tup_read is linked to 
"tuples_returned", so that it would make sense to rename 
"tuples_returned" to "tuples_read" or even "tup_read" in the indexes 
counters.

That's why I had in mind to do this fields/columns renaming into a 
separate patch (once this one is committed), so that the current one 
focus only on splitting the stats: what do you think?

> (I've attached all of my suggestions in patches
> along with your original patch so that cfbot still passes).
> 
> On Wed, Nov 2, 2022 at 5:00 AM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>> On 11/1/22 1:30 AM, Andres Freund wrote:
>>> On 2022-10-31 14:14:15 +0100, Drouvot, Bertrand wrote:
>>>> @@ -240,9 +293,23 @@ pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
>>>>       PG_RETURN_INT64(result);
>>>>    }
>>>>
>>>> +Datum
>>>> +pg_stat_get_index_blocks_fetched(PG_FUNCTION_ARGS)
>>>> +{
>>>> +    Oid                     relid = PG_GETARG_OID(0);
>>>> +    int64           result;
>>>> +    PgStat_StatIndEntry *indentry;
>>>> +
>>>> +    if ((indentry = pgstat_fetch_stat_indentry(relid)) == NULL)
>>>> +            result = 0;
>>>> +    else
>>>> +            result = (int64) (indentry->blocks_fetched);
>>>> +
>>>> +    PG_RETURN_INT64(result);
>>>> +}
>>>
>>> We have so many copies of this by now - perhaps we first should deduplicate
>>> them somehow? Even if it's just a macro or such.
>>>
>>
>> Yeah good point, a new macro has been defined for the "int64" return
>> case in V3 attached.
> 
> I looked for other opportunities to de-duplicate, but most of the functions
> that were added that are identical except the return type and
> PgStat_Kind are short enough that it doesn't make sense to make wrappers
> or macros.
> 

Yeah, agree.

> I do think it makes sense to reorder the members of the two structs
> PgStat_IndexCounts and PgStat_TableCounts so that they have a common
> header. I've done that in the attached patch (0003).
> 

That's a good idea, thanks! But for that we would need to have the same 
fields names, means:

- Remove the prefixes (as you've done in 0002)
- And probably reduce the number of fields in the new 
PgStat_RelationCounts that 003 is introducing (for example 
tuples_returned should be excluded if we're going to rename it later on 
to "tuples_read" for the indexes to map the 
pg_stat_all_indexes.idx_tup_read column).

ISTM that we should do it in the "renaming" effort, after this patch is 
committed.

> In the flush functions, I was also thinking it might be nice to use the
> same pattern as is used in [1] and [2] to add the counts together. It
> makes the lines a bit easier to read, IMO. If we remove the prefixes
> from the count fields, this works for many of the fields. I've attached
> a patch (0004) that does something like this, in case you wanted to go
> in this direction.

I like it too but same remarks as previously. I think it should be part 
of the "renaming" effort.

> 
> Since you have made new parallel functions for indexes and tables for
> many of the functions in pgstat_relation.c, perhaps it makes sense to
> split it into pgstat_table.c and pgstat_index.c?

Good point, thanks, I'll work on it.

> 
> One question I had about the original code (not related to your
> refactor) is why the pending stats aren't memset in the flush functions
> after aggregating them into the shared stats.

Not sure I'm getting your point, do you think something is not right 
with the flush functions?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock
Next
From: Marina Polyakova
Date:
Subject: Re: Fix order of checking ICU options in initdb and create database