Re: Draft for basic NUMA observability - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Draft for basic NUMA observability
Date
Msg-id c8d8c6d6-fb8a-4303-a471-f4a54a023328@vondra.me
Whole thread Raw
In response to Re: Draft for basic NUMA observability  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Draft for basic NUMA observability
List pgsql-hackers
On 4/3/25 10:23, Bertrand Drouvot wrote:
> Hi,
> 
> On Thu, Apr 03, 2025 at 09:01:43AM +0200, Jakub Wartak wrote:
>> On Wed, Apr 2, 2025 at 6:40 PM Tomas Vondra <tomas@vondra.me> wrote:
>>
>> Hi Tomas,
>>
>>> OK, so you agree the commit messages are complete / correct?
>>
>> Yes.
> 
> Not 100% sure on my side. 
> 
> === v21-0002
> 
> Says:
> 
> "
> This introduces three new functions:
> 
>     - pg_buffercache_init_entries
>     - pg_buffercache_build_tuple
>     - get_buffercache_tuple
> "
> 
> While pg_buffercache_build_tuple() is not added (pg_buffercache_save_tuple()
> is).
> 

Ah, OK. Jakub, can you correct (and double-check) this in the next
version of the patch?

> About v21-0002:
> 
> === 1
> 
> I can see that the pg_buffercache_init_entries() helper comments are added in
> v21-0003 but I think that it would be better to add them in v21-0002
> (where the helper is actually created).
> 

+1 to that

> About v21-0003:
> 
> === 2
> 
>> I hear you, attached v21 / 0003 is free of float/double arithmetics
>> and uses non-float point values.
> 
> +               if (buffers_per_page > 1)
> +                       os_page_query_count = NBuffers;
> +               else
> +                       os_page_query_count = NBuffers * pages_per_buffer;
> 
> yeah, that's more elegant. I think that it properly handles the relationships
> between buffer and page sizes without relying on float arithmetic.
> 

In the review I just submitted, I changed this to

    os_page_query_count = Max(NBuffers, NBuffers * pages_per_buffer);

but maybe it's less clear. Feel free to undo my change.

> === 3
> 
> +                       if (buffers_per_page > 1)
> +                       {
> 
> As buffers_per_page does not change, I think I'd put this check outside of the
> for (i = 0; i < NBuffers; i++) loop, something like:
> 
> "
> if (buffers_per_page > 1) {
>     /* BLCKSZ < PAGESIZE: one page hosts many Buffers */
>     for (i = 0; i < NBuffers; i++) {
> .
> .
> .
> .
> } else {
>     /* BLCKSZ >= PAGESIZE: Buffer occupies more than one OS page */
>     for (i = 0; i < NBuffers; i++) {
> .
> .

I don't know. It's a matter of opinion, but I find the current code
fairly understandable. Maybe if it meaningfully reduced the code
nesting, but even with the extra branch we'd still need the for loop.
I'm not against doing this differently, but I'd have to see how that
looks. Until then I think it's fine to have the code as is.

> .
> "
> 
> === 4
> 
>> That _numa_prepare_ptrs() is unused and will need to be removed,
>> but we can still move some code there if necessary.
> 
> Yeah I think that it can be simply removed then.
> 

I'm not particularly attached on having the _ptrs() function, but why
couldn't it build the os_page_ptrs array as before?

> === 5
> 
>> @Bertrand: do you have anything against pg_shm_allocations_numa
>> instead of pg_shm_numa_allocations? I don't mind changing it...
> 
> I think that pg_shm_allocations_numa() is better (given the examples you just
> shared).
> 
> === 6
> 
>> I find all of those non-user friendly and I'm afraid I won't be able
>> to pull that alone in time...
> 
> Maybe we could add a few words in the doc to mention the "multiple nodes per
> buffer" case? And try to improve it for say 19? Also maybe we should just focus
> till v21-0003 (and discard v21-0004 for 18).
> 

IMHO it's not enough to paper this over by mentioning it in the docs.

I'm a bit confused about which patch you suggest to leave out. I think
0004 is pg_shmem_allocations_numa, but I think that part is fine, no?
We've been discussing how to represent nodes for buffers in 0003.

I understand it'd require a bit of coding, but AFAICS adding an array
would be fairly trivial amount of code. Something like

  values[i++]
    = PointerGetDatum(construct_array_builtin(nodes, nodecnt, INT4OID));

would do, I think. But if we decide to do the 3-column view, that'd be
even simpler, I think. AFAICS it means we could mostly ditch the
pg_buffercache refactoring in patch 0002, and 0003 would get much
simpler too I think.

If Jakub doesn't have time to work on this, I can take a stab at it
later today.


regards

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Shlok Kyal
Date:
Subject: Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
Next
From: Ashutosh Bapat
Date:
Subject: Detach partition with constraint test