Re: Draft for basic NUMA observability - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: Draft for basic NUMA observability
Date
Msg-id CAKZiRmw4OYiKEFXSAV6jt11SZOU0q1j4CKsZYcDH3qV1_+tN8A@mail.gmail.com
Whole thread Raw
In response to Re: Draft for basic NUMA observability  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
On Thu, Apr 3, 2025 at 2:15 PM Tomas Vondra <tomas@vondra.me> wrote:

> Ah, OK. Jakub, can you correct (and double-check) this in the next
> version of the patch?

Done.

> > About v21-0002:
> >
> > === 1
> >
> > I can see that the pg_buffercache_init_entries() helper comments are added in
> > v21-0003 but I think that it would be better to add them in v21-0002
> > (where the helper is actually created).
> >
>
> +1 to that

Done.

> > About v21-0003:
> >
> > === 2
> >
> >> I hear you, attached v21 / 0003 is free of float/double arithmetics
> >> and uses non-float point values.
> >
> > +               if (buffers_per_page > 1)
> > +                       os_page_query_count = NBuffers;
> > +               else
> > +                       os_page_query_count = NBuffers * pages_per_buffer;
> >
> > yeah, that's more elegant. I think that it properly handles the relationships
> > between buffer and page sizes without relying on float arithmetic.
> >
>
> In the review I just submitted, I changed this to
>
>     os_page_query_count = Max(NBuffers, NBuffers * pages_per_buffer);
>
> but maybe it's less clear. Feel free to undo my change.

Cool, thanks, will send shortly v23 with this applied.

>
> > === 3
> >
> > +                       if (buffers_per_page > 1)
> > +                       {
> >
> > As buffers_per_page does not change, I think I'd put this check outside of the
> > for (i = 0; i < NBuffers; i++) loop, something like:
> >
> > "
> > if (buffers_per_page > 1) {
> >     /* BLCKSZ < PAGESIZE: one page hosts many Buffers */
> >     for (i = 0; i < NBuffers; i++) {
> > .
> > .
> > .
> > .
> > } else {
> >     /* BLCKSZ >= PAGESIZE: Buffer occupies more than one OS page */
> >     for (i = 0; i < NBuffers; i++) {
> > .
> > .
>
> I don't know. It's a matter of opinion, but I find the current code
> fairly understandable. Maybe if it meaningfully reduced the code
> nesting, but even with the extra branch we'd still need the for loop.
> I'm not against doing this differently, but I'd have to see how that
> looks. Until then I think it's fine to have the code as is.

v23 will have incorporated Bertrand's idea soon. No hard feelings, but
it's kind of painful to switch like that ;)

> > === 4
> >
> >> That _numa_prepare_ptrs() is unused and will need to be removed,
> >> but we can still move some code there if necessary.
> >
> > Yeah I think that it can be simply removed then.
> >
>
> I'm not particularly attached on having the _ptrs() function, but why
> couldn't it build the os_page_ptrs array as before?

I've removed it in v23, the code for me just didn't have flow...

> > === 5
> >
> >> @Bertrand: do you have anything against pg_shm_allocations_numa
> >> instead of pg_shm_numa_allocations? I don't mind changing it...
> >
> > I think that pg_shm_allocations_numa() is better (given the examples you just
> > shared).

Done.

> > === 6
> >
> >> I find all of those non-user friendly and I'm afraid I won't be able
> >> to pull that alone in time...
> >
> > Maybe we could add a few words in the doc to mention the "multiple nodes per
> > buffer" case? And try to improve it for say 19? Also maybe we should just focus
> > till v21-0003 (and discard v21-0004 for 18).
> >
>
> IMHO it's not enough to paper this over by mentioning it in the docs.

OK

> I'm a bit confused about which patch you suggest to leave out. I think
> 0004 is pg_shmem_allocations_numa, but I think that part is fine, no?
> We've been discussing how to represent nodes for buffers in 0003.

IMHO 0001 + 0004 is good. 0003 is probably the last troublemaker, but
we settled on arrays right?

> I understand it'd require a bit of coding, but AFAICS adding an array
> would be fairly trivial amount of code. Something like
>
>   values[i++]
>     = PointerGetDatum(construct_array_builtin(nodes, nodecnt, INT4OID));
>
> would do, I think. But if we decide to do the 3-column view, that'd be
> even simpler, I think. AFAICS it means we could mostly ditch the
> pg_buffercache refactoring in patch 0002, and 0003 would get much
> simpler too I think.
>
> If Jakub doesn't have time to work on this, I can take a stab at it
> later today.

I won't be able to even start on this today, so if you have cycles for
please do so...

-J.



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Draft for basic NUMA observability
Next
From: Heikki Linnakangas
Date:
Subject: Re: libpq support for NegotiateProtocolVersion