Re: Draft for basic NUMA observability - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Draft for basic NUMA observability |
Date | |
Msg-id | Z9LockmbFInuSt0X@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Draft for basic NUMA observability (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
Responses |
Re: Draft for basic NUMA observability
|
List | pgsql-hackers |
Hi, On Wed, Mar 12, 2025 at 04:41:15PM +0100, Jakub Wartak wrote: > On Mon, Mar 10, 2025 at 11:14 AM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > > Thanks for the new version! > > v10 is attached with most fixes after review and one new thing > introduced: pg_numa_available() for run-time decision inside tests > which was needed after simplifying code a little bit as you wanted. > I've also fixed -Dlibnuma=disabled as it was not properly implemented. > There are couple open items (minor/decision things), but most is fixed > or answered: Thanks for the new version! > > === 2 > > > > +else > > + as_fn_error $? "header file <numa.h> is required for --with-libnuma" "$LINENO" 5 > > +fi > > > > Maybe we should add the same test (checking for numa.h) for meson? > > TBH, I have no idea, libnuma.h may exist but it may not link e.g. when > cross-compiling 32-bit on 64-bit. Or is this more about keeping sync > between meson and autoconf? Yeah, idea was to have both in sync. > > === 8 > > > > # create extension pg_buffercache; > > ERROR: could not load library "/home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so": /home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so:undefined symbol: pg_numa_query_pages > > CONTEXT: SQL statement "CREATE FUNCTION pg_buffercache_pages() > > RETURNS SETOF RECORD > > AS '$libdir/pg_buffercache', 'pg_buffercache_pages' > > LANGUAGE C PARALLEL SAFE" > > extension script file "pg_buffercache--1.2.sql", near line 7 > > > > While that's ok if 0003 is applied. I think that each individual patch should > > "fully" work individually. > > STILL OPEN QUESTION: Not sure I understand: you need to have 0001 + > 0002 or 0001 + 0003, but here 0002 is complaining about lack of > pg_numa_query_pages() which is part of 0001 (?) Should I merge those > patches or keep them separate? Applying 0001 + 0002 only does produce the issue above. I think that each sub-patch once applied should pass make check-world, that means: 0001 : should pass 0001 + 0002 : should pass 0001 + 0002 + 0003 : should pass > > Also I need to think more about how firstUseInBackend is used, for example: > > > > ==== 17.1 > > > > would it be better to define it as a file scope variable? (at least that > > would avoid to pass it as an extra parameter in some functions). > > I have no strong opinion on this, but I have one doubt (for future): > isn't creating global variables making life harder for upcoming > multithreading guys ? That would be "just" one more. I think it's better to use it to avoid the "current" code using "useless" function parameters. > > === 17.2 > > > > what happens if firstUseInBackend is set to false and later on the pages > > are moved to different NUMA nodes. Then pg_buffercache_numa_pages() is > > called again by a backend that already had set firstUseInBackend to false, > > would that provide accurate information? > > It is still the correct result. That "touching" (paging-in) is only > necessary probably to properly resolve PTEs as the fork() does not > seem to carry them over from parent: > > So the above clearly shows that initial touching of shm is required, > but just once and it stays valid afterwards. Great, thanks for the demo and the explanation! > > === 19 > > > > Can you please take a look again on this Sure, will do. > > === 23 > > > > + if (s >= 0) > > + zones[s]++; > > > > should we also check that s is below a limit? > > STILL OPEN QUESTION: I'm not sure it would give us value to report > e.g. -2 on per shm entry/per numa node entry, would it? If it would we > could somehow overallocate that array and remember negative ones too. I meant to say, ensure that it is below the max node number. > > === 24 > > > > Regarding how we make use of pg_numa_get_max_node(), are we sure there is > > no possible holes? I mean could a system have node 0,1 and 3 but not 2? > > I have never seen a hole in numbering as it is established during > boot, and the only way that could get close to adjusting it could be > making processor books (CPU and RAM together) offline. Yeah probably. > Even with all of that, that still > shouldn't cause issues for this code I think, because > `numa_max_node()` says it `returns the >> highest node number <<< > available on the current system.` Yeah, I agree, thanks for clearing my doubts on it. > > Also I don't think I'm a Co-author, I think I'm just a reviewer (even if I > > did a little in 0001 though) > > That was an attempt to say "Thank You", Thanks a lot! OTOH, I don't want to get credit for something that I did not do ;-) I'll have a look at v11 soon. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: