Re: Draft for basic NUMA observability - Mailing list pgsql-hackers
From | Bertrand Drouvot |
---|---|
Subject | Re: Draft for basic NUMA observability |
Date | |
Msg-id | Z867b09EIZncOsmV@ip-10-97-1-34.eu-west-3.compute.internal Whole thread Raw |
In response to | Re: Draft for basic NUMA observability (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
Responses |
Re: Draft for basic NUMA observability
|
List | pgsql-hackers |
Hi, On Fri, Mar 07, 2025 at 12:33:27PM +0100, Jakub Wartak wrote: > On Fri, Mar 7, 2025 at 11:20 AM Jakub Wartak > <jakub.wartak@enterprisedb.com> wrote: > > > > Hi, > > On Wed, Mar 5, 2025 at 10:30 AM Jakub Wartak > > <jakub.wartak@enterprisedb.com> wrote: > > >Hi, > > > > > > Yeah, that's why I was mentioning to use a "shared" populate_buffercache_entry() > > > > or such function: to put the "duplicated" code in it and then use this > > > > shared function in pg_buffercache_pages() and in the new numa related one. > > > > > > OK, so hastily attempted that in 7b , I had to do a larger refactor > > > there to avoid code duplication between those two. I don't know which > > > attempt is better though (7 vs 7b).. > > > > > > > I'm attaching basically the earlier stuff (v7b) as v8 with the > > following minor changes: > > - docs are included > > - changed int8 to int4 in one function definition for numa_zone_id > > .. and v9 attached because cfbot partially complained about > .cirrus.tasks.yml being adjusted recently (it seems master is hot > these days). > Thanks for the new version! Some random comments on 0001: === 1 It does not compiles "alone". It's missing: +++ b/src/include/storage/pg_shmem.h @@ -45,6 +45,7 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */ extern PGDLLIMPORT int shared_memory_type; extern PGDLLIMPORT int huge_pages; extern PGDLLIMPORT int huge_page_size; +extern PGDLLIMPORT int huge_pages_status; and --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -563,7 +563,7 @@ static int ssl_renegotiation_limit; */ int huge_pages = HUGE_PAGES_TRY; int huge_page_size; -static int huge_pages_status = HUGE_PAGES_UNKNOWN; +int huge_pages_status = HUGE_PAGES_UNKNOWN; That come with 0002. So it has to be in 0001 instead. === 2 +else + as_fn_error $? "header file <numa.h> is required for --with-libnuma" "$LINENO" 5 +fi Maybe we should add the same test (checking for numa.h) for meson? === 3 +# FIXME: filter-out / with/without with_libnuma? +LIBS += $(LIBNUMA_LIBS) It looks to me that we can remove those 2 lines. === 4 + Only supported on Linux. s/on Linux/on platforms for which the libnuma library is implemented/? I did a quick grep on "Only supported on" and it looks like that could be a more consistent wording. === 5 +#include "c.h" +#include "postgres.h" +#include "port/pg_numa.h" +#include "storage/pg_shmem.h" +#include <unistd.h> I had a closer look to other header files and it looks like it "should" be: #include "c.h" #include "postgres.h" #include <unistd.h> #ifdef WIN32 #include <windows.h> #endif #include "port/pg_numa.h" #include "storage/pg_shmem.h" And is "#include "c.h"" really needed? === 6 +/* FIXME not tested, might crash */ That's a bit scary. === 7 + * XXX: for now we issue just WARNING, but long-term that might depend on + * numa_set_strict() here s/here/here./ Did not look carefully at all the comments in 0001, 0002 and 0003 though. A few random comments regarding 0002: === 8 # create extension pg_buffercache; ERROR: could not load library "/home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so": /home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so:undefined symbol: pg_numa_query_pages CONTEXT: SQL statement "CREATE FUNCTION pg_buffercache_pages() RETURNS SETOF RECORD AS '$libdir/pg_buffercache', 'pg_buffercache_pages' LANGUAGE C PARALLEL SAFE" extension script file "pg_buffercache--1.2.sql", near line 7 While that's ok if 0003 is applied. I think that each individual patch should "fully" work individually. === 9 + do + { + if (os_page_ptrs[blk2page + j] == 0) blk2page + j will be repeated multiple times, better to store it in a local variable instead? === 10 + if (firstUseInBackend == true) if (firstUseInBackend) instead? === 11 + int j = 0, + blk2page = (int) i * pages_per_blk; I wonder if size_t is more appropriate for blk2page: size_t blk2page = (size_t)(i * pages_per_blk) maybe? === 12 as we know that we'll iterate until pages_per_blk then would a for loop be more appropriate here, something like? " for (size_t j = 0; j < pages_per_blk; j++) { if (os_page_ptrs[blk2page + j] == 0) { " === 13 + if (buf_state & BM_DIRTY) + fctx->record[i].isdirty = true; + else + fctx->record[i].isdirty = false; fctx->record[i].isdirty = (buf_state & BM_DIRTY) != 0 maybe? === 14 + if ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID)) + fctx->record[i].isvalid = true; + else + fctx->record[i].isvalid = false; fctx->record[i].isvalid = ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID)) maybe? === 15 +populate_buffercache_entry(int i, BufferCachePagesContext *fctx) I wonder if "i" should get a more descriptive name? === 16 s/populate_buffercache_entry/pg_buffercache_build_tuple/? (to be consistent with pg_stat_io_build_tuples for example). I now realize that I did propose populate_buffercache_entry() up-thread, sorry for changing my mind. === 17 + static bool firstUseInBackend = true; maybe we should give it a more descriptive name? Also I need to think more about how firstUseInBackend is used, for example: ==== 17.1 would it be better to define it as a file scope variable? (at least that would avoid to pass it as an extra parameter in some functions). === 17.2 what happens if firstUseInBackend is set to false and later on the pages are moved to different NUMA nodes. Then pg_buffercache_numa_pages() is called again by a backend that already had set firstUseInBackend to false, would that provide accurate information? === 18 +Datum +pg_buffercache_numa_pages(PG_FUNCTION_ARGS) +{ + FuncCallContext *funcctx; + BufferCachePagesContext *fctx; /* User function context. */ + bool query_numa = true; I don't think we need query_numa anymore in pg_buffercache_numa_pages(). I think that we can just ERROR (or NOTICE and return) here: + if (pg_numa_init() == -1) + { + elog(NOTICE, "libnuma initialization failed or NUMA is not supported on this platform, some NUMAdata might be unavailable."); + query_numa = false; and fully get rid of query_numa. === 19 And then here: for (i = 0; i < NBuffers; i++) { populate_buffercache_entry(i, fctx); if (query_numa) pg_buffercache_numa_prepare_ptrs(i, pages_per_blk, os_page_size, os_page_ptrs, firstUseInBackend); } if (query_numa) { if (pg_numa_query_pages(0, os_page_count, os_page_ptrs, os_pages_status) == -1) elog(ERROR, "failed NUMA pages inquiry: %m"); for (i = 0; i < NBuffers; i++) { int blk2page = (int) i * pages_per_blk; /* * Technically we can get errors too here and pass that to * user. Also we could somehow report single DB block spanning * more than one NUMA zone, but it should be rare. */ fctx->record[i].numa_zone_id = os_pages_status[blk2page]; } maybe we can just loop a single time over "for (i = 0; i < NBuffers; i++)"? A few random comments regarding 0003: === 20 + <indexterm zone="view-pg-shmem-numa-allocations"> + <primary>pg_shmem_numa_allocations</primary> + </indexterm> + + <para> + The <structname>pg_shmem_allocations</structname> view shows NUMA nodes s/pg_shmem_allocations/pg_shmem_numa_allocations/? === 21 + /* FIXME: should we release LWlock here ? */ + elog(ERROR, "failed NUMA pages inquiry status: %m"); There is no need, see src/backend/storage/lmgr/README. === 22 +#define MAX_NUMA_ZONES 32 /* FIXME? */ + Size zones[MAX_NUMA_ZONES]; could we rely on pg_numa_get_max_node() instead? === 23 + if (s >= 0) + zones[s]++; should we also check that s is below a limit? === 24 Regarding how we make use of pg_numa_get_max_node(), are we sure there is no possible holes? I mean could a system have node 0,1 and 3 but not 2? Also I don't think I'm a Co-author, I think I'm just a reviewer (even if I did a little in 0001 though) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: