Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach |
Date | |
Msg-id | 20537e21-3a32-4941-91eb-20bdfdb96e26@vondra.me |
In response to | Re: Adding basic NUMA awareness - Preliminary feedback and outline for an extensible approach (Cédric Villemain <cedric.villemain@data-bene.io>) |
List | pgsql-hackers |
On 7/9/25 08:40, Cédric Villemain wrote:
>> On 7/8/25 18:06, Cédric Villemain wrote:
>>>
>>>> On 7/8/25 03:55, Cédric Villemain wrote:
>>>>> Hi Andres,
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2025-07-05 07:09:00 +0000, Cédric Villemain wrote:
>>>>>>> In my work on more careful PostgreSQL resource management, I've
>>>>>>> come to the conclusion that we should avoid pushing policy too
>>>>>>> deeply into the PostgreSQL core itself. Therefore, I'm quite
>>>>>>> skeptical about integrating NUMA-specific management directly
>>>>>>> into core PostgreSQL in such a way.
>>>>>>
>>>>>> I think it's actually the opposite - whenever we pushed stuff like
>>>>>> this outside of core it has hurt postgres substantially. Not having
>>>>>> replication in core was a huge mistake. Not having HA management in
>>>>>> core is probably the biggest current adoption hurdle for postgres.
>>>>>>
>>>>>> To deal better with NUMA we need to improve memory placement and
>>>>>> various algorithms, in an interrelated way - that's pretty much
>>>>>> impossible to do outside of core.
>>>>>
>>>>> Except the backend pinning which is easy to achieve, thus my
>>>>> comment on the related patch.
>>>>> I'm not claiming NUMA memory and all should be managed outside of
>>>>> core (though I didn't read other patches yet).
>>>>>
>>>>
>>>> But an "optimal backend placement" seems to very much depend on
>>>> where we placed the various pieces of shared memory. Which the
>>>> external module will have trouble following, I suspect.
>>>>
>>>> I still don't have any idea what exactly would the external module do,
>>>> how would it decide where to place the backend. Can you describe some
>>>> use case with an example?
>>>>
>>>> Assuming we want to actually pin tasks from within Postgres, what I
>>>> think might work is allowing modules to "advise" on where to place the
>>>> task. But the decision would still be done by core.
>>>
>>> Possibly exactly what you're doing in proc.c when managing allocation of
>>> process, but not hardcoded in postgresql (patches 02, 05 and 06 are good
>>> candidates), I didn't get that they require information not available to
>>> any process executing code from a module.
>>>
>>
>> Well, it needs to understand how some other stuff (especially PGPROC
>> entries) is distributed between nodes. I'm not sure how much of this
>> internal information we want to expose outside core ...
>>
>>> Parts of your code where you assign/define policy could be in one or
>>> more relevant routines of a "numa profile manager", like in an
>>> initProcessRoutine(), and registered in pmroutine struct:
>>>
>>>     pmroutine = GetPmRoutineForInitProcess();
>>>     if (pmroutine != NULL &&
>>>         pmroutine->init_process != NULL)
>>>         pmroutine->init_process(MyProc);
>>>
>>> This way it's easier to manage alternative policies, and also to be able
>>> to adjust when hardware and linux kernel changes.
>>>
>>
>> I'm not against making this extensible, in some way. But I still
>> struggle to imagine a reasonable alternative policy, where the external
>> module gets the same information and ends up with a different decision.
>>
>> So what would the alternate policy look like? What use case would the
>> module be supporting?
>
>
> That's the whole point: there are very distinct usages of PostgreSQL in
> the field. And maybe not all of them will require the policy defined by
> PostgreSQL core.
>
> May I ask the reverse: what prevent external modules from taking those
> decisions ? There are already a lot of area where external code can take
> over PostgreSQL processing, like Neon is doing.
>

The complexity of making everything extensible in an arbitrary way. To
make it extensible in a useful way, we need to have a reasonably clear
idea of which aspects need to be extensible, and what the goal is.
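
To make the "modules advise, core decides" idea from earlier in the thread a bit more concrete, here is a minimal standalone sketch. It is not code from any of the patches: PlacementContext, placement_advice_hook, choose_backend_node() and the two example advisors are hypothetical names invented for illustration, and the fallback policy (stay on the node that holds the PGPROC entry) is only an assumed stand-in for whatever core would actually do.

```c
#include <stddef.h>

/* Hypothetical: the information core exposes when placing a new backend. */
typedef struct PlacementContext
{
    int         num_nodes;      /* NUMA nodes core is willing to use */
    int         pgproc_node;    /* node holding this backend's PGPROC entry */
} PlacementContext;

/* Hypothetical advisory callback: returns a preferred node, or -1 for none. */
typedef int (*placement_advice_hook_type) (const PlacementContext *ctx);

/* Stays NULL unless a module installs an advisor. */
static placement_advice_hook_type placement_advice_hook = NULL;

/* Assumed core policy: keep the backend next to its PGPROC entry. */
static int
core_default_node(const PlacementContext *ctx)
{
    return ctx->pgproc_node;
}

/*
 * Core makes the final decision. The advice is honored only if it names
 * a node core considers valid; otherwise core falls back to its default.
 */
static int
choose_backend_node(const PlacementContext *ctx)
{
    if (placement_advice_hook != NULL)
    {
        int         advised = placement_advice_hook(ctx);

        if (advised >= 0 && advised < ctx->num_nodes)
            return advised;
    }

    return core_default_node(ctx);
}

/* Example advisor a module might install: always prefer the last node. */
static int
prefer_last_node(const PlacementContext *ctx)
{
    return ctx->num_nodes - 1;
}

/* Example of bad advice: names a node that does not exist. */
static int
advise_invalid_node(const PlacementContext *ctx)
{
    return ctx->num_nodes;
}
```

The point of the shape is that the module sees only what core chooses to put into the context struct, and bad advice cannot override core. It leaves open which fields such a struct would need to carry.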
> There are some very early processing for memory setup that I can see as
> a current blocker, and here I'd refer a more compliant NUMA api as
> proposed by Jakub so it's possible to arrange based on workload,
> hardware configuration or other matters. Reworking to get distinct
> segment and all as you do is great, and combo of both approach probably
> of great interest. There is also this weighted interleave discussed and
> probably much more to come in this area in Linux.
>
> I think some points raised already about possible distinct policies, I
> am precisely claiming that it is hard to come with one good policy with
> limited setup options, thus requirement to keep that flexible enough
> (hooks, api, 100 GUc ?).
>

I'm sorry, I don't want to sound too negative, but "I want arbitrary
extensibility" is not very useful feedback. I've asked you to give
some examples of policies that'd customize some of the NUMA stuff.

> There is an EPYC story here also, given the NUMA setup can vary
> depending on BIOS setup, associated NUMA policy must probably take that
> into account (L3 can be either real cache or 4 extra "local" NUMA nodes
> - with highly distinct access cost from a RAM module).
> Does that change how PostgreSQL will place memory and process? Is it
> important or of interest ?
>

So how exactly would the policy handle this? Right now we're entirely
oblivious to L3, or on-CPU caches in general. We don't even consider the
size of L3 when sizing hash tables in a hashjoin etc.

regards

-- 
Tomas Vondra