Re: Draft for basic NUMA observability - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Draft for basic NUMA observability
Date:
Msg-id: cpzkjrlcgage2api6hushndya6i2yq7omjhga7tfp4ba3goyyb@53ot6clau7ij
In response to: Re: Draft for basic NUMA observability (Tomas Vondra <tomas@vondra.me>)
Responses: Re: Draft for basic NUMA observability
           Re: Draft for basic NUMA observability
List: pgsql-hackers
Hi,

On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote:
> > Forcing all those pages to be allocated via pg_numa_touch_mem_if_required()
> > itself wouldn't be too bad - in fact I'd rather like to have an explicit way
> > of doing that. The problem is that that leads to all those allocations to
> > happen on the *current* numa node (unless you have started postgres with
> > numactl --interleave=all or such), rather than the node where the normal first
> > use would have allocated it.
> >
>
> I agree, forcing those allocations to happen on a single node seems
> rather unfortunate. But really, how likely is it that someone will run
> this function on a cluster that hasn't already allocated this memory?

I think it's not at all unlikely to have parts of shared buffers unused at
the start of a benchmark, e.g. because the table sizes grow over time.

> I'm not saying it can't happen, but we already have this issue if you
> start and do a warmup from a single connection ...

Indeed! We really need to fix this...

> >> It's just that we don't have the memory mapped in the current backend, so
> >> I'd bet people would not be happy with NULL, and would proceed to force the
> >> allocation in some other way (say, a large query of some sort). Which
> >> obviously causes a lot of other problems.
> >
> > I don't think that really would be the case with what I proposed? If any
> > buffer in the region were valid, we would force the allocation to become known
> > to the current backend.
> >
>
> It's not quite clear to me what exactly you are proposing :-(
>
> I believe you're referring to this:
>
> > The only allocation where that really matters is shared_buffers. I wonder if
> > we could special case the logic for that, by only probing if at least one of
> > the buffers in the range is valid.
> >
> > Then we could treat a page status of -ENOENT as "page is not mapped" and
> > display NULL for the node_id?
> >
> > Of course that would mean that we'd always need to
> > pg_numa_touch_mem_if_required(), not just the first time round, because we
> > previously might not have for a page that is now valid. But compared to the
> > cost of actually allocating pages, the cost for that seems small.
>
> I suppose by "range" you mean buffers on a given memory page

Correct.

> and "valid" means BufferIsValid.

I was thinking of checking whether the BufferDesc indicates BM_VALID or
BM_TAG_VALID. BufferIsValid() just does a range check :(.

> Yeah, that probably means the memory page is allocated. But if the buffer is
> invalid, it does not mean the memory is not allocated, right? So does it
> make the buffer not interesting?

Well, if you don't have contents in it, it can't really affect performance.
But yeah, I agree, it's not perfect either.

> I think we need to decide whether the current patches are good enough
> for PG18, with the current behavior, and then maybe improve that in
> PG19.

I think as long as the docs mention this with <note> or <warning> it's ok
for now.

Greetings,

Andres Freund
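A rough sketch of the check discussed above, for illustration only and not
taken from the patch: it walks the BufferDescs that overlap one OS memory
page in BufferBlocks and reports whether any of them has BM_VALID or
BM_TAG_VALID set. The function name and the os_page_size parameter are
invented for this example.

    #include "postgres.h"

    #include "storage/buf_internals.h"
    #include "storage/bufmgr.h"

    /*
     * Hypothetical helper: is any shared buffer overlapping the OS memory
     * page starting at page_start (within BufferBlocks) valid or tag-valid?
     */
    static bool
    buffer_page_worth_probing(char *page_start, Size os_page_size)
    {
        /* first and last shared buffer overlapping this OS page */
        int     first_buf = (int) ((page_start - BufferBlocks) / BLCKSZ);
        int     last_buf = (int) ((page_start + os_page_size - 1 - BufferBlocks) / BLCKSZ);

        last_buf = Min(last_buf, NBuffers - 1);

        for (int i = first_buf; i <= last_buf; i++)
        {
            BufferDesc *hdr = GetBufferDescriptor(i);
            uint32      state = pg_atomic_read_u32(&hdr->state);

            /*
             * BufferIsValid() only range-checks the buffer number; what
             * matters here is whether the buffer actually holds (or is
             * about to hold) data, i.e. BM_VALID / BM_TAG_VALID.
             */
            if (state & (BM_VALID | BM_TAG_VALID))
                return true;
        }

        return false;
    }

A caller would only pg_numa_touch_mem_if_required() and query pages for
which this returns true; pages for which it returns false, or whose status
comes back from the kernel as -ENOENT, would be shown with a NULL node_id.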