Re: Draft for basic NUMA observability - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Draft for basic NUMA observability
Date
Msg-id cpzkjrlcgage2api6hushndya6i2yq7omjhga7tfp4ba3goyyb@53ot6clau7ij
In response to Re: Draft for basic NUMA observability  (Tomas Vondra <tomas@vondra.me>)
Responses Re: Draft for basic NUMA observability
Re: Draft for basic NUMA observability
List pgsql-hackers
Hi,

On 2025-04-07 18:36:24 +0200, Tomas Vondra wrote:
> > Forcing all those pages to be allocated via pg_numa_touch_mem_if_required()
> > itself wouldn't be too bad - in fact I'd rather like to have an explicit way
> > of doing that.  The problem is that this leads to all those allocations
> > happening on the *current* numa node (unless you have started postgres with
> > numactl --interleave=all or such), rather than the node where the normal first
> > use would have allocated it.
> > 
> 
> I agree, forcing those allocations to happen on a single node seems
> rather unfortunate. But really, how likely is it that someone will run
> this function on a cluster that hasn't already allocated this memory?

I think it's not at all unlikely to have parts of shared buffers unused at the
start of a benchmark, e.g. because the table sizes grow over time.


> I'm not saying it can't happen, but we already have this issue if you
> start and do a warmup from a single connection ...

Indeed!  We really need to fix this...


> > 
> >> It's just that we don't have the memory mapped in the current backend, so
> >> I'd bet people would not be happy with NULL, and would proceed to force the
> >> allocation in some other way (say, a large query of some sort). Which
> >> obviously causes a lot of other problems.
> > 
> > I don't think that really would be the case with what I proposed? If any
> > buffer in the region were valid, we would force the allocation to become known
> > to the current backend.
> > 
> 
> It's not quite clear to me what exactly you are proposing :-(
> 
> I believe you're referring to this:
> 
> > The only allocation where that really matters is shared_buffers. I wonder if
> > we could special case the logic for that, by only probing if at least one of
> > the buffers in the range is valid.
> > 
> > Then we could treat a page status of -ENOENT as "page is not mapped" and
> > display NULL for the node_id?
> > 
> > Of course that would mean that we'd always need to
> > pg_numa_touch_mem_if_required(), not just the first time round, because we
> > previously might not have for a page that is now valid.  But compared to the
> > cost of actually allocating pages, the cost for that seems small.
> 
> I suppose by "range" you mean buffers on a given memory page

Correct.
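
To make "buffers on a given memory page" concrete, here is a minimal sketch of the index arithmetic. It assumes the buffer array starts exactly at the beginning of the first OS page; buffers_on_page() and BufRange are hypothetical names for illustration, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: inclusive range of buffer indexes. */
typedef struct BufRange
{
    size_t first;
    size_t last;
} BufRange;

/*
 * Hypothetical helper, not from the patch: return the buffers that overlap
 * OS memory page `page_idx`, assuming buffer 0 starts at offset 0 of page 0.
 */
static BufRange
buffers_on_page(size_t page_idx, size_t os_page_size, size_t block_size)
{
    BufRange r;
    size_t   start;
    size_t   end;

    assert(os_page_size > 0 && block_size > 0);

    start = page_idx * os_page_size;    /* first byte of the page */
    end = start + os_page_size - 1;     /* last byte of the page */

    r.first = start / block_size;
    r.last = end / block_size;
    return r;
}
```

With 2MB huge pages and 8kB blocks each page covers 256 buffers; with 4kB pages a single buffer spans two pages, so two neighbouring pages map to the same buffer.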

> and "valid" means BufferIsValid.

I was thinking of checking if the BufferDesc indicates BM_VALID or
BM_TAG_VALID.

BufferIsValid() just does a range check :(.
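
Roughly what I have in mind, as a standalone sketch rather than actual patch code. MockBufferDesc and the MOCK_BM_* bits are stand-ins (the real flags live in buf_internals.h and have different values), and any_buffer_in_use() and status_to_node() are hypothetical names. The status value is assumed to be a per-page result as reported by move_pages(2) in query mode: a node number, or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits; NOT the real BM_VALID / BM_TAG_VALID values. */
#define MOCK_BM_VALID       (1U << 0)
#define MOCK_BM_TAG_VALID   (1U << 1)

/* Stand-in for the buffer descriptor; only the state word matters here. */
typedef struct MockBufferDesc
{
    uint32_t state;
} MockBufferDesc;

/*
 * Probe a memory page only if at least one buffer in [first, last] has
 * contents or a valid tag, so that untouched pages are not forced onto
 * the current node.
 */
static bool
any_buffer_in_use(const MockBufferDesc *descs, int first, int last)
{
    assert(first <= last);

    for (int i = first; i <= last; i++)
    {
        if (descs[i].state & (MOCK_BM_VALID | MOCK_BM_TAG_VALID))
            return true;
    }
    return false;
}

/*
 * Translate a per-page status into a node id. Returns false when the page
 * is not mapped (-ENOENT), in which case the caller would display NULL.
 * Other negative statuses are errors this sketch does not handle.
 */
static bool
status_to_node(int status, int *node_id)
{
    if (status == -ENOENT)
        return false;

    assert(status >= 0);
    *node_id = status;
    return true;
}
```

The caller would skip the touch-and-probe step for a page when any_buffer_in_use() is false, and show NULL for node_id whenever status_to_node() returns false.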


> Yeah, that probably means the memory page is allocated. But if the buffer is
> invalid, it does not mean the memory is not allocated, right? So does it
> make the buffer not interesting?

Well, if you don't have contents in it, it can't really affect performance.  But
yeah, I agree, it's not perfect either.


> I think we need to decide whether the current patches are good enough
> for PG18, with the current behavior, and then maybe improve that in
> PG19.

I think as long as the docs mention this with <note> or <warning> it's ok for
now.

Greetings,

Andres Freund


