Re: Draft for basic NUMA observability - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Draft for basic NUMA observability
Date
Msg-id c7ce0a21-18dd-4682-bcd1-f9424bb7e2b9@vondra.me
Whole thread Raw
In response to Re: Draft for basic NUMA observability  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: Draft for basic NUMA observability
List pgsql-hackers
Hi,

I've spent a bit of time reviewing this. In general I haven't found
anything I'd call a bug, but here's a couple comments for v18 ... Most
of this is in separate "review" commits, with a couple exceptions.

1) Please update the commit messages, with proper formatting, etc. I
tried to do that in the attached v19, but please go through that, add
relevant details, update list of reviewers, etc. The subject should not
be overly long, etc.

2) I don't think we need "libnuma for NUMA awareness" in configure, I'd
use just "libnuma support" similar to other libraries.

3) I don't think we need pg_numa.h to have this:

extern PGDLLIMPORT Datum pg_numa_available(PG_FUNCTION_ARGS);

AFAICS we don't have any SQL functions exposed as PGDLLIMPORT, so why
would it be necessary here? It's enough to have a prototype in .c file.

4) Improved .sgml to have acronym/productname in a couple places.

5) I don't think the comment for pg_buffercache_init_entries() is very
useful. That it's helper for pg_buffercache_pages() tells me nothing
about how to use it, what the arguments are, etc.

6) IMHO pg_buffercache_numa_prepare_ptrs() would deserve a better
comment too. I mean, there's no info about what the arguments are, which
arguments are input or output, etc. And it only discussed one option
(block page < memory page), but what happens in the other case? The
formulas with blk2page/blk2pageoff are not quite clear to me (I'm not
saying it's wrong).

However, it seems rather suspicious that pages_per_blk is calculated as
float, and pg_buffercache_numa_prepare_ptrs() then does this:

    for (size_t j = 0; j < pages_per_blk; j++)
    { ... }

I mean, isn't this vulnerable to rounding errors, which might trigger
some weird behavior? If not, it'd be good to add a comment why this is
fine, it confuses me a lot. I personally would probably prefer doing
just integer arithmetic here.


7) This para in the docs seems self-contradictory:

 <para>
  The <function>pg_buffercache_numa_pages()</function> provides the same
information
  as <function>pg_buffercache_pages()</function> but is slower because
it also
  provides the <acronym>NUMA</acronym> node ID per shared buffer entry.
  The <structname>pg_buffercache_numa</structname> view wraps the
function for
  convenient use.
 </para>

I mean, "provides the same information, but is slower because it
provides different information" is strange. I understand the point, but
maybe rephrase somehow?

8) Why is pg_numa_available() marked as volatile? Should not change in a
running cluster, no?

9) I noticed the new SGML docs have IDs with mixed "-" and "_". Maybe
let's not do that.

   <sect2 id="pgbuffercache-pg-buffercache_numa">

10) I think it'd be good to mention/explain why move_pages is used
instead of get_mempolicy - more efficient with batching, etc. This would
be useful both in the commit message and before the move_pages call (and
in general to explain why pg_buffercache_numa_prepare_ptrs prepares the
pointers like this etc.).

11) This could use UINT64_FORMAT, instead of a cast:

    elog(DEBUG1, "NUMA: os_page_count=%lu os_page_size=%zu
pages_per_blk=%.2f",
        (unsigned long) os_page_count, os_page_size, pages_per_blk);


regards

-- 
Tomas Vondra

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: TEMP_CONFIG vs test_aio
Next
From: Daniel Gustafsson
Date:
Subject: Re: Adding support for SSLKEYLOGFILE in the frontend