Hi,
I've spent a bit of time reviewing this. In general I haven't found
anything I'd call a bug, but here's a couple comments for v18 ... Most
of this is in separate "review" commits, with a couple exceptions.
1) Please update the commit messages, with proper formatting, etc. I
tried to do that in the attached v19, but please go through that, add
relevant details, update list of reviewers, etc. The subject should not
be overly long, etc.
2) I don't think we need "libnuma for NUMA awareness" in configure, I'd
use just "libnuma support" similar to other libraries.
3) I don't think we need pg_numa.h to have this:
extern PGDLLIMPORT Datum pg_numa_available(PG_FUNCTION_ARGS);
AFAICS we don't have any SQL functions exposed as PGDLLIMPORT, so why
would it be necessary here? It's enough to have a prototype in .c file.
4) Improved .sgml to have acronym/productname in a couple places.
5) I don't think the comment for pg_buffercache_init_entries() is very
useful. That it's helper for pg_buffercache_pages() tells me nothing
about how to use it, what the arguments are, etc.
6) IMHO pg_buffercache_numa_prepare_ptrs() would deserve a better
comment too. I mean, there's no info about what the arguments are, which
arguments are input or output, etc. And it only discussed one option
(block page < memory page), but what happens in the other case? The
formulas with blk2page/blk2pageoff are not quite clear to me (I'm not
saying it's wrong).
However, it seems rather suspicious that pages_per_blk is calculated as
float, and pg_buffercache_numa_prepare_ptrs() then does this:
for (size_t j = 0; j < pages_per_blk; j++)
{ ... }
I mean, isn't this vulnerable to rounding errors, which might trigger
some weird behavior? If not, it'd be good to add a comment why this is
fine, it confuses me a lot. I personally would probably prefer doing
just integer arithmetic here.
7) This para in the docs seems self-contradictory:
<para>
The <function>pg_buffercache_numa_pages()</function> provides the same
information
as <function>pg_buffercache_pages()</function> but is slower because
it also
provides the <acronym>NUMA</acronym> node ID per shared buffer entry.
The <structname>pg_buffercache_numa</structname> view wraps the
function for
convenient use.
</para>
I mean, "provides the same information, but is slower because it
provides different information" is strange. I understand the point, but
maybe rephrase somehow?
8) Why is pg_numa_available() marked as volatile? Should not change in a
running cluster, no?
9) I noticed the new SGML docs have IDs with mixed "-" and "_". Maybe
let's not do that.
<sect2 id="pgbuffercache-pg-buffercache_numa">
10) I think it'd be good to mention/explain why move_pages is used
instead of get_mempolicy - more efficient with batching, etc. This would
be useful both in the commit message and before the move_pages call (and
in general to explain why pg_buffercache_numa_prepare_ptrs prepares the
pointers like this etc.).
11) This could use UINT64_FORMAT, instead of a cast:
elog(DEBUG1, "NUMA: os_page_count=%lu os_page_size=%zu
pages_per_blk=%.2f",
(unsigned long) os_page_count, os_page_size, pages_per_blk);
regards
--
Tomas Vondra