Re: Improve monitoring of shared memory allocations - Mailing list pgsql-hackers

From: Rahila Syed
Subject: Re: Improve monitoring of shared memory allocations
Msg-id: CAH2L28uL-8EQTRSeyTpW1DqAXsDXRkXCkT1dus2u6p4HYDrxAg@mail.gmail.com
In response to: Re: Improve monitoring of shared memory allocations (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,
Thank you for the review.
cfbot found a few compiler warnings:
https://cirrus-ci.com/task/6526903542087680
[16:47:46.964] make -s -j${BUILD_JOBS} clean
[16:47:47.452] time make -s -j${BUILD_JOBS} world-bin
[16:49:10.496] lwlock.c: In function ‘CreateLWLocks’:
[16:49:10.496] lwlock.c:467:22: error: unused variable ‘found’ [-Werror=unused-variable]
[16:49:10.496] 467 | bool found;
[16:49:10.496] | ^~~~~
[16:49:10.496] cc1: all warnings being treated as errors
[16:49:10.496] make[4]: *** [<builtin>: lwlock.o] Error 1
[16:49:10.496] make[3]: *** [../../../src/backend/common.mk:37: lmgr-recursive] Error 2
[16:49:10.496] make[3]: *** Waiting for unfinished jobs....
[16:49:11.881] make[2]: *** [common.mk:37: storage-recursive] Error 2
[16:49:11.881] make[2]: *** Waiting for unfinished jobs....
[16:49:20.195] dynahash.c: In function ‘hash_create’:
[16:49:20.195] dynahash.c:643:37: error: ‘curr_offset’ may be used uninitialized [-Werror=maybe-uninitialized]
[16:49:20.195] 643 | curr_offset = (((char *)curr_offset) + (temp * elementSize));
[16:49:20.195] | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[16:49:20.195] dynahash.c:588:23: note: ‘curr_offset’ was declared here
[16:49:20.195] 588 | void *curr_offset;
[16:49:20.195] | ^~~~~~~~~~~
[16:49:20.195] cc1: all warnings being treated as errors
[16:49:20.196] make[4]: *** [<builtin>: dynahash.o] Error 1
Fixed these.
> diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
> index cd5a00132f..5203f5b30b 100644
> --- a/src/backend/utils/hash/dynahash.c
> +++ b/src/backend/utils/hash/dynahash.c
> @@ -120,7 +120,6 @@
> * a good idea of the maximum number of entries!). For non-shared hash
> * tables, the initial directory size can be left at the default.
> */
> -#define DEF_SEGSIZE 256
> #define DEF_SEGSIZE_SHIFT 8 /* must be log2(DEF_SEGSIZE) */
> #define DEF_DIRSIZE 256
Why did you move this to the header? Afaict it's just used in
hash_get_shared_size(), which is also in dynahash.c?
Yes. This was accidentally left behind by the previous version of the
patch, so I undid the change.
> static void register_seq_scan(HTAB *hashp);
> static void deregister_seq_scan(HTAB *hashp);
> static bool has_seq_scans(HTAB *hashp);
> -
> +static int find_num_of_segs(long nelem, int *nbuckets, long num_partitions, long ssize);
>
> /*
> * memory allocation support
You removed a newline here that probably shouldn't be removed.
Fixed this.
> @@ -468,7 +466,11 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> else
> hashp->keycopy = memcpy;
>
> - /* And select the entry allocation function, too. */
> + /*
> + * And select the entry allocation function, too. XXX should this also
> + * Assert that flags & HASH_SHARED_MEM is true, since HASH_ALLOC is
> + * currently only set with HASH_SHARED_MEM *
> + */
> if (flags & HASH_ALLOC)
> hashp->alloc = info->alloc;
> else
> @@ -518,6 +520,7 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
>
> hashp->frozen = false;
>
> + /* Initializing the HASHHDR variables with default values */
> hdefault(hashp);
>
> hctl = hashp->hctl;
I assume these were just observations you made while looking into this? They
seem unrelated to the change itself?
Yes. I removed the first one and left the second one as a code comment.
> @@ -582,7 +585,8 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> freelist_partitions,
> nelem_alloc,
> nelem_alloc_first;
> -
> + void *curr_offset;
> +
> /*
> * If hash table is partitioned, give each freelist an equal share of
> * the initial allocation. Otherwise only freeList[0] is used.
> @@ -592,6 +596,20 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> else
> freelist_partitions = 1;
>
> + /*
> + * If table is shared, calculate the offset at which to find the
> + * the first partition of elements
> + */
> + if (hashp->isshared)
> + {
> + int nsegs;
> + int nbuckets;
> + nsegs = find_num_of_segs(nelem, &nbuckets, hctl->num_partitions, hctl->ssize);
> +
> + curr_offset = (((char *) hashp->hctl) + sizeof(HASHHDR) + (info->dsize * sizeof(HASHSEGMENT)) +
> + + (sizeof(HASHBUCKET) * hctl->ssize * nsegs));
> + }
> +
Why only do this for shared hashtables? Couldn't we allocate the elments
together with the rest for non-share hashtables too?
I think it is possible to consolidate the allocations for non-shared hash
tables too. However, the initial element allocation is much smaller for
non-shared hash tables, since they can be expanded easily, so there is
probably less benefit in doing it there.
In addition, the proposed changes are targeted at improving the reporting in
pg_shmem_allocations, which does not apply to non-shared hash tables.
So while I believe it is feasible, I am uncertain about the utility of such
a change.
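To make the pointer math in the quoted hunk easier to follow, here is a
standalone sketch of the offset computation for the element region of a
shared hash table. The `_STUB` types and the function name are illustrative
stand-ins, not the real dynahash structs; the arithmetic mirrors the hunk:
header, then segment directory, then all bucket arrays, then elements.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char pad[64]; } HASHHDR_STUB;  /* stand-in for HASHHDR */
typedef void *HASHSEGMENT_STUB;                 /* stand-in for HASHSEGMENT */
typedef void *HASHBUCKET_STUB;                  /* stand-in for HASHBUCKET */

/*
 * Offset (from the start of the shared chunk) at which the preallocated
 * element region begins: past the header, the directory of dsize segment
 * pointers, and nsegs bucket arrays of ssize buckets each.
 */
static size_t
element_region_offset(size_t dsize, size_t ssize, size_t nsegs)
{
	return sizeof(HASHHDR_STUB)
		+ dsize * sizeof(HASHSEGMENT_STUB)
		+ ssize * nsegs * sizeof(HASHBUCKET_STUB);
}
```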
Seems a bit ugly to go through element_alloc() when pre-allocating. Perhaps
it's the best thing we can do to avoid duplicating code, but it seems worth
checking if we can do better. Perhaps we could split element_alloc() into
element_alloc() and element_add() or such? With the latter doing everything
after hashp->alloc().
Makes sense. I split element_alloc() into element_alloc() and element_add().
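For illustration, here is a simplified standalone sketch of how such a split
could look: one function that only obtains the raw memory, and another that
carves it into elements and pushes them onto a freelist. The `_stub` names
and the bodies are mine, not the patch's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in element: a link pointer followed by user data (illustrative). */
typedef struct elem { struct elem *link; } elem;

/* element_alloc(): only obtain the raw block for nelem elements. */
static void *
element_alloc_stub(size_t elem_size, int nelem)
{
	return calloc(nelem, elem_size);
}

/* element_add(): carve the block into elements and link them into a freelist. */
static void
element_add_stub(elem **freelist, void *block, size_t elem_size, int nelem)
{
	char *p = (char *) block;

	for (int i = 0; i < nelem; i++, p += elem_size)
	{
		elem *e = (elem *) p;

		e->link = *freelist;
		*freelist = e;
	}
}
```

With this shape, a pre-allocated shared region can be handed straight to
element_add() without going through the allocator again.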
> -
> +
> /*
> * initialize mutexes if it's a partitioned table
> */
Spurious change.
Fixed.
A function called find_num_of_segs() that also sets nbuckets seems a bit
confusing. I also don't like "find_*", as that sounds like it's searching
some datastructure, rather than just doing a bit of math.
I renamed it to compute_buckets_and_segs(). I am open to better suggestions.
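As a rough sketch of the arithmetic such a helper performs (the formulas
here are simplified and illustrative; the real dynahash code also folds in
the fill factor and partition count):

```c
#include <assert.h>
#include <limits.h>

/* Round up to the next power of 2, like dynahash's next_pow2_int(). */
static int
next_pow2(long num)
{
	int p = 1;

	while (p < num && p < INT_MAX / 2)
		p <<= 1;
	return p;
}

/*
 * Derive nbuckets from the requested element count and report how many
 * segments of ssize buckets are needed to cover them (simplified).
 */
static int
compute_buckets_and_segs_stub(long nelem, long ssize, int *nbuckets)
{
	*nbuckets = next_pow2(nelem);
	return (int) ((*nbuckets - 1) / ssize + 1);	/* ceil(nbuckets / ssize) */
}
```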
> segp = (HASHSEGMENT) hashp->alloc(sizeof(HASHBUCKET) * hashp->ssize);
>
> if (!segp)
Spurious change.
Fixed.
> -static int
> +int
> next_pow2_int(long num)
> {
> if (num > INT_MAX / 2)
> @@ -1957,3 +1995,31 @@ AtEOSubXact_HashTables(bool isCommit, int nestDepth)
> }
> }
> }
Why export this?
It was a stale change; I have removed it now.
> diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
> index 932cc4f34d..5e16bd4183 100644
> --- a/src/include/utils/hsearch.h
> +++ b/src/include/utils/hsearch.h
> @@ -151,7 +151,7 @@ extern void hash_seq_term(HASH_SEQ_STATUS *status);
> extern void hash_freeze(HTAB *hashp);
> extern Size hash_estimate_size(long num_entries, Size entrysize);
> extern long hash_select_dirsize(long num_entries);
> -extern Size hash_get_shared_size(HASHCTL *info, int flags);
> +extern Size hash_get_shared_size(HASHCTL *info, int flags, long init_size);
> extern void AtEOXact_HashTables(bool isCommit);
> extern void AtEOSubXact_HashTables(bool isCommit, int nestDepth);
It's imo a bit weird that we have very related logic in hash_estimate_size()
and hash_get_shared_size(). Why do we need to duplicate it?
hash_estimate_size() estimates using default values and hash_get_shared_size()
calculates using specific values depending on the flags associated with the hash
table. For instance, segment_size used by the former is DEF_SEGSIZE and
the latter uses info->ssize which is set when the HASH_SEGMENT flag is true.
Hence, they might return different values for shared memory sizes.
These I'd just combine with the ShmemInitStruct("PredXactList"), by allocating
the additional space. The pointer math is a bit annoying, but it makes much
more sense to have one entry in pg_shmem_allocations.
Fixed accordingly.
> - (TransactionId *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->xids));
> + (TransactionId *) ShmemInitStruct("Proc Transaction Ids", TotalProcs * sizeof(*ProcGlobal->xids), &found);
> MemSet(ProcGlobal->xids, 0, TotalProcs * sizeof(*ProcGlobal->xids));
> - ProcGlobal->subxidStates = (XidCacheStatus *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->subxidStates));
> + ProcGlobal->subxidStates = (XidCacheStatus *) ShmemInitStruct("Proc Sub-transaction id states", TotalProcs * sizeof(*ProcGlobal->subxidStates), &found);
> MemSet(ProcGlobal->subxidStates, 0, TotalProcs * sizeof(*ProcGlobal->subxidStates));
> - ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
> + ProcGlobal->statusFlags = (uint8 *) ShmemInitStruct("Proc Status Flags", TotalProcs * sizeof(*ProcGlobal->statusFlags), &found);
> MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
>
> /*
Same.
Although here I'd say it's worth padding the size of each separate
"allocation" by PG_CACHE_LINE_SIZE.
Made this change.
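For reference, the padding can be expressed with the alignment macros the
tree already provides; the definitions below are reproduced here for a
self-contained illustration (PG_CACHE_LINE_SIZE is 128 in PostgreSQL), so
each sub-array within the combined allocation starts on its own cache line.

```c
#include <assert.h>
#include <stdint.h>

#define PG_CACHE_LINE_SIZE	128

/* Round LEN up to a multiple of ALIGNVAL (mirrors PostgreSQL's TYPEALIGN). */
#define TYPEALIGN(ALIGNVAL, LEN) \
	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

#define CACHELINEALIGN(LEN)		TYPEALIGN(PG_CACHE_LINE_SIZE, (LEN))
```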
> - fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
> + fpPtr = ShmemInitStruct("Fast path lock arrays", TotalProcs * (fpLockBitsSize + fpRelIdSize), &found);
> MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
>
> /* For asserts checking we did not overflow. */
This one might actually make sense to keep separate, depending on the
configuration it can be reasonably big (max_connection = 1k,
max_locks_per_transaction=1k results in ~5MB)..
OK
PFA the rebased patches with the above changes.
Kindly let me know your views.
Thank you,
Rahila Syed