From 29f8a789f7e0931dc4e119d090a02e7b6a4c24cd Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat Date: Tue, 3 Feb 2026 10:58:03 +0530 Subject: [PATCH v20260209 2/7] Memory and address space management for buffer resizing This has three changes: 1. Allow using multiple shared memory mappings ============================================ Currently all the work with shared memory is done via a single anonymous memory mapping, which limits the ways in which the shared memory can be organized. Introduce the possibility of allocating multiple shared memory mappings, where a single mapping is associated with a specified shared memory segment. Modifies pg_shmem_allocations to report the shared memory segment as well. Adds pg_shmem_segments to report shared memory segment information. 2. Address space reservation for shared memory ============================================ Currently the shared memory layout is designed to pack everything tightly together, leaving no space between mappings for resizing. Here is how it looks for one mapping in /proc/$PID/maps; /dev/zero represents the anonymous shared memory in question: 00400000-00490000 /path/bin/postgres ... 012d9000-0133e000 [heap] 7f443a800000-7f470a800000 /dev/zero (deleted) 7f470a800000-7f471831d000 /usr/lib/locale/locale-archive 7f4718400000-7f4718401000 /usr/lib64/libstdc++.so.6.0.34 ... Make the layout more dynamic by splitting every shared memory segment into two parts: * An anonymous file, which actually contains the shared memory content. Such an anonymous file is created via memfd_create; it lives in memory, behaves like a regular file, and is semantically equivalent to anonymous memory allocated via mmap with MAP_ANONYMOUS. * A reservation mapping, whose size is much larger than the required shared segment size. This mapping is created with the MAP_NORESERVE flag (so the reserved space is not counted against memory limits). The anonymous file is mapped into this reservation mapping.
If we have to change the address maps while resizing the shared buffer pool, this needs to be done in the postmaster too, so that new backends will inherit the resized address space from the postmaster. However, the postmaster is not involved in the ProcSignalBarrier mechanism and we don't want it to spend time on things other than its core functionality. To achieve that, the maximum required address space maps are set up upfront with read and write access when starting the server. When resizing the buffer pool, only the backing file object is resized from the coordinator. This also keeps the ProcSignalBarrier handling code light for backends other than the coordinator. The resulting layout looks like this: 00400000-00490000 /path/bin/postgres ... 3f526000-3f590000 rw-p [heap] 7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted) -- anon file 7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted) -- reservation 7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive 7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34 To resize a shared memory segment in this layout it's possible to use ftruncate on the memory-mapped file. This approach also does not impact the actual memory usage as reported by the kernel. TODO: Verify that Cgroup v2 doesn't have any problems with that as well. To verify this, a new cgroup was created with a memory limit of 256 MB, then PostgreSQL was launched within this cgroup with shared_buffers = 128 MB: $ cd /sys/fs/cgroup $ mkdir postgres $ cd postgres $ echo 268435456 > memory.max $ echo $MASTER_PID_SHELL > cgroup.procs $ cat memory.current 17465344 (~16.6 MB) $ echo $PATCH_PID_SHELL > cgroup.procs $ cat memory.current 20770816 (~19.8 MB) There are also a few unrelated advantages of using memory mapped files: * We've got a file descriptor, which could be used for regular file operations (modification, truncation, you name it). * The file could be given a name, which improves readability when it comes to process maps.
* By default, Linux will not add file-backed shared mappings into a core dump, making it more convenient to work with them in PostgreSQL: no more huge dumps to process. - Some hackers have expressed concerns over this. The downside is that memfd_create is Linux-specific. 3. Refactor CalculateShmemSize() ================================ This function calls many functions which return the amount of shared memory required for different shared memory data structures. Up until now, the returned total of these sizes was used to create a single shared memory segment. With this change, CalculateShmemSize() needs to estimate memory requirements for each of the segments. It now takes an array of MemoryMappingSizes, containing as many elements as the number of segments, as an argument. The sizes returned by all the functions it calls, except BufferManagerShmemSize(), are added and saved in the first element (index 0) of the array. BufferManagerShmemSize() is modified to save the amount of memory required for buffer manager related segments in the corresponding array element. Additionally, it saves the amount of reserved space. For now, the amount of reserved address space is the same as the amount of required memory, but that is expected to change with the next commit, which implements buffer pool resizing. CalculateShmemSize() now returns the total of the sizes of all the segments.
Author: Dmitrii Dolgov and Ashutosh Bapat Reviewed-by: Tomas Vondra --- doc/src/sgml/system-views.sgml | 9 + src/backend/catalog/system_views.sql | 7 + src/backend/port/posix_sema.c | 2 +- src/backend/port/sysv_sema.c | 2 +- src/backend/port/sysv_shmem.c | 552 ++++++++++++++++------ src/backend/port/win32_sema.c | 2 +- src/backend/port/win32_shmem.c | 291 +++++++----- src/backend/postmaster/launch_backend.c | 31 +- src/backend/storage/buffer/buf_init.c | 47 +- src/backend/storage/buffer/buf_table.c | 1 + src/backend/storage/buffer/freelist.c | 7 +- src/backend/storage/ipc/ipc.c | 4 +- src/backend/storage/ipc/ipci.c | 100 +++- src/backend/storage/ipc/shmem.c | 266 ++++++++--- src/backend/storage/lmgr/lwlock.c | 7 +- src/backend/storage/lmgr/predicate.c | 3 +- src/backend/utils/activity/pgstat_shmem.c | 3 +- src/include/catalog/pg_proc.dat | 12 +- src/include/storage/bufmgr.h | 3 +- src/include/storage/ipc.h | 4 +- src/include/storage/pg_shmem.h | 86 +++- src/include/storage/shmem.h | 9 +- src/test/regress/expected/rules.out | 9 +- src/tools/pgindent/typedefs.list | 4 + 24 files changed, 1042 insertions(+), 419 deletions(-) diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index c5683068470..6fa47e3c63d 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -4305,6 +4305,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx + + + segment text + + + The name of the shared memory segment containing the allocation.
+ + + off int8 diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 7553f31fef0..bc11589aeab 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -668,6 +668,13 @@ GRANT SELECT ON pg_shmem_allocations TO pg_read_all_stats; REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC; GRANT EXECUTE ON FUNCTION pg_get_shmem_allocations() TO pg_read_all_stats; +CREATE VIEW pg_shmem_segments AS + SELECT * FROM pg_get_shmem_segments(); + +REVOKE ALL ON pg_shmem_segments FROM PUBLIC; +GRANT SELECT ON pg_shmem_segments TO pg_read_all_stats; +REVOKE EXECUTE ON FUNCTION pg_get_shmem_segments() FROM PUBLIC; +GRANT EXECUTE ON FUNCTION pg_get_shmem_segments() TO pg_read_all_stats; CREATE VIEW pg_shmem_allocations_numa AS SELECT * FROM pg_get_shmem_allocations_numa(); diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c index e368e5ee7ed..5ad50c79dcd 100644 --- a/src/backend/port/posix_sema.c +++ b/src/backend/port/posix_sema.c @@ -216,7 +216,7 @@ PGReserveSemaphores(int maxSemas) #else sharedSemas = (PGSemaphore) - ShmemAlloc(PGSemaphoreShmemSize(maxSemas)); + ShmemAlloc(MAIN_SHMEM_SEGMENT, PGSemaphoreShmemSize(maxSemas)); #endif numSems = 0; diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c index 86c4d359ef7..f0c7b064ffb 100644 --- a/src/backend/port/sysv_sema.c +++ b/src/backend/port/sysv_sema.c @@ -344,7 +344,7 @@ PGReserveSemaphores(int maxSemas) DataDir))); sharedSemas = (PGSemaphore) - ShmemAlloc(PGSemaphoreShmemSize(maxSemas)); + ShmemAlloc(MAIN_SHMEM_SEGMENT, PGSemaphoreShmemSize(maxSemas)); numSharedSemas = 0; maxSharedSemas = maxSemas; diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c index 2e3886cf9fe..29ffcaa35f3 100644 --- a/src/backend/port/sysv_shmem.c +++ b/src/backend/port/sysv_shmem.c @@ -39,7 +39,17 @@ #include "utils/guc_hooks.h" #include "utils/pidfile.h" - +/* + * TODO: The first two sentences in 
the first paragraph below make me feel like + * we should have only one SysV segment. Is that true? Needs investigation. + */ +/* + * TODO: third paragraph should mention that we use memfd_create to create + * shared memory segment, and possibly there's a way to share that segment + * between two processes using the file descriptor instead of going through SysV + * shared memory segment. So one day EXEC_BACKEND can also use anonymous shared + * memory. + */ /* * As of PostgreSQL 9.3, we normally allocate only a very small amount of * System V shared memory, and only for the purposes of providing an @@ -91,12 +101,56 @@ typedef enum SHMSTATE_UNATTACHED, /* pertinent to DataDir, no attached PIDs */ } IpcMemoryState; +/* + * The anonymous mapping layout we use looks like this: + * + * 00400000-00c2a000 r-xp /bin/postgres + * ... + * 3f526000-3f590000 rw-p [heap] + * 7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted) + * 7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted) + * 7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive + * 7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34 + * ... + * + * We need to place shared memory mappings in such a way that there will be + * gaps between them in the address space. Those gaps have to be large enough + * to resize the mapping up to a certain size, without counting towards the total + * memory consumption. + * + * To achieve this, for each shared memory segment we first create an anonymous + * file of the specified size using memfd_create, which will accommodate the actual + * shared memory mapping content. It is represented by the first /memfd:main + * with rw permissions. Then we create a mapping for this file using mmap, with + * a size much larger than required and the flags PROT_NONE (makes sure the + * reserved space will not be used) and MAP_NORESERVE (prevents the space from + * being counted against memory limits).
The mapping serves as an address space + * reservation, into which the shared memory segment can be extended, and is + * represented by the second /memfd:main with no permissions. + */ + +PGUsedShmemInfo UsedShmemInfo[NUM_MEMORY_MAPPINGS]; + + /* + * Structure to hold anonymous shared memory segment properties. + */ +typedef struct AnonShmemData +{ + int fd; /* fd for the backing anon file */ + void *addr; /* Pointer to the start of the mapped memory */ + Size size; /* Size of the mapped memory */ -unsigned long UsedShmemSegID = 0; -void *UsedShmemSegAddr = NULL; +} AnonShmemData; -static Size AnonymousShmemSize; -static void *AnonymousShmem = NULL; +AnonShmemData AnonShmemInfo[NUM_MEMORY_MAPPINGS]; + +/* + * Flag indicating that we have decided to use huge pages. + * + * XXX: It's possible to use GetConfigOption("huge_pages_status", false, false) + * instead, but it feels like overkill. + */ +static bool huge_pages_on = false; static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size); static void IpcMemoryDetach(int status, Datum shmaddr); @@ -471,19 +525,20 @@ PGSharedMemoryAttach(IpcMemoryId shmId, * hugepage sizes, we might want to think about more invasive strategies, * such as increasing shared_buffers to absorb the extra space. * - * Returns the (real, assumed or config provided) page size into - * *hugepagesize, and the hugepage-related mmap flags to use into - * *mmap_flags if requested by the caller. If huge pages are not supported, - * *hugepagesize and *mmap_flags are set to 0. + * Returns the (real, assumed or config provided) page size into *hugepagesize, + * and the hugepage-related mmap and memfd flags to use into *mmap_flags and + * *memfd_flags if requested by the caller. If huge pages are not supported, + * *hugepagesize, *mmap_flags and *memfd_flags are set to 0.
*/ void -GetHugePageSize(Size *hugepagesize, int *mmap_flags) +GetHugePageSize(Size *hugepagesize, int *mmap_flags, int *memfd_flags) { #ifdef MAP_HUGETLB Size default_hugepagesize = 0; Size hugepagesize_local = 0; int mmap_flags_local = 0; + int memfd_flags_local = 0; /* * System-dependent code to find out the default huge page size. @@ -542,6 +597,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags) } mmap_flags_local = MAP_HUGETLB; + memfd_flags_local = MFD_HUGETLB; /* * On recent enough Linux, also include the explicit page size, if @@ -556,11 +612,22 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags) } #endif +#if defined(MFD_HUGE_MASK) && defined(MFD_HUGE_SHIFT) + if (hugepagesize_local != default_hugepagesize) + { + int shift = pg_ceil_log2_64(hugepagesize_local); + + memfd_flags_local |= (shift & MFD_HUGE_MASK) << MFD_HUGE_SHIFT; + } +#endif + /* assign the results found */ if (mmap_flags) *mmap_flags = mmap_flags_local; if (hugepagesize) *hugepagesize = hugepagesize_local; + if (memfd_flags) + *memfd_flags = memfd_flags_local; #else @@ -568,6 +635,8 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags) *hugepagesize = 0; if (mmap_flags) *mmap_flags = 0; + if (memfd_flags) + *memfd_flags = 0; #endif /* MAP_HUGETLB */ } @@ -589,84 +658,266 @@ check_huge_page_size(int *newval, void **extra, GucSource source) return true; } +/* + * Wrapper around posix_fallocate() to allocate memory for a given shared memory + * segment. + * + * Performs retry on EINTR, and raises error upon failure. + */ +static void +shmem_fallocate(int fd, const char *mapping_name, Size size, int elevel) +{ +#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__) + int ret; + + + /* + * If there is not enough memory, trying to access a hole in address space + * will cause SIGBUS. If supported, avoid that by allocating memory + * upfront. + * + * We still use a traditional EINTR retry loop to handle SIGCONT. 
+ * posix_fallocate() doesn't restart automatically, and we don't want this + * to fail if you attach a debugger. + */ + do + { + ret = posix_fallocate(fd, 0, size); + } while (ret == EINTR); + + if (ret != 0) + { + ereport(elevel, + (errmsg("segment[%s]: could not allocate space for anonymous file: %s", + mapping_name, strerror(ret)), + (ret == ENOMEM) ? + errhint("This error usually means that PostgreSQL's request " + "for a shared memory segment exceeded available memory, " + "swap space, or huge pages. To reduce the request size " + "(currently %zu bytes), reduce PostgreSQL's shared " + "memory usage, perhaps by reducing \"shared_buffers\" or " + "\"max_connections\".", + size) : 0)); + } +#endif +} + +/* + * Round up the required amount of memory and the amount of required reserved + * address space to the nearest huge page size. + */ +static inline void +round_off_mapping_sizes_for_hugepages(MemoryMappingSizes *mapping, int hugepagesize) +{ + if (hugepagesize == 0) + return; + + if (mapping->shmem_req_size % hugepagesize != 0) + mapping->shmem_req_size = add_size(mapping->shmem_req_size, + hugepagesize - (mapping->shmem_req_size % hugepagesize)); + + if (mapping->shmem_reserved % hugepagesize != 0) + mapping->shmem_reserved = add_size(mapping->shmem_reserved, + hugepagesize - (mapping->shmem_reserved % hugepagesize)); +} + /* * Creates an anonymous mmap()ed shared memory segment. * - * Pass the requested size in *size. This function will modify *size to the - * actual size of the allocation, if it ends up allocating a segment that is - * larger than requested. + * This function will modify mapping size to the actual size of the allocation, + * if it ends up allocating a segment that is larger than requested. If needed, + * it also rounds up the mapping reserved size to be a multiple of huge page + * size.
+ * + * Note that we do not fall back from huge pages to regular pages in this + * function; this decision was already made in ReserveAnonymousMemory and we + * stick to it. + * + * TODO: Update the prologue to be consistent with the code. */ -static void * -CreateAnonymousSegment(Size *size) +static void +CreateAnonymousSegment(int segment_id, MemoryMappingSizes *mapping) { - Size allocsize = *size; void *ptr = MAP_FAILED; - int mmap_errno = 0; - int mmap_flags = MAP_SHARED | MAP_ANONYMOUS | MAP_HASSEMAPHORE; + int mmap_flags = MAP_SHARED | MAP_HASSEMAPHORE | MAP_NORESERVE; + AnonShmemData *anonshmem = &AnonShmemInfo[segment_id]; + const char *segname = MappingName(segment_id); + int memfd_flags = 0; #ifndef MAP_HUGETLB - /* PGSharedMemoryCreate should have dealt with this case */ - Assert(huge_pages != HUGE_PAGES_ON); + /* PrepareHugePages should have dealt with this case */ + Assert(huge_pages != HUGE_PAGES_ON && !huge_pages_on); #else - if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY) + if (huge_pages_on) { - /* - * Round up the request size to a suitable large value.
- */ Size hugepagesize; int huge_mmap_flags; + int huge_memfd_flags; - GetHugePageSize(&hugepagesize, &huge_mmap_flags); + /* Make sure nothing is messed up */ + Assert(huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY); - if (allocsize % hugepagesize != 0) - allocsize = add_size(allocsize, hugepagesize - (allocsize % hugepagesize)); + /* Round up the request size to a suitable large value */ + GetHugePageSize(&hugepagesize, &huge_mmap_flags, &huge_memfd_flags); + round_off_mapping_sizes_for_hugepages(mapping, hugepagesize); - ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE, - mmap_flags | huge_mmap_flags, -1, 0); - mmap_errno = errno; - if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED) - elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", - allocsize); + /* Verify that the new size is within the reserved boundaries */ + Assert(mapping->shmem_reserved >= mapping->shmem_req_size); + + mmap_flags = mmap_flags | huge_mmap_flags; + memfd_flags = memfd_flags | huge_memfd_flags; } #endif /* - * Report whether huge pages are in use. This needs to be tracked before - * the second mmap() call if attempting to use huge pages failed - * previously. + * Prepare an anonymous file backing the segment. Its size will be + * specified later via ftruncate. + * + * The file behaves like a regular file, but lives in memory. Once all + * references to the file are dropped, it is automatically released. + * Anonymous memory is used for all backing pages of the file, thus it has + * the same semantics as anonymous memory allocations using mmap with the + * MAP_ANONYMOUS flag. + * + * TODO: Need a configuration test for memfd_create. + * + * TODO: Earlier releases did not use file-backed shared memory segments. + * By setting bit 1 in /proc//coredump_filter, those shared memory + * segments could be dumped to the core file. But dumping file-backed + * shared memory segments requires bit 3 to be set.
We need to document + this change in the release notes. */ - SetConfigOption("huge_pages_status", (ptr == MAP_FAILED) ? "off" : "on", - PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT); + anonshmem->fd = memfd_create(segname, memfd_flags); + if (anonshmem->fd == -1) + ereport(FATAL, + (errmsg("segment[%s]: could not create anonymous shared memory file: %m", + segname))); - if (ptr == MAP_FAILED && huge_pages != HUGE_PAGES_ON) - { - /* - * Use the original size, not the rounded-up value, when falling back - * to non-huge pages. - */ - allocsize = *size; - ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE, - mmap_flags, -1, 0); - mmap_errno = errno; - } + elog(DEBUG1, "segment[%s]: mmap(%zu)", segname, mapping->shmem_req_size); + /* + * Reserve maximum required address space for future expansion of this + * memory segment. The whole address space will be set up for read/write + * access, so that memory allocated to this address space can be read or + * written to even if it is resized in the future using just ftruncate. + * MAP_NORESERVE alone should ensure that no memory is allocated. But when + * using huge pages, the memory is allocated at mmap time if PROT_WRITE | + * PROT_READ is used. Hence we create the mapping with PROT_NONE first and + * then use mprotect to set the required permissions. + */ + ptr = mmap(NULL, mapping->shmem_reserved, PROT_NONE, + mmap_flags, anonshmem->fd, 0); if (ptr == MAP_FAILED) + ereport(FATAL, + (errmsg("segment[%s]: could not map anonymous shared memory: %m", + segname))); + + if (mprotect(ptr, mapping->shmem_reserved, PROT_READ | PROT_WRITE) == -1) + ereport(FATAL, + (errmsg("segment[%s]: could not update anonymous shared memory permissions: %m", + segname))); + + + /* + * Resize the backing file to the required size. On platforms where it is + * supported, we also allocate the required memory upfront. On other + * platforms the memory up to the size of the file will be allocated on demand.
+ */ + if (ftruncate(anonshmem->fd, mapping->shmem_req_size) == -1) { - errno = mmap_errno; + int save_errno = errno; + + close(anonshmem->fd); + anonshmem->fd = -1; + + errno = save_errno; ereport(FATAL, - (errmsg("could not map anonymous shared memory: %m"), - (mmap_errno == ENOMEM) ? + (errmsg("segment[%s]: could not truncate anonymous file to size %zu: %m", + segname, mapping->shmem_req_size), + (save_errno == ENOMEM) ? errhint("This error usually means that PostgreSQL's request " "for a shared memory segment exceeded available memory, " "swap space, or huge pages. To reduce the request size " "(currently %zu bytes), reduce PostgreSQL's shared " "memory usage, perhaps by reducing \"shared_buffers\" or " "\"max_connections\".", - allocsize) : 0)); + mapping->shmem_req_size) : 0)); } + shmem_fallocate(anonshmem->fd, segname, mapping->shmem_req_size, FATAL); - *size = allocsize; - return ptr; + anonshmem->addr = ptr; + anonshmem->size = mapping->shmem_reserved; +} + +/* + * PrepareHugePages + * + * Figure out if there are enough huge pages to allocate all shared memory + * segments, and report that information via huge_pages_status and + * huge_pages_on. It needs to be called before creating shared memory segments. + * + * It is necessary to maintain the same semantics (simple on/off) for + * huge_pages_status, even if there are multiple shared memory segments: all + * segments either use huge pages or not, there is no mix of segments with + * different page sizes. The latter might actually be beneficial, in particular + * because only some segments may require a large amount of memory, but for now + * we go with a simple solution.
+ */ +void +PrepareHugePages() { + void *ptr = MAP_FAILED; + MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS]; + int mmap_flags = (MAP_SHARED | MAP_HASSEMAPHORE); + + CalculateShmemSize(mapping_sizes); + + /* Complain if hugepages demanded but we can't possibly support them */ +#if !defined(MAP_HUGETLB) + if (huge_pages == HUGE_PAGES_ON) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("huge pages not supported on this platform"))); +#else + if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY) + { + Size hugepagesize, + total_size = 0; + int huge_mmap_flags; + + GetHugePageSize(&hugepagesize, &huge_mmap_flags, NULL); + + /* + * Figure out how much memory is needed for all segments, keeping in + * mind that for every segment this value will be rounded up to the + * huge page size. The resulting value will be used to probe memory + * and decide whether we will allocate huge pages or not. + */ + for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++) + { + Size segment_size = mapping_sizes[segment].shmem_req_size; + + if (segment_size % hugepagesize != 0) + segment_size += hugepagesize - (segment_size % hugepagesize); + + total_size += segment_size; + } + + /* Map the total amount of memory to test its availability. */ + elog(DEBUG1, "reserving space: probe mmap(%zu) with MAP_HUGETLB", + total_size); + ptr = mmap(NULL, total_size, PROT_NONE, + mmap_flags | MAP_ANONYMOUS | huge_mmap_flags, -1, 0); + } +#endif + + /* + * Report whether huge pages are in use. This needs to be tracked before + * creating shared memory segments. + */ + SetConfigOption("huge_pages_status", (ptr == MAP_FAILED) ? "off" : "on", + PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT); + huge_pages_on = ptr != MAP_FAILED; } /* @@ -676,20 +927,29 @@ CreateAnonymousSegment(Size *size) static void AnonymousShmemDetach(int status, Datum arg) { - /* Release anonymous shared memory block, if any.
*/ - if (AnonymousShmem != NULL) + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) { - if (munmap(AnonymousShmem, AnonymousShmemSize) < 0) - elog(LOG, "munmap(%p, %zu) failed: %m", - AnonymousShmem, AnonymousShmemSize); - AnonymousShmem = NULL; + AnonShmemData *segment = &AnonShmemInfo[i]; + + /* Release anonymous shared memory block, if any. */ + if (segment->addr != NULL) + { + Assert(segment->fd != -1); + + if (munmap(segment->addr, segment->size) < 0) + elog(LOG, "munmap(%p, %zu) failed: %m", + segment->addr, segment->size); + segment->addr = NULL; + close(segment->fd); + segment->fd = -1; + } } } /* * PGSharedMemoryCreate * - * Create a shared memory segment of the given size and initialize its + * Create a shared memory segment for the given mapping and initialize its * standard header. Also, register an on_shmem_exit callback to release * the storage. * @@ -699,7 +959,7 @@ AnonymousShmemDetach(int status, Datum arg) * postmaster or backend. */ PGShmemHeader * -PGSharedMemoryCreate(Size size, +PGSharedMemoryCreate(int segment_id, MemoryMappingSizes *mapping, PGShmemHeader **shim) { IpcMemoryKey NextShmemSegID; @@ -707,6 +967,8 @@ PGSharedMemoryCreate(Size size, PGShmemHeader *hdr; struct stat statbuf; Size sysvsize; + AnonShmemData *anonshmem = &AnonShmemInfo[segment_id]; + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[segment_id]; /* * We use the data directory's ID info (inode and device numbers) to @@ -719,14 +981,6 @@ PGSharedMemoryCreate(Size size, errmsg("could not stat data directory \"%s\": %m", DataDir))); - /* Complain if hugepages demanded but we can't possibly support them */ -#if !defined(MAP_HUGETLB) - if (huge_pages == HUGE_PAGES_ON) - ereport(ERROR, - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), - errmsg("huge pages not supported on this platform"))); -#endif - /* For now, we don't support huge pages in SysV memory */ if (huge_pages == HUGE_PAGES_ON && shared_memory_type != SHMEM_TYPE_MMAP) ereport(ERROR, @@ -734,12 +988,12 @@ 
PGSharedMemoryCreate(Size size, errmsg("huge pages not supported with the current \"shared_memory_type\" setting"))); /* Room for a header? */ - Assert(size > MAXALIGN(sizeof(PGShmemHeader))); + Assert(mapping->shmem_req_size > MAXALIGN(sizeof(PGShmemHeader))); if (shared_memory_type == SHMEM_TYPE_MMAP) { - AnonymousShmem = CreateAnonymousSegment(&size); - AnonymousShmemSize = size; + /* On success, mapping data will be modified. */ + CreateAnonymousSegment(segment_id, mapping); /* Register on-exit routine to unmap the anonymous segment */ on_shmem_exit(AnonymousShmemDetach, (Datum) 0); @@ -749,7 +1003,7 @@ PGSharedMemoryCreate(Size size, } else { - sysvsize = size; + sysvsize = mapping->shmem_req_size; /* huge pages are only available with mmap */ SetConfigOption("huge_pages_status", "off", @@ -762,7 +1016,7 @@ PGSharedMemoryCreate(Size size, * loop simultaneously. (CreateDataDirLockFile() does not entirely ensure * that, but prefer fixing it over coping here.) */ - NextShmemSegID = statbuf.st_ino; + NextShmemSegID = statbuf.st_ino + usedShmem->UsedShmemSegID; for (;;) { @@ -800,6 +1054,8 @@ PGSharedMemoryCreate(Size size, errmsg("pre-existing shared memory block (key %lu, ID %lu) is still in use", (unsigned long) NextShmemSegID, (unsigned long) shmid), + errdetail("when trying to create shared memory block for segment \"%s\"", + MappingName(segment_id)), errhint("Terminate any old server processes associated with data directory \"%s\".", DataDir))); break; @@ -854,24 +1110,24 @@ PGSharedMemoryCreate(Size size, /* * Initialize space allocation status for segment. 
*/ - hdr->totalsize = size; + hdr->totalsize = mapping->shmem_req_size; + hdr->reservedsize = mapping->shmem_reserved; hdr->content_offset = MAXALIGN(sizeof(PGShmemHeader)); *shim = hdr; /* Save info for possible future use */ - UsedShmemSegAddr = memAddress; - UsedShmemSegID = (unsigned long) NextShmemSegID; + usedShmem->UsedShmemSegAddr = memAddress; + usedShmem->UsedShmemSegID = (unsigned long) NextShmemSegID; /* - * If AnonymousShmem is NULL here, then we're not using anonymous shared - * memory, and should return a pointer to the System V shared memory - * block. Otherwise, the System V shared memory block is only a shim, and - * we must return a pointer to the real block. + * If we're not using anonymous shared memory, return a pointer to the + * System V shared memory block. Otherwise, the System V shared memory + * block is only a shim, and we must return a pointer to the real block. */ - if (AnonymousShmem == NULL) + if (anonshmem->addr == NULL) return hdr; - memcpy(AnonymousShmem, hdr, sizeof(PGShmemHeader)); - return (PGShmemHeader *) AnonymousShmem; + memcpy(anonshmem->addr, hdr, sizeof(PGShmemHeader)); + return anonshmem->addr; } #ifdef EXEC_BACKEND @@ -884,9 +1140,9 @@ PGSharedMemoryCreate(Size size, * EXEC_BACKEND case; otherwise postmaster children inherit the shared memory * segment attachment via fork(). * - * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this - * routine. The caller must have already restored them to the postmaster's - * values. + * Segments array is an implicit parameter to this + * routine. The caller must have already restored it to the postmaster's + * state. 
*/ void PGSharedMemoryReAttach(void) @@ -894,32 +1150,42 @@ PGSharedMemoryReAttach(void) IpcMemoryId shmid; PGShmemHeader *hdr; IpcMemoryState state; - void *origUsedShmemSegAddr = UsedShmemSegAddr; + void *origUsedShmemSegAddr; - Assert(UsedShmemSegAddr != NULL); - Assert(IsUnderPostmaster); + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i]; + + origUsedShmemSegAddr = usedShmem->UsedShmemSegAddr; + + Assert(usedShmem->UsedShmemSegAddr != NULL); + Assert(IsUnderPostmaster); #ifdef __CYGWIN__ - /* cygipc (currently) appears to not detach on exec. */ - PGSharedMemoryDetach(); - UsedShmemSegAddr = origUsedShmemSegAddr; + /* cygipc (currently) appears to not detach on exec. */ + PGSharedMemoryDetach(); + usedShmem->UsedShmemSegAddr = origUsedShmemSegAddr; #endif - elog(DEBUG3, "attaching to %p", UsedShmemSegAddr); - shmid = shmget(UsedShmemSegID, sizeof(PGShmemHeader), 0); - if (shmid < 0) - state = SHMSTATE_FOREIGN; - else - state = PGSharedMemoryAttach(shmid, UsedShmemSegAddr, &hdr); - if (state != SHMSTATE_ATTACHED) - elog(FATAL, "could not reattach to shared memory (key=%d, addr=%p): %m", - (int) UsedShmemSegID, UsedShmemSegAddr); - if (hdr != origUsedShmemSegAddr) - elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)", - hdr, origUsedShmemSegAddr); - dsm_set_control_handle(hdr->dsm_control); - - UsedShmemSegAddr = hdr; /* probably redundant */ + elog(DEBUG3, "attaching to %p", usedShmem->UsedShmemSegAddr); + shmid = shmget(usedShmem->UsedShmemSegID, sizeof(PGShmemHeader), 0); + if (shmid < 0) + state = SHMSTATE_FOREIGN; + else + state = PGSharedMemoryAttach(shmid, usedShmem->UsedShmemSegAddr, &hdr); + if (state != SHMSTATE_ATTACHED) + elog(FATAL, "could not reattach to shared memory (key=%d, addr=%p): %m", + (int) usedShmem->UsedShmemSegID, usedShmem->UsedShmemSegAddr); + if (hdr != origUsedShmemSegAddr) + elog(FATAL, "reattaching to shared memory returned unexpected address 
(got %p, expected %p)", + hdr, origUsedShmemSegAddr); + + /* Re-establish dsm_control mapping, if any */ + if (hdr->dsm_control != 0) + dsm_set_control_handle(hdr->dsm_control); + + usedShmem->UsedShmemSegAddr = hdr; /* probably redundant */ + } } /* @@ -933,14 +1199,13 @@ PGSharedMemoryReAttach(void) * The child process startup logic might or might not call PGSharedMemoryDetach * after this; make sure that it will be a no-op if called. * - * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this - * routine. The caller must have already restored them to the postmaster's - * values. + * Segments array is an implicit parameter to this + * routine. The caller must have already restored it to the postmaster's + * state. */ void PGSharedMemoryNoReAttach(void) { - Assert(UsedShmemSegAddr != NULL); Assert(IsUnderPostmaster); #ifdef __CYGWIN__ @@ -948,10 +1213,16 @@ PGSharedMemoryNoReAttach(void) PGSharedMemoryDetach(); #endif - /* For cleanliness, reset UsedShmemSegAddr to show we're not attached. */ - UsedShmemSegAddr = NULL; - /* And the same for UsedShmemSegID. */ - UsedShmemSegID = 0; + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i]; + + Assert(usedShmem->UsedShmemSegAddr != NULL); + /* For cleanliness, reset UsedShmemSegAddr to show we're not attached. */ + usedShmem->UsedShmemSegAddr = NULL; + /* And the same for UsedShmemSegID. */ + usedShmem->UsedShmemSegID = 0; + } } #endif /* EXEC_BACKEND */ @@ -959,35 +1230,44 @@ PGSharedMemoryNoReAttach(void) /* * PGSharedMemoryDetach * - * Detach from the shared memory segment, if still attached. This is not + * Detach from the shared memory segments, if still attached. This is not * intended to be called explicitly by the process that originally created the - * segment (it will have on_shmem_exit callback(s) registered to do that). + * segments (it will have on_shmem_exit callback(s) registered to do that). 
* Rather, this is for subprocesses that have inherited an attachment and want * to get rid of it. * - * UsedShmemSegID and UsedShmemSegAddr are implicit parameters to this - * routine, also AnonymousShmem and AnonymousShmemSize. + * PGUsedShmemInfo::UsedShmemSegID and PGUsedShmemInfo::UsedShmemSegAddr are + * implicit parameters to this routine obtained from entries in UsedShmemInfo + * array. */ void PGSharedMemoryDetach(void) { - if (UsedShmemSegAddr != NULL) + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) { - if ((shmdt(UsedShmemSegAddr) < 0) + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i]; + AnonShmemData *anonshmem = &AnonShmemInfo[i]; + + if (usedShmem->UsedShmemSegAddr != NULL) + { + if ((shmdt(usedShmem->UsedShmemSegAddr) < 0) #if defined(EXEC_BACKEND) && defined(__CYGWIN__) - /* Work-around for cygipc exec bug */ - && shmdt(NULL) < 0 + /* Work-around for cygipc exec bug */ + && shmdt(NULL) < 0 #endif - ) - elog(LOG, "shmdt(%p) failed: %m", UsedShmemSegAddr); - UsedShmemSegAddr = NULL; - } + ) + elog(LOG, "shmdt(%p) failed: %m", usedShmem->UsedShmemSegAddr); + usedShmem->UsedShmemSegAddr = NULL; + } - if (AnonymousShmem != NULL) - { - if (munmap(AnonymousShmem, AnonymousShmemSize) < 0) - elog(LOG, "munmap(%p, %zu) failed: %m", - AnonymousShmem, AnonymousShmemSize); - AnonymousShmem = NULL; + if (anonshmem->addr != NULL) + { + if (munmap(anonshmem->addr, anonshmem->size) < 0) + elog(LOG, "munmap(%p, %zu) failed: %m", + anonshmem->addr, anonshmem->size); + anonshmem->addr = NULL; + close(anonshmem->fd); + anonshmem->fd = -1; + } } } diff --git a/src/backend/port/win32_sema.c b/src/backend/port/win32_sema.c index ba97c9b2d64..4683736415b 100644 --- a/src/backend/port/win32_sema.c +++ b/src/backend/port/win32_sema.c @@ -44,7 +44,7 @@ PGSemaphoreShmemSize(int maxSemas) * process exits. 
*/ void -PGReserveSemaphores(int maxSemas) +PGReserveSemaphores(int maxSemas, int shmem_segment) { mySemSet = (HANDLE *) malloc(maxSemas * sizeof(HANDLE)); if (mySemSet == NULL) diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c index 794e4fcb2ad..034460eab96 100644 --- a/src/backend/port/win32_shmem.c +++ b/src/backend/port/win32_shmem.c @@ -39,15 +39,14 @@ * address space and is negligible relative to the 64-bit address space. */ #define PROTECTIVE_REGION_SIZE (10 * WIN32_STACK_RLIMIT) -void *ShmemProtectiveRegion = NULL; - -HANDLE UsedShmemSegID = INVALID_HANDLE_VALUE; -void *UsedShmemSegAddr = NULL; -static Size UsedShmemSegSize = 0; static bool EnableLockPagesPrivilege(int elevel); static void pgwin32_SharedMemoryDelete(int status, Datum shmId); +PGUsedShmemInfo UsedShmemInfo[NUM_MEMORY_MAPPINGS]; + +static Size UsedShmemSegSizes[NUM_MEMORY_MAPPINGS] = {0}; + /* * Generate shared memory segment name. Expand the data directory, to generate * an identifier unique for this data directory. Then replace all backslashes @@ -202,9 +201,11 @@ EnableLockPagesPrivilege(int elevel) * * Create a shared memory segment of the given size and initialize its * standard header. + * + * TODO: Check that the segment_id is a valid one before indexing corresponding arrays. 
*/ -PGShmemHeader * -PGSharedMemoryCreate(Size size, +void +PGSharedMemoryCreate(int segment_id, MemoryMappingSizes *mapping_sizes, PGShmemHeader **shim) { void *memAddress; @@ -216,13 +217,14 @@ PGSharedMemoryCreate(Size size, DWORD size_high; DWORD size_low; SIZE_T largePageSize = 0; - Size orig_size = size; + Size size = mapping_sizes->shmem_req_size; DWORD flProtect = PAGE_READWRITE; DWORD desiredAccess; + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[segment_id]; - ShmemProtectiveRegion = VirtualAlloc(NULL, PROTECTIVE_REGION_SIZE, - MEM_RESERVE, PAGE_NOACCESS); - if (ShmemProtectiveRegion == NULL) + usedShmem->ShmemProtectiveRegion = VirtualAlloc(NULL, PROTECTIVE_REGION_SIZE, + MEM_RESERVE, PAGE_NOACCESS); + if (usedShmem->ShmemProtectiveRegion == NULL) elog(FATAL, "could not reserve memory region: error code %lu", GetLastError()); @@ -231,8 +233,12 @@ PGSharedMemoryCreate(Size size, szShareMem = GetSharedMemName(); - UsedShmemSegAddr = NULL; + usedShmem->UsedShmemSegAddr = NULL; + /* + * TODO: We don't need to perform this as many times as the number of + * segments. Instead do something similar to sysv_shmem.c + */ if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY) { /* Does the processor support large pages? */ @@ -304,7 +310,7 @@ retry: * Use the original size, not the rounded-up value, when * falling back to non-huge pages. 
 				 */
-				size = orig_size;
+				size = mapping_sizes->shmem_req_size;
 				flProtect = PAGE_READWRITE;
 				goto retry;
 			}
@@ -337,6 +343,8 @@ retry:
 		if (!hmap)
 			ereport(FATAL,
 					(errmsg("pre-existing shared memory block is still in use"),
+					 errdetail("when trying to create shared memory block for segment \"%s\"",
+							   PGShmemSegmentName(segment)),
 					 errhint("Check if there are any old server processes still running, and terminate them.")));
 
 	free(szShareMem);
@@ -393,9 +401,9 @@ retry:
 	hdr->dsm_control = 0;
 
 	/* Save info for possible future use */
-	UsedShmemSegAddr = memAddress;
-	UsedShmemSegSize = size;
-	UsedShmemSegID = hmap2;
+	usedShmem->UsedShmemSegAddr = memAddress;
+	UsedShmemSegSizes[segment_id] = size;
+	usedShmem->UsedShmemSegID = (unsigned long) hmap2;
 
 	/* Register on-exit routine to delete the new segment */
 	on_shmem_exit(pgwin32_SharedMemoryDelete, PointerGetDatum(hmap2));
@@ -405,8 +413,6 @@ retry:
 	/* Report whether huge pages are in use */
 	SetConfigOption("huge_pages_status", (flProtect & SEC_LARGE_PAGES) ? "on" : "off",
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
-
-	return hdr;
 }
 
 /*
@@ -416,42 +422,52 @@ retry:
  * an already existing shared memory segment, using the handle inherited from
  * the postmaster.
  *
- * ShmemProtectiveRegion, UsedShmemSegID and UsedShmemSegAddr are implicit
- * parameters to this routine.  The caller must have already restored them to
- * the postmaster's values.
+ * The UsedShmemInfo array is an implicit parameter to this routine.  The
+ * caller must have already restored ShmemProtectiveRegion, UsedShmemSegID
+ * and UsedShmemSegAddr in each entry to the postmaster's values.
*/ void PGSharedMemoryReAttach(void) { PGShmemHeader *hdr; - void *origUsedShmemSegAddr = UsedShmemSegAddr; + void *origUsedShmemSegAddr; - Assert(ShmemProtectiveRegion != NULL); - Assert(UsedShmemSegAddr != NULL); Assert(IsUnderPostmaster); - /* - * Release memory region reservations made by the postmaster - */ - if (VirtualFree(ShmemProtectiveRegion, 0, MEM_RELEASE) == 0) - elog(FATAL, "failed to release reserved memory region (addr=%p): error code %lu", - ShmemProtectiveRegion, GetLastError()); - if (VirtualFree(UsedShmemSegAddr, 0, MEM_RELEASE) == 0) - elog(FATAL, "failed to release reserved memory region (addr=%p): error code %lu", - UsedShmemSegAddr, GetLastError()); - - hdr = (PGShmemHeader *) MapViewOfFileEx(UsedShmemSegID, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 0, UsedShmemSegAddr); - if (!hdr) - elog(FATAL, "could not reattach to shared memory (key=%p, addr=%p): error code %lu", - UsedShmemSegID, UsedShmemSegAddr, GetLastError()); - if (hdr != origUsedShmemSegAddr) - elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)", - hdr, origUsedShmemSegAddr); - if (hdr->magic != PGShmemMagic) - elog(FATAL, "reattaching to shared memory returned non-PostgreSQL memory"); - dsm_set_control_handle(hdr->dsm_control); - - UsedShmemSegAddr = hdr; /* probably redundant */ + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i]; + + Assert(usedShmem->ShmemProtectiveRegion != NULL); + Assert(usedShmem->UsedShmemSegAddr != NULL); + + origUsedShmemSegAddr = usedShmem->UsedShmemSegAddr; + + /* + * Release memory region reservations made by the postmaster + */ + if (VirtualFree(usedShmem->ShmemProtectiveRegion, 0, MEM_RELEASE) == 0) + elog(FATAL, "failed to release reserved memory region (addr=%p): error code %lu", + usedShmem->ShmemProtectiveRegion, GetLastError()); + if (VirtualFree(usedShmem->UsedShmemSegAddr, 0, MEM_RELEASE) == 0) + elog(FATAL, "failed to release reserved memory region 
(addr=%p): error code %lu",
+				 usedShmem->UsedShmemSegAddr, GetLastError());
+
+		hdr = (PGShmemHeader *) MapViewOfFileEx(usedShmem->UsedShmemSegID, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, 0, usedShmem->UsedShmemSegAddr);
+		if (!hdr)
+			elog(FATAL, "could not reattach to shared memory (key=%p, addr=%p): error code %lu",
+				 usedShmem->UsedShmemSegID, usedShmem->UsedShmemSegAddr, GetLastError());
+		if (hdr != origUsedShmemSegAddr)
+			elog(FATAL, "reattaching to shared memory returned unexpected address (got %p, expected %p)",
+				 hdr, origUsedShmemSegAddr);
+		if (hdr->magic != PGShmemMagic)
+			elog(FATAL, "reattaching to shared memory returned non-PostgreSQL memory");
+		/* Re-establish dsm_control mapping, if any */
+		if (hdr->dsm_control != 0)
+			dsm_set_control_handle(hdr->dsm_control);
+
+		usedShmem->UsedShmemSegAddr = hdr;	/* probably redundant */
+	}
 }
 
 /*
@@ -464,22 +480,28 @@ PGSharedMemoryReAttach(void)
  * The child process startup logic might or might not call PGSharedMemoryDetach
  * after this; make sure that it will be a no-op if called.
  *
- * ShmemProtectiveRegion, UsedShmemSegID and UsedShmemSegAddr are implicit
- * parameters to this routine.  The caller must have already restored them to
- * the postmaster's values.
+ * The UsedShmemInfo array is an implicit parameter to this routine.  The
+ * caller must have already restored ShmemProtectiveRegion and
+ * UsedShmemSegAddr in each entry to the postmaster's values.
  */
 void
 PGSharedMemoryNoReAttach(void)
 {
-	Assert(ShmemProtectiveRegion != NULL);
-	Assert(UsedShmemSegAddr != NULL);
 	Assert(IsUnderPostmaster);
+	for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++)
+	{
+		PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i];
 
-	/*
-	 * Under Windows we will not have mapped the segment, so we don't need to
-	 * un-map it.  Just reset UsedShmemSegAddr to show we're not attached.
-	 */
-	UsedShmemSegAddr = NULL;
+		Assert(usedShmem->ShmemProtectiveRegion != NULL);
+		Assert(usedShmem->UsedShmemSegAddr != NULL);
+
+		/*
+		 * Under Windows we will not have mapped the segment, so we don't need
+		 * to un-map it.  Just reset UsedShmemSegAddr to show we're not
+		 * attached.
+		 */
+		usedShmem->UsedShmemSegAddr = NULL;
+	}
 
 	/*
 	 * We *must* close the inherited shmem segment handle, else Windows will
@@ -492,49 +514,55 @@ PGSharedMemoryNoReAttach(void)
 /*
  * PGSharedMemoryDetach
  *
- * Detach from the shared memory segment, if still attached.  This is not
+ * Detach from the shared memory segments, if still attached.  This is not
  * intended to be called explicitly by the process that originally created the
- * segment (it will have an on_shmem_exit callback registered to do that).
- * Rather, this is for subprocesses that have inherited an attachment and want
- * to get rid of it.
+ * segments (it will have an on_shmem_exit callback registered to do that).
+ * Rather, this is for subprocesses that have inherited an attachment and want to
+ * get rid of it.
+ *
- * ShmemProtectiveRegion, UsedShmemSegID and UsedShmemSegAddr are implicit
- * parameters to this routine.
+ * The UsedShmemInfo array is an implicit parameter to this routine.  The
+ * caller must have already restored ShmemProtectiveRegion, UsedShmemSegID
+ * and UsedShmemSegAddr in each entry to the postmaster's values.
  */
 void
 PGSharedMemoryDetach(void)
 {
-	/*
-	 * Releasing the protective region liberates an unimportant quantity of
-	 * address space, but be tidy.
- */ - if (ShmemProtectiveRegion != NULL) + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) { - if (VirtualFree(ShmemProtectiveRegion, 0, MEM_RELEASE) == 0) - elog(LOG, "failed to release reserved memory region (addr=%p): error code %lu", - ShmemProtectiveRegion, GetLastError()); + PGUsedShmemInfo *segment = &UsedShmemInfo[i]; - ShmemProtectiveRegion = NULL; - } + /* + * Releasing the protective region liberates an unimportant quantity + * of address space, but be tidy. + */ + if (segment->ShmemProtectiveRegion != NULL) + { + if (VirtualFree(segment->ShmemProtectiveRegion, 0, MEM_RELEASE) == 0) + elog(LOG, "failed to release reserved memory region (addr=%p): error code %lu", + segment->ShmemProtectiveRegion, GetLastError()); - /* Unmap the view, if it's mapped */ - if (UsedShmemSegAddr != NULL) - { - if (!UnmapViewOfFile(UsedShmemSegAddr)) - elog(LOG, "could not unmap view of shared memory: error code %lu", - GetLastError()); + segment->ShmemProtectiveRegion = NULL; + } - UsedShmemSegAddr = NULL; - } + /* Unmap the view, if it's mapped */ + if (segment->UsedShmemSegAddr != NULL) + { + if (!UnmapViewOfFile(segment->UsedShmemSegAddr)) + elog(LOG, "could not unmap view of shared memory: error code %lu", + GetLastError()); - /* And close the shmem handle, if we have one */ - if (UsedShmemSegID != INVALID_HANDLE_VALUE) - { - if (!CloseHandle(UsedShmemSegID)) - elog(LOG, "could not close handle to shared memory: error code %lu", - GetLastError()); + segment->UsedShmemSegAddr = NULL; + } - UsedShmemSegID = INVALID_HANDLE_VALUE; + /* And close the shmem handle, if we have one */ + if (segment->UsedShmemSegID != INVALID_HANDLE_VALUE) + { + if (!CloseHandle(segment->UsedShmemSegID)) + elog(LOG, "could not close handle to shared memory: error code %lu", + GetLastError()); + + segment->UsedShmemSegID = INVALID_HANDLE_VALUE; + } } } @@ -574,50 +602,55 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild) { void *address; - Assert(ShmemProtectiveRegion != NULL); - 
Assert(UsedShmemSegAddr != NULL); - Assert(UsedShmemSegSize != 0); - - /* ShmemProtectiveRegion */ - address = VirtualAllocEx(hChild, ShmemProtectiveRegion, - PROTECTIVE_REGION_SIZE, - MEM_RESERVE, PAGE_NOACCESS); - if (address == NULL) + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) { - /* Don't use FATAL since we're running in the postmaster */ - elog(LOG, "could not reserve shared memory region (addr=%p) for child %p: error code %lu", - ShmemProtectiveRegion, hChild, GetLastError()); - return false; - } - if (address != ShmemProtectiveRegion) - { - /* - * Should never happen - in theory if allocation granularity causes - * strange effects it could, so check just in case. - * - * Don't use FATAL since we're running in the postmaster. - */ - elog(LOG, "reserved shared memory region got incorrect address %p, expected %p", - address, ShmemProtectiveRegion); - return false; - } + PGUsedShmemInfo *segment = &UsedShmemInfo[i]; - /* UsedShmemSegAddr */ - address = VirtualAllocEx(hChild, UsedShmemSegAddr, UsedShmemSegSize, - MEM_RESERVE, PAGE_READWRITE); - if (address == NULL) - { - elog(LOG, "could not reserve shared memory region (addr=%p) for child %p: error code %lu", - UsedShmemSegAddr, hChild, GetLastError()); - return false; - } - if (address != UsedShmemSegAddr) - { - elog(LOG, "reserved shared memory region got incorrect address %p, expected %p", - address, UsedShmemSegAddr); - return false; - } + Assert(segment->ShmemProtectiveRegion != NULL); + Assert(segment->UsedShmemSegAddr != NULL); + Assert(UsedShmemSegSizes[i] != 0); + + /* ShmemProtectiveRegion */ + address = VirtualAllocEx(hChild, segment->ShmemProtectiveRegion, + PROTECTIVE_REGION_SIZE, + MEM_RESERVE, PAGE_NOACCESS); + if (address == NULL) + { + /* Don't use FATAL since we're running in the postmaster */ + elog(LOG, "could not reserve shared memory region (addr=%p) for child %p: error code %lu", + segment->ShmemProtectiveRegion, hChild, GetLastError()); + return false; + } + if (address != 
segment->ShmemProtectiveRegion) + { + /* + * Should never happen - in theory if allocation granularity + * causes strange effects it could, so check just in case. + * + * Don't use FATAL since we're running in the postmaster. + */ + elog(LOG, "reserved shared memory region got incorrect address %p, expected %p", + address, segment->ShmemProtectiveRegion); + return false; + } + + /* UsedShmemSegAddr */ + address = VirtualAllocEx(hChild, segment->UsedShmemSegAddr, UsedShmemSegSizes[i], + MEM_RESERVE, PAGE_READWRITE); + if (address == NULL) + { + elog(LOG, "could not reserve shared memory region (addr=%p) for child %p: error code %lu", + segment->UsedShmemSegAddr, hChild, GetLastError()); + return false; + } + if (address != segment->UsedShmemSegAddr) + { + elog(LOG, "reserved shared memory region got incorrect address %p, expected %p", + address, segment->UsedShmemSegAddr); + return false; + } + } return true; } @@ -627,7 +660,7 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild) * use GetLargePageMinimum() instead. 
*/ void -GetHugePageSize(Size *hugepagesize, int *mmap_flags) +GetHugePageSize(Size *hugepagesize, int *mmap_flags, int *memfd_flags) { if (hugepagesize) *hugepagesize = 0; diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c index 926fd6f2700..b58ae118af1 100644 --- a/src/backend/postmaster/launch_backend.c +++ b/src/backend/postmaster/launch_backend.c @@ -89,13 +89,7 @@ typedef int InheritableSocket; typedef struct { char DataDir[MAXPGPATH]; -#ifndef WIN32 - unsigned long UsedShmemSegID; -#else - void *ShmemProtectiveRegion; - HANDLE UsedShmemSegID; -#endif - void *UsedShmemSegAddr; + PGUsedShmemInfo UsedShmemInfo[NUM_MEMORY_MAPPINGS]; #ifdef USE_INJECTION_POINTS struct InjectionPointsCtl *ActiveInjectionPoints; #endif @@ -677,8 +671,13 @@ SubPostmasterMain(int argc, char *argv[]) process_shared_preload_libraries(); /* Restore basic shared memory pointers */ - if (UsedShmemSegAddr != NULL) - InitShmemAllocator(UsedShmemSegAddr); + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i]; + + if (usedShmem->UsedShmemSegAddr != NULL) + InitShmemAllocator(i, usedShmem->UsedShmemSegAddr); + } /* * Run the appropriate Main function @@ -719,12 +718,7 @@ save_backend_variables(BackendParameters *param, strlcpy(param->DataDir, DataDir, MAXPGPATH); param->MyPMChildSlot = child_slot; - -#ifdef WIN32 - param->ShmemProtectiveRegion = ShmemProtectiveRegion; -#endif - param->UsedShmemSegID = UsedShmemSegID; - param->UsedShmemSegAddr = UsedShmemSegAddr; + memcpy(param->UsedShmemInfo, UsedShmemInfo, sizeof(UsedShmemInfo)); #ifdef USE_INJECTION_POINTS param->ActiveInjectionPoints = ActiveInjectionPoints; @@ -979,12 +973,7 @@ restore_backend_variables(BackendParameters *param) SetDataDir(param->DataDir); MyPMChildSlot = param->MyPMChildSlot; - -#ifdef WIN32 - ShmemProtectiveRegion = param->ShmemProtectiveRegion; -#endif - UsedShmemSegID = param->UsedShmemSegID; - UsedShmemSegAddr = 
param->UsedShmemSegAddr; + memcpy(UsedShmemInfo, param->UsedShmemInfo, sizeof(UsedShmemInfo)); #ifdef USE_INJECTION_POINTS ActiveInjectionPoints = param->ActiveInjectionPoints; diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c index c0c223b2e32..42112109af9 100644 --- a/src/backend/storage/buffer/buf_init.c +++ b/src/backend/storage/buffer/buf_init.c @@ -17,6 +17,7 @@ #include "storage/aio.h" #include "storage/buf_internals.h" #include "storage/bufmgr.h" +#include "storage/pg_shmem.h" #include "storage/proclist.h" BufferDescPadded *BufferDescriptors; @@ -56,6 +57,10 @@ CkptSortItem *CkptBufferIds; * Pins must be released before end of transaction. For efficiency the * shared refcount isn't increased if an individual backend pins a buffer * multiple times. Check the PrivateRefCount infrastructure in bufmgr.c. + * + * All the data structures except the buffer blocks are allocated in the main + * shared memory segment. The buffer blocks are allocated in a separate segment + * to allow dynamic resizing of the buffer pool. */ @@ -75,22 +80,22 @@ BufferManagerShmemInit(void) /* Align descriptors to a cacheline boundary. */ BufferDescriptors = (BufferDescPadded *) - ShmemInitStruct("Buffer Descriptors", - NBuffers * sizeof(BufferDescPadded), - &foundDescs); + ShmemInitStructInSegment("Buffer Descriptors", + NBuffers * sizeof(BufferDescPadded), + &foundDescs, MAIN_SHMEM_SEGMENT); /* Align buffer pool on IO page size boundary. */ BufferBlocks = (char *) TYPEALIGN(PG_IO_ALIGN_SIZE, - ShmemInitStruct("Buffer Blocks", - NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE, - &foundBufs)); + ShmemInitStructInSegment("Buffer Blocks", + NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE, + &foundBufs, BUFFERS_SHMEM_SEGMENT)); /* Align condition variables to cacheline boundary. 
*/
 	BufferIOCVArray = (ConditionVariableMinimallyPadded *)
-		ShmemInitStruct("Buffer IO Condition Variables",
-						NBuffers * sizeof(ConditionVariableMinimallyPadded),
-						&foundIOCV);
+		ShmemInitStructInSegment("Buffer IO Condition Variables",
+								 NBuffers * sizeof(ConditionVariableMinimallyPadded),
+								 &foundIOCV, MAIN_SHMEM_SEGMENT);
 
 	/*
 	 * The array used to sort to-be-checkpointed buffer ids is located in
@@ -100,8 +105,9 @@ BufferManagerShmemInit(void)
 	 * painful.
 	 */
 	CkptBufferIds = (CkptSortItem *)
-		ShmemInitStruct("Checkpoint BufferIds",
-						NBuffers * sizeof(CkptSortItem), &foundBufCkpt);
+		ShmemInitStructInSegment("Checkpoint BufferIds",
+								 NBuffers * sizeof(CkptSortItem), &foundBufCkpt,
+								 MAIN_SHMEM_SEGMENT);
 
 	if (foundDescs || foundBufs || foundIOCV || foundBufCkpt)
 	{
@@ -147,21 +153,28 @@ BufferManagerShmemInit(void)
  *
  * compute the size of shared memory for the buffer pool including
  * data pages, buffer descriptors, hash tables, etc.
+ *
+ * This function records the memory required for the buffer blocks in the
+ * BUFFERS_SHMEM_SEGMENT entry of mapping_sizes.  The amount of memory
+ * required for the remaining structures is returned.
*/ Size -BufferManagerShmemSize(void) +BufferManagerShmemSize(MemoryMappingSizes *mapping_sizes) { - Size size = 0; + size_t size; + + /* size of data pages, plus alignment padding */ + size = add_size(0, PG_IO_ALIGN_SIZE); + size = add_size(size, mul_size(NBuffers, BLCKSZ)); + mapping_sizes[BUFFERS_SHMEM_SEGMENT].shmem_req_size = size; + mapping_sizes[BUFFERS_SHMEM_SEGMENT].shmem_reserved = size; + size = 0; /* size of buffer descriptors */ size = add_size(size, mul_size(NBuffers, sizeof(BufferDescPadded))); /* to allow aligning buffer descriptors */ size = add_size(size, PG_CACHE_LINE_SIZE); - /* size of data pages, plus alignment padding */ - size = add_size(size, PG_IO_ALIGN_SIZE); - size = add_size(size, mul_size(NBuffers, BLCKSZ)); - /* size of stuff controlled by freelist.c */ size = add_size(size, StrategyShmemSize()); diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c index 5089c7322f3..a33786a460b 100644 --- a/src/backend/storage/buffer/buf_table.c +++ b/src/backend/storage/buffer/buf_table.c @@ -25,6 +25,7 @@ #include "funcapi.h" #include "storage/buf_internals.h" #include "storage/lwlock.h" +#include "storage/pg_shmem.h" #include "utils/rel.h" #include "utils/builtins.h" diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c index b7687836188..403890055be 100644 --- a/src/backend/storage/buffer/freelist.c +++ b/src/backend/storage/buffer/freelist.c @@ -19,6 +19,7 @@ #include "port/atomics.h" #include "storage/buf_internals.h" #include "storage/bufmgr.h" +#include "storage/pg_shmem.h" #include "storage/proc.h" #define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var)))) @@ -418,9 +419,9 @@ StrategyInitialize(bool init) * Get or create the shared strategy control block */ StrategyControl = (BufferStrategyControl *) - ShmemInitStruct("Buffer Strategy Status", - sizeof(BufferStrategyControl), - &found); + ShmemInitStructInSegment("Buffer Strategy Status", + 
sizeof(BufferStrategyControl), + &found, MAIN_SHMEM_SEGMENT); if (!found) { diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c index cb944edd8df..4af7782795e 100644 --- a/src/backend/storage/ipc/ipc.c +++ b/src/backend/storage/ipc/ipc.c @@ -62,6 +62,8 @@ static void proc_exit_prepare(int code); * but provide some additional features we need --- in particular, * we want to register callbacks to invoke when we are disconnecting * from a broken shared-memory context but not exiting the postmaster. + * Maximum number of such exit callbacks depends on the number of shared + * segments. * * Callback functions can take zero, one, or two args: the first passed * arg is the integer exitcode, the second is the Datum supplied when @@ -69,7 +71,7 @@ static void proc_exit_prepare(int code); * ---------------------------------------------------------------- */ -#define MAX_ON_EXITS 20 +#define MAX_ON_EXITS 40 struct ONEXIT { diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 1f7e933d500..6d2c4520c43 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -50,6 +50,7 @@ #include "storage/procarray.h" #include "storage/procsignal.h" #include "storage/sinvaladt.h" +#include "utils/builtins.h" #include "utils/guc.h" #include "utils/injection_point.h" @@ -81,10 +82,17 @@ RequestAddinShmemSpace(Size size) /* * CalculateShmemSize - * Calculates the amount of shared memory needed. + * Calculates the amount of shared memory needed. + * + * The amount of shared memory required per segment is saved in mapping_sizes, + * which is expected to be an array of size NUM_MEMORY_MAPPINGS. The total + * amount of memory needed across all the segments is returned. For the memory + * mappings which reserve address space for future expansion, the required + * amount of reserved space is saved in mapping_sizes of those segments. + * This memory is not included in the returned value. 
*/ Size -CalculateShmemSize(void) +CalculateShmemSize(MemoryMappingSizes *mapping_sizes) { Size size; @@ -102,7 +110,13 @@ CalculateShmemSize(void) sizeof(ShmemIndexEnt))); size = add_size(size, dsm_estimate_size()); size = add_size(size, DSMRegistryShmemSize()); - size = add_size(size, BufferManagerShmemSize()); + + /* + * Buffer manager adds estimates for memory requirements for every shared + * memory segment that it uses in the corresponding AnonymousMappings. + * Consider size required from only the main shared memory segment here. + */ + size = add_size(size, BufferManagerShmemSize(mapping_sizes)); size = add_size(size, LockManagerShmemSize()); size = add_size(size, PredicateLockShmemSize()); size = add_size(size, ProcGlobalShmemSize()); @@ -145,8 +159,22 @@ CalculateShmemSize(void) /* include additional requested shmem from preload libraries */ size = add_size(size, total_addin_request); + /* + * All the shared memory allocations considered so far happen in the main + * shared memory segment. + */ + mapping_sizes[MAIN_SHMEM_SEGMENT].shmem_req_size = size; + mapping_sizes[MAIN_SHMEM_SEGMENT].shmem_reserved = size; + + size = 0; /* might as well round it off to a multiple of a typical page size */ - size = add_size(size, 8192 - (size % 8192)); + for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++) + { + mapping_sizes[segment].shmem_req_size = add_size(mapping_sizes[segment].shmem_req_size, 8192 - (mapping_sizes[segment].shmem_req_size % 8192)); + mapping_sizes[segment].shmem_reserved = add_size(mapping_sizes[segment].shmem_reserved, 8192 - (mapping_sizes[segment].shmem_reserved % 8192)); + /* Compute the total size of all segments */ + size = size + mapping_sizes[segment].shmem_req_size; + } return size; } @@ -185,25 +213,21 @@ AttachSharedMemoryStructs(void) /* * CreateSharedMemoryAndSemaphores - * Creates and initializes shared memory and semaphores. + * Creates shared memory segments and initializes shared memory structures + * and semaphores. 
*/ void CreateSharedMemoryAndSemaphores(void) { - PGShmemHeader *shim; - PGShmemHeader *seghdr; - Size size; + PGShmemHeader *main_seg_shim = NULL; + MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS]; Assert(!IsUnderPostmaster); - /* Compute the size of the shared-memory block */ - size = CalculateShmemSize(); - elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size); + CalculateShmemSize(mapping_sizes); - /* - * Create the shmem segment - */ - seghdr = PGSharedMemoryCreate(size, &shim); + /* Decide if we use huge pages or regular size pages */ + PrepareHugePages(); /* * Make sure that huge pages are never reported as "unknown" while the @@ -212,16 +236,42 @@ CreateSharedMemoryAndSemaphores(void) Assert(strcmp("unknown", GetConfigOption("huge_pages_status", false, false)) != 0); - /* - * Set up shared memory allocation mechanism - */ - InitShmemAllocator(seghdr); + for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + MemoryMappingSizes *mapping = &mapping_sizes[i]; + PGUsedShmemInfo *usedShmem = &UsedShmemInfo[i]; + PGShmemHeader *shim; + PGShmemHeader *seghdr; + + /* + * Set seed shmem identifier which will be changed to the final one + * when creating the shared memory segment. + */ + usedShmem->UsedShmemSegID = i; + + /* Compute the size of the shared-memory block */ + elog(DEBUG3, "invoking IpcMemoryCreate(segment %s, size=%zu, reserved address space=%zu)", + MappingName(i), mapping->shmem_req_size, mapping->shmem_reserved); + + /* + * Create the shmem segment. + */ + seghdr = PGSharedMemoryCreate(i, mapping, &shim); + + /* + * Set up shared memory allocation mechanism + */ + InitShmemAllocator(i, seghdr); + + if (i == MAIN_SHMEM_SEGMENT) + main_seg_shim = shim; + } /* Initialize subsystems */ CreateOrAttachShmemStructs(); /* Initialize dynamic shared memory facilities. 
*/ - dsm_postmaster_startup(shim); + dsm_postmaster_startup(main_seg_shim); /* * Now give loadable modules a chance to set up their shmem allocations @@ -334,7 +384,9 @@ CreateOrAttachShmemStructs(void) * InitializeShmemGUCs * * This function initializes runtime-computed GUCs related to the amount of - * shared memory required for the current configuration. + * shared memory required for the current configuration. It assumes that the + * memory required by the shared memory segments is already calculated and is + * available in AnonymousMappings. */ void InitializeShmemGUCs(void) @@ -343,11 +395,13 @@ InitializeShmemGUCs(void) Size size_b; Size size_mb; Size hp_size; + MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS]; + /* * Calculate the shared memory size and round up to the nearest megabyte. */ - size_b = CalculateShmemSize(); + size_b = CalculateShmemSize(mapping_sizes); size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024); sprintf(buf, "%zu", size_mb); SetConfigOption("shared_memory_size", buf, @@ -356,7 +410,7 @@ InitializeShmemGUCs(void) /* * Calculate the number of huge pages required. */ - GetHugePageSize(&hp_size, NULL); + GetHugePageSize(&hp_size, NULL, NULL); if (hp_size != 0) { Size hp_required; diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c index 9f362ce8641..5cb59d97871 100644 --- a/src/backend/storage/ipc/shmem.c +++ b/src/backend/storage/ipc/shmem.c @@ -63,6 +63,7 @@ * unnecessary. */ + #include "postgres.h" #include "common/int.h" @@ -93,17 +94,28 @@ typedef struct ShmemAllocatorData slock_t shmem_lock; } ShmemAllocatorData; -static void *ShmemAllocRaw(Size size, Size *allocated_size); +/* Structure managing one shared memory segment. 
*/ +typedef struct ShmemSegment +{ + PGShmemHeader *ShmemSegHdr; /* shared mem segment header */ + ShmemAllocatorData *ShmemAllocator; + void *ShmemBase; /* start address of shared memory */ + const char *ShmemSegmentName; /* name of the segment for logging */ +} ShmemSegment; + +ShmemSegment Segments[NUM_MEMORY_MAPPINGS]; -/* shared memory global variables */ +static void *ShmemAllocRaw(ShmemSegment *segment, Size size, Size *allocated_size); -static PGShmemHeader *ShmemSegHdr; /* shared mem segment header */ -static void *ShmemBase; /* start address of shared memory */ -static void *ShmemEnd; /* end+1 address of shared memory */ +/* Expose ShmemLock from the main segment for allocating LWLock tranches. */ +slock_t *ShmemLock; -static ShmemAllocatorData *ShmemAllocator; -slock_t *ShmemLock; /* points to ShmemAllocator->shmem_lock */ -static HTAB *ShmemIndex = NULL; /* primary index hashtable for shmem */ +/* + * Primary index hashtable for shmem; for simplicity we use a single one for + * all shared memory segments. There can be performance consequences of that, + * and an alternative option would be to have one index per shared memory segment. + */ +static HTAB *ShmemIndex = NULL; /* To get reliable results for NUMA inquiry we need to "touch pages" once */ static bool firstNumaTouch = true; @@ -111,7 +123,7 @@ static bool firstNumaTouch = true; Datum pg_numa_available(PG_FUNCTION_ARGS); /* - * InitShmemAllocator() --- set up basic pointers to shared memory. + * InitShmemAllocator() --- set up basic pointers to shared memory in the given segment. * * Called at postmaster or stand-alone backend startup, to initialize the * allocator's data structure in the shared memory segment. In EXEC_BACKEND, @@ -119,9 +131,13 @@ Datum pg_numa_available(PG_FUNCTION_ARGS); * memory areas.
*/ void -InitShmemAllocator(PGShmemHeader *seghdr) +InitShmemAllocator(int segment_id, PGShmemHeader *seghdr) { + ShmemSegment *segment; + Assert(seghdr != NULL); + Assert(segment_id >= 0 && segment_id < NUM_MEMORY_MAPPINGS); + segment = &Segments[segment_id]; /* * We assume the pointer and offset are MAXALIGN. Not a hard requirement, @@ -130,23 +146,24 @@ InitShmemAllocator(PGShmemHeader *seghdr) Assert(seghdr == (void *) MAXALIGN(seghdr)); Assert(seghdr->content_offset == MAXALIGN(seghdr->content_offset)); - ShmemSegHdr = seghdr; - ShmemBase = seghdr; - ShmemEnd = (char *) ShmemBase + seghdr->totalsize; + segment->ShmemSegHdr = seghdr; + segment->ShmemBase = seghdr; + segment->ShmemSegmentName = MappingName(segment_id); #ifndef EXEC_BACKEND Assert(!IsUnderPostmaster); #endif if (IsUnderPostmaster) { - PGShmemHeader *shmhdr = ShmemSegHdr; + PGShmemHeader *shmhdr = segment->ShmemSegHdr; + + segment->ShmemAllocator = (ShmemAllocatorData *) ((char *) shmhdr + shmhdr->content_offset); - ShmemAllocator = (ShmemAllocatorData *) ((char *) shmhdr + shmhdr->content_offset); - ShmemLock = &ShmemAllocator->shmem_lock; } else { Size offset; + ShmemAllocatorData *ShmemAllocator; /* * Allocations after this point should go through ShmemAlloc, which @@ -163,47 +180,68 @@ InitShmemAllocator(PGShmemHeader *seghdr) ShmemAllocator = (ShmemAllocatorData *) ((char *) seghdr + seghdr->content_offset); SpinLockInit(&ShmemAllocator->shmem_lock); - ShmemLock = &ShmemAllocator->shmem_lock; ShmemAllocator->free_offset = offset; /* ShmemIndex can't be set up yet (need LWLocks first) */ ShmemAllocator->index = NULL; + + segment->ShmemAllocator = ShmemAllocator; ShmemIndex = (HTAB *) NULL; } + + /* Expose ShmemLock from the main segment for allocating LWLock tranches. 
*/ + if (segment_id == MAIN_SHMEM_SEGMENT) + ShmemLock = &segment->ShmemAllocator->shmem_lock; } /* - * ShmemAlloc -- allocate max-aligned chunk from shared memory + * ShmemAlloc -- + * allocate max-aligned chunk from the given shared memory segment * * Throws error if request cannot be satisfied. * - * Assumes ShmemLock and ShmemSegHdr are initialized. + * Assumes ShmemLock and ShmemSegHdr in the given segment are initialized. */ -void * -ShmemAlloc(Size size) + +static void * +ShmemAllocInternal(ShmemSegment *segment, Size size) { void *newSpace; Size allocated_size; - newSpace = ShmemAllocRaw(size, &allocated_size); + newSpace = ShmemAllocRaw(segment, size, &allocated_size); if (!newSpace) ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("out of shared memory (%zu bytes requested)", - size))); + errmsg("out of shared memory in segment %s (%zu bytes requested)", + segment->ShmemSegmentName, size))); return newSpace; } +void * +ShmemAlloc(int segment_id, Size size) +{ + Assert(segment_id >= 0 && segment_id < NUM_MEMORY_MAPPINGS); + + return ShmemAllocInternal(&Segments[segment_id], size); +} + /* * ShmemAllocNoError -- allocate max-aligned chunk from shared memory * * As ShmemAlloc, but returns NULL if out of space, rather than erroring. + * + * This is used as a memory allocation callback for hash tables created using + * dynahash.c APIs. It would take some work to make the callback specify the + * segment in which to allocate the memory. For now, there is no need to + * create shared memory hash tables in segments other than the main shared + * memory segment, so we do not support a segment_id parameter here. */ void * ShmemAllocNoError(Size size) { Size allocated_size; - return ShmemAllocRaw(size, &allocated_size); + return ShmemAllocRaw(&Segments[MAIN_SHMEM_SEGMENT], size, &allocated_size); } /* @@ -213,11 +251,13 @@ ShmemAllocNoError(Size size) * be equal to the number requested plus any padding we choose to add.
*/ static void * -ShmemAllocRaw(Size size, Size *allocated_size) +ShmemAllocRaw(ShmemSegment *segment, Size size, Size *allocated_size) { Size newStart; Size newFree; void *newSpace; + PGShmemHeader *shmhdr = segment->ShmemSegHdr; + ShmemAllocatorData *ShmemAllocator = segment->ShmemAllocator; /* * Ensure all space is adequately aligned. We used to only MAXALIGN this @@ -233,22 +273,21 @@ ShmemAllocRaw(Size size, Size *allocated_size) size = CACHELINEALIGN(size); *allocated_size = size; - Assert(ShmemSegHdr != NULL); + Assert(shmhdr != NULL); - SpinLockAcquire(ShmemLock); + SpinLockAcquire(&ShmemAllocator->shmem_lock); newStart = ShmemAllocator->free_offset; - newFree = newStart + size; - if (newFree <= ShmemSegHdr->totalsize) + if (newFree <= shmhdr->totalsize) { - newSpace = (char *) ShmemBase + newStart; + newSpace = (char *) segment->ShmemBase + newStart; ShmemAllocator->free_offset = newFree; } else newSpace = NULL; - SpinLockRelease(ShmemLock); + SpinLockRelease(&ShmemAllocator->shmem_lock); /* note this assert is okay with newSpace == NULL */ Assert(newSpace == (void *) CACHELINEALIGN(newSpace)); @@ -257,14 +296,23 @@ ShmemAllocRaw(Size size, Size *allocated_size) } /* - * ShmemAddrIsValid -- test if an address refers to shared memory + * ShmemAddrIsValid + * test if an address refers to the given shared memory segment. * * Returns true if the pointer points within the shared memory segment. */ bool -ShmemAddrIsValid(const void *addr) +ShmemAddrIsValid(int segment_id, const void *addr) { - return (addr >= ShmemBase) && (addr < ShmemEnd); + ShmemSegment *segment; + void *shmemEnd; + + Assert(segment_id >= 0 && segment_id < NUM_MEMORY_MAPPINGS); + + segment = &Segments[segment_id]; + shmemEnd = (char *) segment->ShmemBase + segment->ShmemSegHdr->totalsize; + + return (addr >= segment->ShmemBase) && (addr < shmemEnd); } /* @@ -318,6 +366,9 @@ InitShmemIndex(void) * Note: before Postgres 9.0, this function returned NULL for some failure * cases. 
Now, it always throws error instead, so callers need not check * for NULL. + * + * See prologue of ShmemAllocNoError for explanation about lack of segment_id + * parameter. */ HTAB * ShmemInitHash(const char *name, /* table string name for shmem index */ @@ -341,9 +392,9 @@ ShmemInitHash(const char *name, /* table string name for shmem index */ hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE; /* look it up in the shmem index */ - location = ShmemInitStruct(name, - hash_get_shared_size(infoP, hash_flags), - &found); + location = ShmemInitStructInSegment(name, + hash_get_shared_size(infoP, hash_flags), + &found, MAIN_SHMEM_SEGMENT); /* * if it already exists, attach to it rather than allocate and initialize @@ -376,15 +427,32 @@ ShmemInitHash(const char *name, /* table string name for shmem index */ */ void * ShmemInitStruct(const char *name, Size size, bool *foundPtr) +{ + return ShmemInitStructInSegment(name, size, foundPtr, MAIN_SHMEM_SEGMENT); +} + +void * +ShmemInitStructInSegment(const char *name, Size size, bool *foundPtr, int segment_id) { ShmemIndexEnt *result; void *structPtr; + ShmemSegment *segment; + + Assert(segment_id >= 0 && segment_id < NUM_MEMORY_MAPPINGS); + + segment = &Segments[segment_id]; LWLockAcquire(ShmemIndexLock, LW_EXCLUSIVE); if (!ShmemIndex) { - /* Must be trying to create/attach to ShmemIndex itself */ + ShmemAllocatorData *ShmemAllocator = segment->ShmemAllocator; + + /* + * Must be trying to create/attach to ShmemIndex itself in the main + * shared memory segment. + */ + Assert(segment_id == MAIN_SHMEM_SEGMENT); Assert(strcmp(name, "ShmemIndex") == 0); if (IsUnderPostmaster) @@ -405,7 +473,7 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr) * process can be accessing shared memory yet. 
*/ Assert(ShmemAllocator->index == NULL); - structPtr = ShmemAlloc(size); + structPtr = ShmemAllocInternal(segment, size); ShmemAllocator->index = structPtr; *foundPtr = false; } @@ -422,8 +490,8 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr) LWLockRelease(ShmemIndexLock); ereport(ERROR, (errcode(ERRCODE_OUT_OF_MEMORY), - errmsg("could not create ShmemIndex entry for data structure \"%s\"", - name))); + errmsg("could not create ShmemIndex entry for data structure \"%s\" in segment %d", + name, segment_id))); } if (*foundPtr) @@ -448,7 +516,7 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr) Size allocated_size; /* It isn't in the table yet. allocate and initialize it */ - structPtr = ShmemAllocRaw(size, &allocated_size); + structPtr = ShmemAllocRaw(segment, size, &allocated_size); if (structPtr == NULL) { /* out of memory; remove the failed ShmemIndex entry */ @@ -463,18 +531,18 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr) result->size = size; result->allocated_size = allocated_size; result->location = structPtr; + result->segment_id = segment_id; } LWLockRelease(ShmemIndexLock); - Assert(ShmemAddrIsValid(structPtr)); + Assert(ShmemAddrIsValid(segment_id, structPtr)); Assert(structPtr == (void *) CACHELINEALIGN(structPtr)); return structPtr; } - /* * Add two Size values, checking for overflow */ @@ -509,13 +577,14 @@ mul_size(Size s1, Size s2) Datum pg_get_shmem_allocations(PG_FUNCTION_ARGS) { -#define PG_GET_SHMEM_SIZES_COLS 4 +#define PG_GET_SHMEM_SIZES_COLS 5 ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; HASH_SEQ_STATUS hstat; ShmemIndexEnt *ent; - Size named_allocated = 0; + Size named_allocated[NUM_MEMORY_MAPPINGS] = {0}; Datum values[PG_GET_SHMEM_SIZES_COLS]; bool nulls[PG_GET_SHMEM_SIZES_COLS]; + int i; InitMaterializedSRF(fcinfo, 0); @@ -527,30 +596,49 @@ pg_get_shmem_allocations(PG_FUNCTION_ARGS) memset(nulls, 0, sizeof(nulls)); while ((ent = (ShmemIndexEnt *) hash_seq_search(&hstat)) != 
NULL) { + ShmemSegment *segment = &Segments[ent->segment_id]; + PGShmemHeader *shmhdr = segment->ShmemSegHdr; + values[0] = CStringGetTextDatum(ent->key); - values[1] = Int64GetDatum((char *) ent->location - (char *) ShmemSegHdr); - values[2] = Int64GetDatum(ent->size); - values[3] = Int64GetDatum(ent->allocated_size); - named_allocated += ent->allocated_size; + values[1] = CStringGetTextDatum(segment->ShmemSegmentName); + values[2] = Int64GetDatum((char *) ent->location - (char *) shmhdr); + values[3] = Int64GetDatum(ent->size); + values[4] = Int64GetDatum(ent->allocated_size); + named_allocated[ent->segment_id] += ent->allocated_size; tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); } /* output shared memory allocated but not counted via the shmem index */ - values[0] = CStringGetTextDatum(""); - nulls[1] = true; - values[2] = Int64GetDatum(ShmemAllocator->free_offset - named_allocated); - values[3] = values[2]; - tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + for (i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + ShmemSegment *segment = &Segments[i]; + ShmemAllocatorData *ShmemAllocator = segment->ShmemAllocator; + + values[0] = CStringGetTextDatum(""); + values[1] = CStringGetTextDatum(segment->ShmemSegmentName); + nulls[2] = true; + values[3] = Int64GetDatum(ShmemAllocator->free_offset - named_allocated[i]); + values[4] = values[3]; + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + } /* output as-of-yet unused shared memory */ - nulls[0] = true; - values[1] = Int64GetDatum(ShmemAllocator->free_offset); - nulls[1] = false; - values[2] = Int64GetDatum(ShmemSegHdr->totalsize - ShmemAllocator->free_offset); - values[3] = values[2]; - tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + memset(nulls, 0, sizeof(nulls)); + for (i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + ShmemSegment *segment = &Segments[i]; + PGShmemHeader *shmhdr = segment->ShmemSegHdr; + ShmemAllocatorData 
*ShmemAllocator = segment->ShmemAllocator; + + nulls[0] = true; + values[1] = CStringGetTextDatum(segment->ShmemSegmentName); + values[2] = Int64GetDatum(ShmemAllocator->free_offset); + values[3] = Int64GetDatum(shmhdr->totalsize - ShmemAllocator->free_offset); + values[4] = values[3]; + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); + } LWLockRelease(ShmemIndexLock); @@ -575,7 +663,7 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS) Size os_page_size; void **page_ptrs; int *pages_status; - uint64 shm_total_page_count, + uint64 shm_total_page_count = 0, shm_ent_page_count, max_nodes; Size *nodes; @@ -610,7 +698,13 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS) * this is not very likely, and moreover we have more entries, each of * them using only fraction of the total pages. */ - shm_total_page_count = (ShmemSegHdr->totalsize / os_page_size) + 1; + for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++) + { + PGShmemHeader *shmhdr = Segments[segment].ShmemSegHdr; + + shm_total_page_count += (shmhdr->totalsize / os_page_size) + 1; + } + page_ptrs = palloc0_array(void *, shm_total_page_count); pages_status = palloc_array(int, shm_total_page_count); @@ -751,7 +845,7 @@ pg_get_shmem_pagesize(void) Assert(huge_pages_status != HUGE_PAGES_UNKNOWN); if (huge_pages_status == HUGE_PAGES_ON) - GetHugePageSize(&os_page_size, NULL); + GetHugePageSize(&os_page_size, NULL, NULL); return os_page_size; } @@ -761,3 +855,45 @@ pg_numa_available(PG_FUNCTION_ARGS) { PG_RETURN_BOOL(pg_numa_init() != -1); } + +/* SQL SRF showing shared memory segments */ +Datum +pg_get_shmem_segments(PG_FUNCTION_ARGS) +{ +#define PG_GET_SHMEM_SEGS_COLS 5 + ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; + Datum values[PG_GET_SHMEM_SEGS_COLS]; + bool nulls[PG_GET_SHMEM_SEGS_COLS]; + int i; + + InitMaterializedSRF(fcinfo, 0); + + /* output all allocated entries */ + for (i = 0; i < NUM_MEMORY_MAPPINGS; i++) + { + ShmemSegment *segment = &Segments[i]; + 
PGShmemHeader *shmhdr = segment->ShmemSegHdr; + ShmemAllocatorData *ShmemAllocator = segment->ShmemAllocator; + int j; + + if (shmhdr == NULL) + { + for (j = 0; j < PG_GET_SHMEM_SEGS_COLS; j++) + nulls[j] = true; + } + else + { + memset(nulls, 0, sizeof(nulls)); + values[0] = Int32GetDatum(i); + values[1] = CStringGetTextDatum(segment->ShmemSegmentName); + values[2] = Int64GetDatum(shmhdr->totalsize); + values[3] = Int64GetDatum(ShmemAllocator->free_offset); + values[4] = Int64GetDatum(shmhdr->reservedsize); + } + + tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, + values, nulls); + } + + return (Datum) 0; +} diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c index 517c55375b4..160b6927c23 100644 --- a/src/backend/storage/lmgr/lwlock.c +++ b/src/backend/storage/lmgr/lwlock.c @@ -80,6 +80,8 @@ #include "pg_trace.h" #include "pgstat.h" #include "port/pg_bitutils.h" +#include "postmaster/postmaster.h" +#include "storage/pg_shmem.h" #include "storage/proc.h" #include "storage/proclist.h" #include "storage/procnumber.h" @@ -446,7 +448,7 @@ CreateLWLocks(void) char *ptr; /* Allocate space */ - ptr = (char *) ShmemAlloc(spaceLocks); + ptr = (char *) ShmemAlloc(MAIN_SHMEM_SEGMENT, spaceLocks); /* Initialize the dynamic-allocation counter for tranches */ LWLockCounter = (int *) ptr; @@ -612,6 +614,9 @@ LWLockNewTrancheId(const char *name) /* * We use the ShmemLock spinlock to protect LWLockCounter and * LWLockTrancheNames. + * + * XXX: Looks like this is the only use of Segments outside of shmem.c; it + * may be worth reshaping this part to hide the Segments structure.
*/ SpinLockAcquire(ShmemLock); diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c index fe75ead3501..9aab75f54d6 100644 --- a/src/backend/storage/lmgr/predicate.c +++ b/src/backend/storage/lmgr/predicate.c @@ -207,6 +207,7 @@ #include "miscadmin.h" #include "pgstat.h" #include "port/pg_lfind.h" +#include "storage/pg_shmem.h" #include "storage/predicate.h" #include "storage/predicate_internals.h" #include "storage/proc.h" @@ -595,7 +596,7 @@ CreatePredXact(void) static void ReleasePredXact(SERIALIZABLEXACT *sxact) { - Assert(ShmemAddrIsValid(sxact)); + Assert(ShmemAddrIsValid(MAIN_SHMEM_SEGMENT, sxact)); dlist_delete(&sxact->xactLink); dlist_push_tail(&PredXact->availableList, &sxact->xactLink); diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c index 33fbdca9609..c6d9157f417 100644 --- a/src/backend/utils/activity/pgstat_shmem.c +++ b/src/backend/utils/activity/pgstat_shmem.c @@ -13,6 +13,7 @@ #include "postgres.h" #include "pgstat.h" +#include "storage/pg_shmem.h" #include "storage/shmem.h" #include "utils/memutils.h" #include "utils/pgstat_internal.h" @@ -233,7 +234,7 @@ StatsShmemInit(void) int idx = kind - PGSTAT_KIND_CUSTOM_MIN; Assert(kind_info->shared_size != 0); - ctl->custom_data[idx] = ShmemAlloc(kind_info->shared_size); + ctl->custom_data[idx] = ShmemAlloc(MAIN_SHMEM_SEGMENT, kind_info->shared_size); ptr = ctl->custom_data[idx]; } diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 83f6501df38..4b27f2a245e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8592,8 +8592,8 @@ { oid => '5052', descr => 'allocations from the main shared memory segment', proname => 'pg_get_shmem_allocations', prorows => '50', proretset => 't', provolatile => 'v', prorettype => 'record', proargtypes => '', - proallargtypes => '{text,int8,int8,int8}', proargmodes => '{o,o,o,o}', - proargnames => 
'{name,off,size,allocated_size}', + proallargtypes => '{text,text,int8,int8,int8}', proargmodes => '{o,o,o,o,o}', + proargnames => '{name,segment,off,size,allocated_size}', prosrc => 'pg_get_shmem_allocations' }, { oid => '4099', descr => 'Is NUMA support available?', @@ -8616,6 +8616,14 @@ proargmodes => '{o,o,o}', proargnames => '{name,type,size}', prosrc => 'pg_get_dsm_registry_allocations' }, +# shared memory segments +{ oid => '5101', descr => 'shared memory segments', + proname => 'pg_get_shmem_segments', prorows => '6', proretset => 't', + provolatile => 'v', prorettype => 'record', proargtypes => '', + proallargtypes => '{int4,text,int8,int8,int8}', proargmodes => '{o,o,o,o,o}', + proargnames => '{id,name,size,freeoffset,reserved_size}', + prosrc => 'pg_get_shmem_segments' }, + # memory context of local backend { oid => '2282', descr => 'information about all memory contexts of local backend', diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h index a40adf6b2a8..93348a34378 100644 --- a/src/include/storage/bufmgr.h +++ b/src/include/storage/bufmgr.h @@ -19,6 +19,7 @@ #include "storage/block.h" #include "storage/buf.h" #include "storage/bufpage.h" +#include "storage/pg_shmem.h" #include "storage/relfilelocator.h" #include "utils/relcache.h" #include "utils/snapmgr.h" @@ -367,7 +368,7 @@ extern void MarkDirtyAllUnpinnedBuffers(int32 *buffers_dirtied, /* in buf_init.c */ extern void BufferManagerShmemInit(void); -extern Size BufferManagerShmemSize(void); +extern Size BufferManagerShmemSize(MemoryMappingSizes *mapping_sizes); /* in localbuf.c */ extern void AtProcExit_LocalBuffers(void); diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h index da32787ab51..f1d0802d048 100644 --- a/src/include/storage/ipc.h +++ b/src/include/storage/ipc.h @@ -18,6 +18,8 @@ #ifndef IPC_H #define IPC_H +#include "storage/pg_shmem.h" + typedef void (*pg_on_exit_callback) (int code, Datum arg); typedef void (*shmem_startup_hook_type) (void); @@ 
-77,7 +79,7 @@ extern void check_on_shmem_exit_lists_are_empty(void); /* ipci.c */ extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook; -extern Size CalculateShmemSize(void); +extern Size CalculateShmemSize(MemoryMappingSizes *mapping_sizes); extern void CreateSharedMemoryAndSemaphores(void); #ifdef EXEC_BACKEND extern void AttachSharedMemoryStructs(void); diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h index 10c7b065861..eafab1dae52 100644 --- a/src/include/storage/pg_shmem.h +++ b/src/include/storage/pg_shmem.h @@ -26,12 +26,20 @@ #include "storage/dsm_impl.h" + typedef struct PGShmemHeader /* standard header for all Postgres shmem */ { int32 magic; /* magic # to identify Postgres segments */ #define PGShmemMagic 679834894 pid_t creatorPID; /* PID of creating process (set but unread) */ + + /* + * TODO: We might have to rename these fields to allocSize (for amount of + * memory allocated currently in this segment), maxSize (for maximum size + * the segment can grow to.) + */ Size totalsize; /* total size of segment */ + Size reservedsize; /* Size of the reserved mapping */ Size content_offset; /* offset to the data, i.e. size of this * header */ dsm_handle dsm_control; /* ID of dynamic shared memory control seg */ @@ -41,6 +49,55 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */ #endif } PGShmemHeader; +/* + * Information about the shared memory segment that is required to be passed + * from the Postmaster to each backend. + */ +typedef struct PGUsedShmemInfo +{ + void *UsedShmemSegAddr; /* SysV shared memory for the header */ +#ifndef WIN32 + unsigned long UsedShmemSegID; /* IPC key */ +#else + void *ShmemProtectiveRegion; /* Protective region for Windows + * shared memory */ + HANDLE UsedShmemSegID; +#endif +} PGUsedShmemInfo; + +/* + * To be able to dynamically resize the shared buffer pool, we allocate shared + * memory in two segments. 
The main segment contains everything except the + * buffer blocks, and BUFFERS_SHMEM_SEGMENT contains the buffer blocks. The + * main segment is fixed-size, whereas BUFFERS_SHMEM_SEGMENT can be resized + * at runtime. + * + * TODO: convert this to enum? + */ + +#define MAIN_SHMEM_SEGMENT 0 + +/* Buffer blocks */ +#define BUFFERS_SHMEM_SEGMENT 1 + +/* Number of available segments for anonymous memory mappings */ +#define NUM_MEMORY_MAPPINGS 2 + +/* + * Structure to hold the required size of each shared memory segment, as + * calculated by CalculateShmemSize(). + * + * TODO: Does ShmemMappingSizes sound better? + */ +typedef struct MemoryMappingSizes +{ + Size shmem_req_size; /* Required size of the segment */ + Size shmem_reserved; /* Required size of the reserved address + * space. */ +} MemoryMappingSizes; + +extern PGDLLIMPORT PGUsedShmemInfo UsedShmemInfo[NUM_MEMORY_MAPPINGS]; + /* GUC variables */ extern PGDLLIMPORT int shared_memory_type; extern PGDLLIMPORT int huge_pages; @@ -64,14 +121,6 @@ typedef enum SHMEM_TYPE_MMAP, } PGShmemType; -#ifndef WIN32 -extern PGDLLIMPORT unsigned long UsedShmemSegID; -#else -extern PGDLLIMPORT HANDLE UsedShmemSegID; -extern PGDLLIMPORT void *ShmemProtectiveRegion; -#endif -extern PGDLLIMPORT void *UsedShmemSegAddr; - #if !defined(WIN32) && !defined(EXEC_BACKEND) #define DEFAULT_SHARED_MEMORY_TYPE SHMEM_TYPE_MMAP #elif !defined(WIN32) @@ -85,10 +134,27 @@ extern void PGSharedMemoryReAttach(void); extern void PGSharedMemoryNoReAttach(void); #endif -extern PGShmemHeader *PGSharedMemoryCreate(Size size, +extern PGShmemHeader *PGSharedMemoryCreate(int segment_id, MemoryMappingSizes *mapping_sizes, PGShmemHeader **shim); extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2); extern void PGSharedMemoryDetach(void); -extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags); +extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags, + int *memfd_flags); +extern void PrepareHugePages(void); + +static
inline const char * +MappingName(int segment_id) +{ + switch (segment_id) + { + case MAIN_SHMEM_SEGMENT: + return "main"; + case BUFFERS_SHMEM_SEGMENT: + return "buffers"; + default: + return "unknown"; + } +} + #endif /* PG_SHMEM_H */ diff --git a/src/include/storage/shmem.h b/src/include/storage/shmem.h index 89d45287c17..5d58b3b39e6 100644 --- a/src/include/storage/shmem.h +++ b/src/include/storage/shmem.h @@ -29,14 +29,16 @@ extern PGDLLIMPORT slock_t *ShmemLock; typedef struct PGShmemHeader PGShmemHeader; /* avoid including * storage/pg_shmem.h here */ -extern void InitShmemAllocator(PGShmemHeader *seghdr); -extern void *ShmemAlloc(Size size); +extern void InitShmemAllocator(int segment_id, PGShmemHeader *seghdr); +extern void *ShmemAlloc(int segment_id, Size size); extern void *ShmemAllocNoError(Size size); -extern bool ShmemAddrIsValid(const void *addr); +extern bool ShmemAddrIsValid(int segment_id, const void *addr); extern void InitShmemIndex(void); extern HTAB *ShmemInitHash(const char *name, int64 init_size, int64 max_size, HASHCTL *infoP, int hash_flags); extern void *ShmemInitStruct(const char *name, Size size, bool *foundPtr); +extern void *ShmemInitStructInSegment(const char *name, Size size, + bool *foundPtr, int segment_id); extern Size add_size(Size s1, Size s2); extern Size mul_size(Size s1, Size s2); @@ -58,6 +60,7 @@ typedef struct void *location; /* location in shared mem */ Size size; /* # bytes requested for the structure */ Size allocated_size; /* # bytes actually allocated */ + int segment_id; /* segment in which the structure is allocated */ } ShmemIndexEnt; #endif /* SHMEM_H */ diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index f4ee2bd7459..1e1bd1eb8b4 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1768,14 +1768,21 @@ pg_shadow| SELECT pg_authid.rolname AS usename, LEFT JOIN pg_db_role_setting s ON (((pg_authid.oid = s.setrole) AND (s.setdatabase = 
(0)::oid)))) WHERE pg_authid.rolcanlogin; pg_shmem_allocations| SELECT name, + segment, off, size, allocated_size - FROM pg_get_shmem_allocations() pg_get_shmem_allocations(name, off, size, allocated_size); + FROM pg_get_shmem_allocations() pg_get_shmem_allocations(name, segment, off, size, allocated_size); pg_shmem_allocations_numa| SELECT name, numa_node, size FROM pg_get_shmem_allocations_numa() pg_get_shmem_allocations_numa(name, numa_node, size); +pg_shmem_segments| SELECT id, + name, + size, + freeoffset, + reserved_size + FROM pg_get_shmem_segments() pg_get_shmem_segments(id, name, size, freeoffset, reserved_size); pg_stat_activity| SELECT s.datid, d.datname, s.pid, diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 77518489412..e4dbe1b787b 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -120,6 +120,7 @@ AmcheckOptions AnalyzeAttrComputeStatsFunc AnalyzeAttrFetchFunc AnalyzeForeignTable_function +AnonShmemData AnlExprData AnlIndexData AnyArrayType @@ -1685,6 +1686,7 @@ MVNDistinct MVNDistinctItem ManyTestResource ManyTestResourceKind +MemoryMappingSizes Material MaterialPath MaterialState @@ -1887,6 +1889,7 @@ PGFInfoFunction PGFileType PGFunction PGIOAlignedBlock +PGUsedShmemInfo PGLZ_HistEntry PGLZ_Strategy PGLoadBalanceType @@ -2807,6 +2810,7 @@ ShippableCacheEntry ShmemAllocatorData ShippableCacheKey ShmemIndexEnt +ShmemSegment ShutdownForeignScan_function ShutdownInformation ShutdownMode -- 2.34.1