From a7f25c62ef900b2b115c575c2d8aa158ec825c69 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Fri, 28 Feb 2025 19:54:47 +0100
Subject: [PATCH 2/4] Memory and address space management for buffer resizing
This patch contains three changes:

1. Allow using multiple shared memory mappings
==============================================

Currently all the work with shared memory is done via a single anonymous
memory mapping, which limits the ways the shared memory can be organized.
Introduce the possibility of allocating multiple shared memory mappings,
where each mapping is associated with a specified shared memory segment.
A new shared memory API is introduced, extending the existing functions
with a segment argument. As a path of least resistance, the original API
is kept in place, operating on the main shared memory segment.

Modifies pg_shmem_allocations to report the shared memory segment as
well, and adds a pg_shmem_segments view to report shared memory segment
information.
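For illustration, allocating a structure in a specific segment with the
extended API looks like this (a sketch based on the buf_init.c changes
further down in this patch):

    bool found;

    BufferDescriptors = (BufferDescPadded *)
        ShmemInitStructInSegment("Buffer Descriptors",
                                 NBuffers * sizeof(BufferDescPadded),
                                 &found,
                                 BUFFER_DESCRIPTORS_SHMEM_SEGMENT);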
2. Address space reservation for shared memory
==============================================

Currently the shared memory layout is designed to pack everything
tightly together, leaving no space between mappings for resizing. Here
is how it looks for one mapping in /proc/$PID/maps, where /dev/zero
represents the anonymous shared memory in question:
00400000-00490000 /path/bin/postgres
...
012d9000-0133e000 [heap]
7f443a800000-7f470a800000 /dev/zero (deleted)
7f470a800000-7f471831d000 /usr/lib/locale/locale-archive
7f4718400000-7f4718401000 /usr/lib64/libstdc++.so.6.0.34
...
Make the layout more dynamic by splitting every shared memory segment
into two parts:

* An anonymous file, which contains the actual shared memory content.
  Such an anonymous file is created via memfd_create; it lives in
  memory, behaves like a regular file, and is semantically equivalent
  to anonymous memory allocated via mmap with MAP_ANONYMOUS.

* A reservation mapping, whose size is much larger than the required
  shared segment size. This mapping is created with the MAP_NORESERVE
  flag (so the reserved space is not counted against memory limits).
  The anonymous file is mapped into this reservation mapping.
If resizing the shared buffer pool changed the address maps, the change
would have to be applied in the postmaster as well, so that new
backends inherit the resized address space. However, the postmaster
does not participate in the ProcSignalBarrier mechanism, and we don't
want it to spend time on anything other than its core functionality.
To achieve that, the maximum required address space maps are set up
upfront, with read and write access, when the server starts. When
resizing the buffer pool, only the backing file object is resized from
the coordinator. This also keeps the ProcSignalBarrier handling code
lightweight for backends other than the coordinator.
The resulting layout looks like this:
00400000-00490000 /path/bin/postgres
...
3f526000-3f590000 rw-p [heap]
7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted) -- anon file
7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted) -- reservation
7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive
7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34
To resize a shared memory segment in this layout it is enough to use
ftruncate on the memory-mapped file. This approach also does not affect
the actual memory usage as reported by the kernel.
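For illustration, the reserve-then-resize pattern boils down to the
following sketch (standalone and Linux-only; error handling is elided
and the names are illustrative, not taken from this patch):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t reserved_size = 1UL << 30;    /* 1 GB of address space */
        size_t initial_size = 128UL << 20;   /* 128 MB actually used */

        /* Anonymous in-memory file holding the segment's content */
        int fd = memfd_create("example", 0);

        /* Reserve a large range backed by the file; MAP_NORESERVE keeps
         * the reservation from counting against memory limits. */
        char *base = mmap(NULL, reserved_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_NORESERVE, fd, 0);

        /* Only the file's size is actually backed by memory. */
        ftruncate(fd, initial_size);
        base[0] = 1;                 /* within the file: fine */

        /* Growing later needs no mmap/munmap, so the address stays
         * stable, as long as the new size fits the reservation. */
        ftruncate(fd, 256UL << 20);
        base[initial_size] = 1;      /* now also accessible */

        return 0;
    }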
TODO: Verify that cgroup v2 doesn't have any problems with this either.
As a first check, a new cgroup was created with a 256 MB memory limit,
then PostgreSQL was launched within this cgroup with
shared_buffers = 128 MB:
$ cd /sys/fs/cgroup
$ mkdir postgres
$ cd postgres
$ echo 268435456 > memory.max
$ echo $MASTER_PID_SHELL > cgroup.procs
# postgres from the master branch has been successfully launched
# from that shell
$ cat memory.current
17465344 (~16.6 MB)
# stop postgres
$ echo $PATCH_PID_SHELL > cgroup.procs
# postgres from the patch has been successfully launched from that shell
$ cat memory.current
20770816 (~19.8 MB)
There are also a few unrelated advantages of using memory-mapped files:

* We get a file descriptor, which can be used for regular file
  operations (modification, truncation, you name it).

* The file can be given a name, which improves readability when it
  comes to process maps.

* By default, Linux will not add file-backed shared mappings to a core
  dump, making it more convenient to work with them in PostgreSQL: no
  more huge dumps to process. (Some hackers have expressed concerns
  about this behavior.)

The downside is that memfd_create is Linux-specific.
3. Refactor CalculateShmemSize()
================================

This function calls many functions that return the amount of shared
memory required for different shared memory data structures. Up until
now, the total of these sizes was used to create a single shared memory
segment. With this change, CalculateShmemSize() needs to estimate the
memory requirements for each of the segments. It now takes as an
argument an array of MemoryMappingSizes, containing as many elements as
there are segments. The sizes returned by all the functions it calls,
except BufferManagerShmemSize(), are added up and saved in the first
element (index 0) of the array. BufferManagerShmemSize() is modified to
save the amount of memory required for each buffer-manager-related
segment in the corresponding array element; it also saves the amount of
reserved address space. For now, the amount of reserved address space
is the same as the amount of required memory, but that is expected to
change with the next commit, which implements buffer pool resizing.
CalculateShmemSize() now returns the total of the sizes of all
segments.
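The resulting call pattern (as in the CreateSharedMemoryAndSemaphores()
changes below) becomes:

    MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS];
    Size total;

    /* One entry per segment: the buffer-manager entries are filled in
     * by BufferManagerShmemSize(), everything else is accounted to
     * mapping_sizes[MAIN_SHMEM_SEGMENT]. */
    total = CalculateShmemSize(mapping_sizes);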
Author: Dmitrii Dolgov and Ashutosh Bapat
Reviewed-by: Tomas Vondra
---
doc/src/sgml/system-views.sgml | 9 +
src/backend/catalog/system_views.sql | 7 +
src/backend/port/sysv_shmem.c | 425 +++++++++++++++++++------
src/backend/port/win32_sema.c | 2 +-
src/backend/port/win32_shmem.c | 14 +-
src/backend/storage/buffer/buf_init.c | 56 ++--
src/backend/storage/buffer/buf_table.c | 6 +-
src/backend/storage/buffer/freelist.c | 5 +-
src/backend/storage/ipc/ipc.c | 4 +-
src/backend/storage/ipc/ipci.c | 99 ++++--
src/backend/storage/ipc/shmem.c | 243 ++++++++++----
src/backend/storage/lmgr/lwlock.c | 15 +-
src/include/catalog/pg_proc.dat | 12 +-
src/include/portability/mem.h | 2 +-
src/include/storage/bufmgr.h | 3 +-
src/include/storage/ipc.h | 4 +-
src/include/storage/pg_shmem.h | 60 +++-
src/include/storage/shmem.h | 12 +
src/test/regress/expected/rules.out | 10 +-
19 files changed, 755 insertions(+), 233 deletions(-)
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 8f3e2741051..bc70a3ee6c9 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -4233,6 +4233,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>segment</structfield> <type>text</type>
+      </para>
+      <para>
+       The name of the shared memory segment the allocation belongs to.
+      </para></entry>
+     </row>
off int8
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 059e8778ca7..59145066647 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -668,6 +668,13 @@ GRANT SELECT ON pg_shmem_allocations TO pg_read_all_stats;
REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_get_shmem_allocations() TO pg_read_all_stats;
+CREATE VIEW pg_shmem_segments AS
+ SELECT * FROM pg_get_shmem_segments();
+
+REVOKE ALL ON pg_shmem_segments FROM PUBLIC;
+GRANT SELECT ON pg_shmem_segments TO pg_read_all_stats;
+REVOKE EXECUTE ON FUNCTION pg_get_shmem_segments() FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION pg_get_shmem_segments() TO pg_read_all_stats;
CREATE VIEW pg_shmem_allocations_numa AS
SELECT * FROM pg_get_shmem_allocations_numa();
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 197926d44f6..cc4b2c80e1a 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -90,12 +90,49 @@ typedef enum
SHMSTATE_UNATTACHED, /* pertinent to DataDir, no attached PIDs */
} IpcMemoryState;
-
+/*
+ * TODO: These should be moved into ShmemSegment, now that there can be
+ * multiple shared memory segments. But there's Windows-specific code which
+ * will need adjustment, so they are left here for now.
+ */
unsigned long UsedShmemSegID = 0;
void *UsedShmemSegAddr = NULL;
-static Size AnonymousShmemSize;
-static void *AnonymousShmem = NULL;
+/*
+ * Anonymous mapping layout we use looks like this:
+ *
+ * 00400000-00c2a000 r-xp /bin/postgres
+ * ...
+ * 3f526000-3f590000 rw-p [heap]
+ * 7fbd827fe000-7fbd8bdde000 rw-s /memfd:main (deleted)
+ * 7fbd8bdde000-7fbe82800000 ---s /memfd:main (deleted)
+ * 7fbe82800000-7fbe90670000 r--p /usr/lib/locale/locale-archive
+ * 7fbe90800000-7fbe90941000 r-xp /usr/lib64/libstdc++.so.6.0.34
+ * ...
+ *
+ * We need to place shared memory mappings in such a way that there are gaps
+ * between them in the address space. Those gaps have to be large enough to
+ * resize the mapping up to a certain size, without counting towards the
+ * total memory consumption.
+ *
+ * To achieve this, for each shared memory segment we first create an
+ * anonymous file of the specified size using memfd_create, which will
+ * accommodate the actual shared memory content. It is represented by the
+ * first /memfd:main mapping above. Then we create a mapping for this file
+ * using mmap, with a size much larger than required and the MAP_NORESERVE
+ * flag (which prevents the reserved space from being counted against memory
+ * limits). This mapping serves as an address space reservation, into which
+ * the shared memory segment can be extended, and is represented by the
+ * second /memfd:main mapping.
+ */
+
+/*
+ * Flag telling that we have decided to use huge pages.
+ *
+ * XXX: It's possible to use GetConfigOption("huge_pages_status", false, false)
+ * instead, but it feels like overkill.
+ */
+static bool huge_pages_on = false;
static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size);
static void IpcMemoryDetach(int status, Datum shmaddr);
@@ -104,6 +141,27 @@ static IpcMemoryState PGSharedMemoryAttach(IpcMemoryId shmId,
void *attachAt,
PGShmemHeader **addr);
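+
+/*
+ * MappingName
+ *
+ * Return a human-readable name for the given shared memory segment. The
+ * name is used in log messages and exposed via monitoring views.
+ */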
+const char *
+MappingName(int shmem_segment)
+{
+ switch (shmem_segment)
+ {
+ case MAIN_SHMEM_SEGMENT:
+ return "main";
+ case BUFFERS_SHMEM_SEGMENT:
+ return "buffers";
+ case BUFFER_DESCRIPTORS_SHMEM_SEGMENT:
+ return "descriptors";
+ case BUFFER_IOCV_SHMEM_SEGMENT:
+ return "iocv";
+ case CHECKPOINT_BUFFERS_SHMEM_SEGMENT:
+ return "checkpoint";
+ case STRATEGY_SHMEM_SEGMENT:
+ return "strategy";
+ default:
+ return "unknown";
+ }
+}
/*
* InternalIpcMemoryCreate(memKey, size)
@@ -470,19 +528,20 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
* hugepage sizes, we might want to think about more invasive strategies,
* such as increasing shared_buffers to absorb the extra space.
*
- * Returns the (real, assumed or config provided) page size into
- * *hugepagesize, and the hugepage-related mmap flags to use into
- * *mmap_flags if requested by the caller. If huge pages are not supported,
- * *hugepagesize and *mmap_flags are set to 0.
+ * Returns the (real, assumed or config provided) page size into *hugepagesize,
+ * the hugepage-related mmap and memfd flags to use into *mmap_flags and
+ * *memfd_flags if requested by the caller. If huge pages are not supported,
+ * *hugepagesize, *mmap_flags and *memfd_flags are set to 0.
*/
void
-GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+GetHugePageSize(Size *hugepagesize, int *mmap_flags, int *memfd_flags)
{
#ifdef MAP_HUGETLB
Size default_hugepagesize = 0;
Size hugepagesize_local = 0;
int mmap_flags_local = 0;
+ int memfd_flags_local = 0;
/*
* System-dependent code to find out the default huge page size.
@@ -541,6 +600,7 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
}
mmap_flags_local = MAP_HUGETLB;
+ memfd_flags_local = MFD_HUGETLB;
/*
* On recent enough Linux, also include the explicit page size, if
@@ -551,7 +611,16 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
{
int shift = pg_ceil_log2_64(hugepagesize_local);
mmap_flags_local |= (shift & MAP_HUGE_MASK) << MAP_HUGE_SHIFT;
+ }
+#endif
+
+#if defined(MFD_HUGE_MASK) && defined(MFD_HUGE_SHIFT)
+ if (hugepagesize_local != default_hugepagesize)
+ {
+ int shift = pg_ceil_log2_64(hugepagesize_local);
+
+ memfd_flags_local |= (shift & MFD_HUGE_MASK) << MFD_HUGE_SHIFT;
}
#endif
@@ -560,6 +629,8 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*mmap_flags = mmap_flags_local;
if (hugepagesize)
*hugepagesize = hugepagesize_local;
+ if (memfd_flags)
+ *memfd_flags = memfd_flags_local;
#else
@@ -567,6 +638,8 @@ GetHugePageSize(Size *hugepagesize, int *mmap_flags)
*hugepagesize = 0;
if (mmap_flags)
*mmap_flags = 0;
+ if (memfd_flags)
+ *memfd_flags = 0;
#endif /* MAP_HUGETLB */
}
@@ -588,83 +661,242 @@ check_huge_page_size(int *newval, void **extra, GucSource source)
return true;
}
+/*
+ * Wrapper around posix_fallocate() to allocate memory for a given shared memory
+ * segment.
+ *
+ * Performs retry on EINTR, and raises error upon failure.
+ */
+static void
+shmem_fallocate(int fd, const char *mapping_name, Size size, int elevel)
+{
+#if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
+ int ret;
+
+ /*
+ * If there is not enough memory, trying to access a hole in address space
+ * will cause SIGBUS. If supported, avoid that by allocating memory upfront.
+ *
+ * We still use a traditional EINTR retry loop to handle SIGCONT.
+ * posix_fallocate() doesn't restart automatically, and we don't want this to
+ * fail if you attach a debugger.
+ */
+ do
+ {
+ ret = posix_fallocate(fd, 0, size);
+ } while (ret == EINTR);
+
+ if (ret != 0)
+ {
+ ereport(elevel,
+ (errmsg("segment[%s]: could not allocate space for anonymous file: %s",
+ mapping_name, strerror(ret)),
+ (ret == ENOMEM) ?
+ errhint("This error usually means that PostgreSQL's request "
+ "for a shared memory segment exceeded available memory, "
+ "swap space, or huge pages. To reduce the request size "
+ "(currently %zu bytes), reduce PostgreSQL's shared "
+ "memory usage, perhaps by reducing \"shared_buffers\" or "
+ "\"max_connections\".",
+ size) : 0));
+ }
+#endif /* HAVE_POSIX_FALLOCATE && __linux__ */
+}
+
+/*
+ * Round up the required amount of memory and the amount of required reserved
+ * address space to the nearest huge page size.
+ */
+static inline void
+round_off_mapping_sizes_for_hugepages(MemoryMappingSizes *mapping, int hugepagesize)
+{
+ if (hugepagesize == 0)
+ return;
+
+ if (mapping->shmem_req_size % hugepagesize != 0)
+ mapping->shmem_req_size += hugepagesize -
+ (mapping->shmem_req_size % hugepagesize);
+
+ if (mapping->shmem_reserved % hugepagesize != 0)
+ mapping->shmem_reserved = mapping->shmem_reserved + hugepagesize -
+ (mapping->shmem_reserved % hugepagesize);
+}
+
/*
* Creates an anonymous mmap()ed shared memory segment.
*
- * Pass the requested size in *size. This function will modify *size to the
- * actual size of the allocation, if it ends up allocating a segment that is
- * larger than requested.
+ * This function will modify mapping size to the actual size of the allocation,
+ * if it ends up allocating a segment that is larger than requested. If needed,
+ * it also rounds up the mapping reserved size to be a multiple of huge page
+ * size.
+ *
+ * Note that we do not fallback from huge pages to regular pages in this
+ * function, this decision was already made in ReserveAnonymousMemory and we
+ * stick to it.
+ *
+ * TODO: Update the prologue to be consistent with the code.
*/
-static void *
-CreateAnonymousSegment(Size *size)
+static void
+CreateAnonymousSegment(MemoryMappingSizes *mapping, int segment_id)
{
- Size allocsize = *size;
void *ptr = MAP_FAILED;
- int mmap_errno = 0;
+ int save_errno = 0;
+ int mmap_flags = PG_MMAP_FLAGS, memfd_flags = 0;
+ ShmemSegment *segment = &Segments[segment_id];
#ifndef MAP_HUGETLB
- /* PGSharedMemoryCreate should have dealt with this case */
- Assert(huge_pages != HUGE_PAGES_ON);
+ /* PrepareHugePages should have dealt with this case */
+ Assert(huge_pages != HUGE_PAGES_ON && !huge_pages_on);
#else
- if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY)
+ if (huge_pages_on)
{
- /*
- * Round up the request size to a suitable large value.
- */
Size hugepagesize;
- int mmap_flags;
- GetHugePageSize(&hugepagesize, &mmap_flags);
+ /* Make sure nothing is messed up */
+ Assert(huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY);
- if (allocsize % hugepagesize != 0)
- allocsize += hugepagesize - (allocsize % hugepagesize);
+ /* Round up the request size to a suitable large value */
+ GetHugePageSize(&hugepagesize, &mmap_flags, &memfd_flags);
+ round_off_mapping_sizes_for_hugepages(mapping, hugepagesize);
- ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
- PG_MMAP_FLAGS | mmap_flags, -1, 0);
- mmap_errno = errno;
- if (huge_pages == HUGE_PAGES_TRY && ptr == MAP_FAILED)
- elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
- allocsize);
+ /* Verify that the new size is within the reserved boundaries */
+ Assert(mapping->shmem_reserved >= mapping->shmem_req_size);
+
+ mmap_flags = PG_MMAP_FLAGS | mmap_flags;
}
#endif
/*
- * Report whether huge pages are in use. This needs to be tracked before
- * the second mmap() call if attempting to use huge pages failed
- * previously.
+ * Prepare an anonymous file backing the segment. Its size will be
+ * specified later via ftruncate.
+ *
+ * The file behaves like a regular file, but lives in memory. Once all
+ * references to the file are dropped, it is automatically released.
+ * Anonymous memory is used for all backing pages of the file, thus it has
+ * the same semantics as anonymous memory allocations using mmap with the
+ * MAP_ANONYMOUS flag.
*/
- SetConfigOption("huge_pages_status", (ptr == MAP_FAILED) ? "off" : "on",
- PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+ segment->segment_fd = memfd_create(MappingName(segment_id), memfd_flags);
+ if (segment->segment_fd == -1)
+ ereport(FATAL,
+ (errmsg("segment[%s]: could not create anonymous shared memory file: %m",
+ MappingName(segment_id))));
- if (ptr == MAP_FAILED && huge_pages != HUGE_PAGES_ON)
- {
- /*
- * Use the original size, not the rounded-up value, when falling back
- * to non-huge pages.
- */
- allocsize = *size;
- ptr = mmap(NULL, allocsize, PROT_READ | PROT_WRITE,
- PG_MMAP_FLAGS, -1, 0);
- mmap_errno = errno;
- }
+ elog(DEBUG1, "segment[%s]: mmap(%zu)", MappingName(segment_id), mapping->shmem_req_size);
+ /*
+ * Reserve maximum required address space for future expansion of this
+ * memory segment. MAP_NORESERVE ensures that no memory is allocated. The
+ * whole address space will be setup for read/write access, so that memory
+ * allocated to this address space can be read or written to even if it is
+ * resized.
+ */
+ ptr = mmap(NULL, mapping->shmem_reserved, PROT_READ | PROT_WRITE,
+ mmap_flags | MAP_NORESERVE, segment->segment_fd, 0);
if (ptr == MAP_FAILED)
+ ereport(FATAL,
+ (errmsg("segment[%s]: could not map anonymous shared memory: %m",
+ MappingName(segment_id))));
+
+ /*
+ * Resize the backing file to the required size. On platforms where it is
+ * supported, we also allocate the required memory upfront. On other
+ * platforms, memory up to the size of the file will be allocated on demand.
+ */
+ if (ftruncate(segment->segment_fd, mapping->shmem_req_size) == -1)
{
- errno = mmap_errno;
+ save_errno = errno;
+
+ close(segment->segment_fd);
+
+ errno = save_errno;
ereport(FATAL,
- (errmsg("could not map anonymous shared memory: %m"),
- (mmap_errno == ENOMEM) ?
+ (errmsg("segment[%s]: could not truncate anonymous file to size %zu: %m",
+ MappingName(segment_id), mapping->shmem_req_size),
+ (save_errno == ENOMEM) ?
errhint("This error usually means that PostgreSQL's request "
"for a shared memory segment exceeded available memory, "
"swap space, or huge pages. To reduce the request size "
"(currently %zu bytes), reduce PostgreSQL's shared "
"memory usage, perhaps by reducing \"shared_buffers\" or "
"\"max_connections\".",
- allocsize) : 0));
+ mapping->shmem_req_size) : 0));
}
+ shmem_fallocate(segment->segment_fd, MappingName(segment_id), mapping->shmem_req_size, FATAL);
- *size = allocsize;
- return ptr;
+ segment->shmem = ptr;
+ segment->shmem_size = mapping->shmem_req_size;
+ segment->shmem_reserved = mapping->shmem_reserved;
+}
+
+/*
+ * PrepareHugePages
+ *
+ * Figure out if there are enough huge pages to allocate all shared memory
+ * segments, and report that information via huge_pages_status and
+ * huge_pages_on. It needs to be called before creating shared memory segments.
+ *
+ * It is necessary to maintain the same semantic (simple on/off) for
+ * huge_pages_status, even if there are multiple shared memory segments: all
+ * segments either use huge pages or not, there is no mix of segments with
+ * different page size. The latter might be actually beneficial, in particular
+ * because only some segments may require large amount of memory, but for now
+ * we go with a simple solution.
+ */
+void
+PrepareHugePages(void)
+{
+ void *ptr = MAP_FAILED;
+ MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS];
+
+ CalculateShmemSize(mapping_sizes);
+
+ /* Complain if hugepages demanded but we can't possibly support them */
+#if !defined(MAP_HUGETLB)
+ if (huge_pages == HUGE_PAGES_ON)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("huge pages not supported on this platform")));
+#else
+ if (huge_pages == HUGE_PAGES_ON || huge_pages == HUGE_PAGES_TRY)
+ {
+ Size hugepagesize, total_size = 0;
+ int mmap_flags;
+
+ GetHugePageSize(&hugepagesize, &mmap_flags, NULL);
+
+ /*
+ * Figure out how much memory is needed for all segments, keeping in
+ * mind that for every segment this value is rounded up to a multiple of the
+ * huge page size. The resulting value will be used to probe memory and
+ * decide whether we will allocate huge pages or not.
+ */
+ for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++)
+ {
+ Size segment_size = mapping_sizes[segment].shmem_req_size;
+
+ if (segment_size % hugepagesize != 0)
+ segment_size += hugepagesize - (segment_size % hugepagesize);
+
+ total_size += segment_size;
+ }
+
+ /* Map total amount of memory to test its availability. */
+ elog(DEBUG1, "reserving space: probe mmap(%zu) with MAP_HUGETLB",
+ total_size);
+ ptr = mmap(NULL, total_size, PROT_NONE,
+ PG_MMAP_FLAGS | MAP_ANONYMOUS | mmap_flags, -1, 0);
+
+ /* The probe mapping itself is not needed, release it right away. */
+ if (ptr != MAP_FAILED)
+ munmap(ptr, total_size);
+ }
+#endif
+
+ /*
+ * Report whether huge pages are in use. This needs to be tracked before
+ * creating shared memory segments.
+ */
+ SetConfigOption("huge_pages_status", (ptr == MAP_FAILED) ? "off" : "on",
+ PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
+ huge_pages_on = ptr != MAP_FAILED;
}
/*
@@ -674,20 +906,25 @@ CreateAnonymousSegment(Size *size)
static void
AnonymousShmemDetach(int status, Datum arg)
{
- /* Release anonymous shared memory block, if any. */
- if (AnonymousShmem != NULL)
+ for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++)
{
- if (munmap(AnonymousShmem, AnonymousShmemSize) < 0)
- elog(LOG, "munmap(%p, %zu) failed: %m",
- AnonymousShmem, AnonymousShmemSize);
- AnonymousShmem = NULL;
+ ShmemSegment *segment = &Segments[i];
+
+ /* Release anonymous shared memory block, if any. */
+ if (segment->shmem != NULL)
+ {
+ if (munmap(segment->shmem, segment->shmem_size) < 0)
+ elog(LOG, "munmap(%p, %zu) failed: %m",
+ segment->shmem, segment->shmem_size);
+ segment->shmem = NULL;
+ }
}
}
/*
* PGSharedMemoryCreate
*
- * Create a shared memory segment of the given size and initialize its
+ * Create a shared memory segment for the given mapping and initialize its
* standard header. Also, register an on_shmem_exit callback to release
* the storage.
*
@@ -697,7 +934,7 @@ AnonymousShmemDetach(int status, Datum arg)
* postmaster or backend.
*/
PGShmemHeader *
-PGSharedMemoryCreate(Size size,
+PGSharedMemoryCreate(MemoryMappingSizes *mapping, int segment_id,
PGShmemHeader **shim)
{
IpcMemoryKey NextShmemSegID;
@@ -705,6 +942,7 @@ PGSharedMemoryCreate(Size size,
PGShmemHeader *hdr;
struct stat statbuf;
Size sysvsize;
+ ShmemSegment *segment = &Segments[segment_id];
/*
* We use the data directory's ID info (inode and device numbers) to
@@ -717,14 +955,6 @@ PGSharedMemoryCreate(Size size,
errmsg("could not stat data directory \"%s\": %m",
DataDir)));
- /* Complain if hugepages demanded but we can't possibly support them */
-#if !defined(MAP_HUGETLB)
- if (huge_pages == HUGE_PAGES_ON)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("huge pages not supported on this platform")));
-#endif
-
/* For now, we don't support huge pages in SysV memory */
if (huge_pages == HUGE_PAGES_ON && shared_memory_type != SHMEM_TYPE_MMAP)
ereport(ERROR,
@@ -732,12 +962,12 @@ PGSharedMemoryCreate(Size size,
errmsg("huge pages not supported with the current \"shared_memory_type\" setting")));
/* Room for a header? */
- Assert(size > MAXALIGN(sizeof(PGShmemHeader)));
+ Assert(mapping->shmem_req_size > MAXALIGN(sizeof(PGShmemHeader)));
if (shared_memory_type == SHMEM_TYPE_MMAP)
{
- AnonymousShmem = CreateAnonymousSegment(&size);
- AnonymousShmemSize = size;
+ /* On success, mapping data will be modified. */
+ CreateAnonymousSegment(mapping, segment_id);
/* Register on-exit routine to unmap the anonymous segment */
on_shmem_exit(AnonymousShmemDetach, (Datum) 0);
@@ -747,7 +977,7 @@ PGSharedMemoryCreate(Size size,
}
else
{
- sysvsize = size;
+ sysvsize = mapping->shmem_req_size;
/* huge pages are only available with mmap */
SetConfigOption("huge_pages_status", "off",
@@ -760,7 +990,7 @@ PGSharedMemoryCreate(Size size,
* loop simultaneously. (CreateDataDirLockFile() does not entirely ensure
* that, but prefer fixing it over coping here.)
*/
- NextShmemSegID = statbuf.st_ino;
+ NextShmemSegID = statbuf.st_ino + segment_id;
for (;;)
{
@@ -852,13 +1082,13 @@ PGSharedMemoryCreate(Size size,
/*
* Initialize space allocation status for segment.
*/
- hdr->totalsize = size;
+ hdr->totalsize = mapping->shmem_req_size;
hdr->freeoffset = MAXALIGN(sizeof(PGShmemHeader));
*shim = hdr;
/* Save info for possible future use */
- UsedShmemSegAddr = memAddress;
- UsedShmemSegID = (unsigned long) NextShmemSegID;
+ segment->seg_addr = memAddress;
+ segment->seg_id = (unsigned long) NextShmemSegID;
/*
* If AnonymousShmem is NULL here, then we're not using anonymous shared
@@ -866,10 +1096,10 @@ PGSharedMemoryCreate(Size size,
* block. Otherwise, the System V shared memory block is only a shim, and
* we must return a pointer to the real block.
*/
- if (AnonymousShmem == NULL)
+ if (segment->shmem == NULL)
return hdr;
- memcpy(AnonymousShmem, hdr, sizeof(PGShmemHeader));
- return (PGShmemHeader *) AnonymousShmem;
+ memcpy(segment->shmem, hdr, sizeof(PGShmemHeader));
+ return (PGShmemHeader *) segment->shmem;
}
#ifdef EXEC_BACKEND
@@ -969,23 +1199,28 @@ PGSharedMemoryNoReAttach(void)
void
PGSharedMemoryDetach(void)
{
- if (UsedShmemSegAddr != NULL)
+ for (int i = 0; i < NUM_MEMORY_MAPPINGS; i++)
{
- if ((shmdt(UsedShmemSegAddr) < 0)
+ ShmemSegment *segment = &Segments[i];
+
+ if (segment->seg_addr != NULL)
+ {
+ if ((shmdt(segment->seg_addr) < 0)
#if defined(EXEC_BACKEND) && defined(__CYGWIN__)
- /* Work-around for cygipc exec bug */
- && shmdt(NULL) < 0
+ /* Work-around for cygipc exec bug */
+ && shmdt(NULL) < 0
#endif
- )
- elog(LOG, "shmdt(%p) failed: %m", UsedShmemSegAddr);
- UsedShmemSegAddr = NULL;
- }
+ )
+ elog(LOG, "shmdt(%p) failed: %m", segment->seg_addr);
+ segment->seg_addr = NULL;
+ }
- if (AnonymousShmem != NULL)
- {
- if (munmap(AnonymousShmem, AnonymousShmemSize) < 0)
- elog(LOG, "munmap(%p, %zu) failed: %m",
- AnonymousShmem, AnonymousShmemSize);
- AnonymousShmem = NULL;
+ if (segment->shmem != NULL)
+ {
+ if (munmap(segment->shmem, segment->shmem_size) < 0)
+ elog(LOG, "munmap(%p, %zu) failed: %m",
+ segment->shmem, segment->shmem_size);
+ segment->shmem = NULL;
+ }
}
}
diff --git a/src/backend/port/win32_sema.c b/src/backend/port/win32_sema.c
index 5854ad1f54d..e7365ff8060 100644
--- a/src/backend/port/win32_sema.c
+++ b/src/backend/port/win32_sema.c
@@ -44,7 +44,7 @@ PGSemaphoreShmemSize(int maxSemas)
* process exits.
*/
void
-PGReserveSemaphores(int maxSemas)
+PGReserveSemaphores(int maxSemas, int shmem_segment)
{
mySemSet = (HANDLE *) malloc(maxSemas * sizeof(HANDLE));
if (mySemSet == NULL)
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index 4dee856d6bd..5c0c32babaf 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -204,7 +204,7 @@ EnableLockPagesPrivilege(int elevel)
* standard header.
*/
PGShmemHeader *
-PGSharedMemoryCreate(Size size,
+PGSharedMemoryCreate(MemoryMappingSizes *mapping_sizes, int segment_id,
PGShmemHeader **shim)
{
void *memAddress;
@@ -216,9 +216,10 @@ PGSharedMemoryCreate(Size size,
DWORD size_high;
DWORD size_low;
SIZE_T largePageSize = 0;
- Size orig_size = size;
+ Size size = mapping_sizes->shmem_req_size;
DWORD flProtect = PAGE_READWRITE;
DWORD desiredAccess;
+ ShmemSegment *segment = &Segments[segment_id];
ShmemProtectiveRegion = VirtualAlloc(NULL, PROTECTIVE_REGION_SIZE,
MEM_RESERVE, PAGE_NOACCESS);
@@ -304,7 +305,7 @@ retry:
* Use the original size, not the rounded-up value, when
* falling back to non-huge pages.
*/
- size = orig_size;
+ size = mapping_sizes->shmem_req_size;
flProtect = PAGE_READWRITE;
goto retry;
}
@@ -393,6 +394,11 @@ retry:
hdr->dsm_control = 0;
/* Save info for possible future use */
+ segment->shmem_size = size;
+ segment->seg_addr = memAddress;
+ segment->shmem = (Pointer) hdr;
+ segment->seg_id = (unsigned long) hmap2;
+
UsedShmemSegAddr = memAddress;
UsedShmemSegSize = size;
UsedShmemSegID = hmap2;
@@ -627,7 +633,7 @@ pgwin32_ReserveSharedMemoryRegion(HANDLE hChild)
* use GetLargePageMinimum() instead.
*/
void
-GetHugePageSize(Size *hugepagesize, int *mmap_flags)
+GetHugePageSize(Size *hugepagesize, int *mmap_flags, int *memfd_flags)
{
if (hugepagesize)
*hugepagesize = 0;
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 6fd3a6bbac5..4fa547f48de 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -17,6 +17,7 @@
#include "storage/aio.h"
#include "storage/buf_internals.h"
#include "storage/bufmgr.h"
+#include "storage/pg_shmem.h"
BufferDescPadded *BufferDescriptors;
char *BufferBlocks;
@@ -62,7 +63,10 @@ CkptSortItem *CkptBufferIds;
* Initialize shared buffer pool
*
* This is called once during shared-memory initialization (either in the
- * postmaster, or in a standalone backend).
+ * postmaster, or in a standalone backend). The size of the data structures
+ * initialized here depends on NBuffers; to be able to change NBuffers
+ * without a restart, we store each structure in a separate shared memory
+ * segment, which can be resized on demand.
*/
void
BufferManagerShmemInit(void)
@@ -74,22 +78,22 @@ BufferManagerShmemInit(void)
/* Align descriptors to a cacheline boundary. */
BufferDescriptors = (BufferDescPadded *)
- ShmemInitStruct("Buffer Descriptors",
+ ShmemInitStructInSegment("Buffer Descriptors",
NBuffers * sizeof(BufferDescPadded),
- &foundDescs);
+ &foundDescs, BUFFER_DESCRIPTORS_SHMEM_SEGMENT);
/* Align buffer pool on IO page size boundary. */
BufferBlocks = (char *)
TYPEALIGN(PG_IO_ALIGN_SIZE,
- ShmemInitStruct("Buffer Blocks",
+ ShmemInitStructInSegment("Buffer Blocks",
NBuffers * (Size) BLCKSZ + PG_IO_ALIGN_SIZE,
- &foundBufs));
+ &foundBufs, BUFFERS_SHMEM_SEGMENT));
/* Align condition variables to cacheline boundary. */
BufferIOCVArray = (ConditionVariableMinimallyPadded *)
- ShmemInitStruct("Buffer IO Condition Variables",
+ ShmemInitStructInSegment("Buffer IO Condition Variables",
NBuffers * sizeof(ConditionVariableMinimallyPadded),
- &foundIOCV);
+ &foundIOCV, BUFFER_IOCV_SHMEM_SEGMENT);
/*
* The array used to sort to-be-checkpointed buffer ids is located in
@@ -99,8 +103,9 @@ BufferManagerShmemInit(void)
* painful.
*/
CkptBufferIds = (CkptSortItem *)
- ShmemInitStruct("Checkpoint BufferIds",
- NBuffers * sizeof(CkptSortItem), &foundBufCkpt);
+ ShmemInitStructInSegment("Checkpoint BufferIds",
+ NBuffers * sizeof(CkptSortItem), &foundBufCkpt,
+ CHECKPOINT_BUFFERS_SHMEM_SEGMENT);
if (foundDescs || foundBufs || foundIOCV || foundBufCkpt)
{
@@ -147,33 +152,42 @@ BufferManagerShmemInit(void)
* BufferManagerShmemSize
*
* compute the size of shared memory for the buffer pool including
- * data pages, buffer descriptors, hash tables, etc.
+ * data pages, buffer descriptors, hash tables, etc., per shared memory
+ * segment. The main segment must not allocate anything related to buffers;
+ * every other segment receives its part of the data.
*/
Size
-BufferManagerShmemSize(void)
+BufferManagerShmemSize(MemoryMappingSizes *mapping_sizes)
{
- Size size = 0;
+ Size size;
- /* size of buffer descriptors */
- size = add_size(size, mul_size(NBuffers, sizeof(BufferDescPadded)));
- /* to allow aligning buffer descriptors */
+ /* size of buffer descriptors, plus alignment padding */
+ size = add_size(0, mul_size(NBuffers, sizeof(BufferDescPadded)));
size = add_size(size, PG_CACHE_LINE_SIZE);
+ mapping_sizes[BUFFER_DESCRIPTORS_SHMEM_SEGMENT].shmem_req_size = size;
+ mapping_sizes[BUFFER_DESCRIPTORS_SHMEM_SEGMENT].shmem_reserved = size;
/* size of data pages, plus alignment padding */
- size = add_size(size, PG_IO_ALIGN_SIZE);
+ size = add_size(0, PG_IO_ALIGN_SIZE);
size = add_size(size, mul_size(NBuffers, BLCKSZ));
+ mapping_sizes[BUFFERS_SHMEM_SEGMENT].shmem_req_size = size;
+ mapping_sizes[BUFFERS_SHMEM_SEGMENT].shmem_reserved = size;
/* size of stuff controlled by freelist.c */
- size = add_size(size, StrategyShmemSize());
+ mapping_sizes[STRATEGY_SHMEM_SEGMENT].shmem_req_size = StrategyShmemSize();
+ mapping_sizes[STRATEGY_SHMEM_SEGMENT].shmem_reserved = StrategyShmemSize();
- /* size of I/O condition variables */
- size = add_size(size, mul_size(NBuffers,
+ /* size of I/O condition variables, plus alignment padding */
+ size = add_size(0, mul_size(NBuffers,
sizeof(ConditionVariableMinimallyPadded)));
- /* to allow aligning the above */
size = add_size(size, PG_CACHE_LINE_SIZE);
+ mapping_sizes[BUFFER_IOCV_SHMEM_SEGMENT].shmem_req_size = size;
+ mapping_sizes[BUFFER_IOCV_SHMEM_SEGMENT].shmem_reserved = size;
/* size of checkpoint sort array in bufmgr.c */
- size = add_size(size, mul_size(NBuffers, sizeof(CkptSortItem)));
+ mapping_sizes[CHECKPOINT_BUFFERS_SHMEM_SEGMENT].shmem_req_size = mul_size(NBuffers, sizeof(CkptSortItem));
+ mapping_sizes[CHECKPOINT_BUFFERS_SHMEM_SEGMENT].shmem_reserved = mul_size(NBuffers, sizeof(CkptSortItem));
- return size;
+
+ /* Nothing buffer-related is allocated in the main segment */
+ return 0;
}
diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index f0c39ec2822..67e87f9935d 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -25,6 +25,7 @@
#include "funcapi.h"
#include "storage/buf_internals.h"
#include "storage/lwlock.h"
+#include "storage/pg_shmem.h"
#include "utils/rel.h"
#include "utils/builtins.h"
@@ -64,10 +65,11 @@ InitBufTable(int size)
info.entrysize = sizeof(BufferLookupEnt);
info.num_partitions = NUM_BUFFER_PARTITIONS;
- SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
+ SharedBufHash = ShmemInitHashInSegment("Shared Buffer Lookup Table",
size, size,
&info,
- HASH_ELEM | HASH_BLOBS | HASH_PARTITION | HASH_FIXED_SIZE);
+ HASH_ELEM | HASH_BLOBS | HASH_PARTITION | HASH_FIXED_SIZE,
+ STRATEGY_SHMEM_SEGMENT);
}
/*
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 28d952b3534..13ee840ab9f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -19,6 +19,7 @@
#include "port/atomics.h"
#include "storage/buf_internals.h"
#include "storage/bufmgr.h"
+#include "storage/pg_shmem.h"
#include "storage/proc.h"
#define INT_ACCESS_ONCE(var) ((int)(*((volatile int *)&(var))))
@@ -418,9 +419,9 @@ StrategyInitialize(bool init)
* Get or create the shared strategy control block
*/
StrategyControl = (BufferStrategyControl *)
- ShmemInitStruct("Buffer Strategy Status",
+ ShmemInitStructInSegment("Buffer Strategy Status",
sizeof(BufferStrategyControl),
- &found);
+ &found, STRATEGY_SHMEM_SEGMENT);
if (!found)
{
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 2704e80b3a7..1965b2d3eb4 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -61,6 +61,8 @@ static void proc_exit_prepare(int code);
* but provide some additional features we need --- in particular,
* we want to register callbacks to invoke when we are disconnecting
* from a broken shared-memory context but not exiting the postmaster.
+ * The maximum number of such exit callbacks depends on the number of
+ * shared memory segments.
*
* Callback functions can take zero, one, or two args: the first passed
* arg is the integer exitcode, the second is the Datum supplied when
@@ -68,7 +70,7 @@ static void proc_exit_prepare(int code);
* ----------------------------------------------------------------
*/
-#define MAX_ON_EXITS 20
+#define MAX_ON_EXITS 40
struct ONEXIT
{
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index b23d0c19360..41190f96639 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -81,10 +81,17 @@ RequestAddinShmemSpace(Size size)
/*
* CalculateShmemSize
- * Calculates the amount of shared memory needed.
+ * Calculates the amount of shared memory needed.
+ *
+ * The amount of shared memory required per segment is saved in mapping_sizes,
+ * which is expected to be an array of size NUM_MEMORY_MAPPINGS. The total
+ * amount of memory needed across all the segments is returned. For the memory
+ * mappings which reserve address space for future expansion, the required
+ * amount of reserved address space is saved in the corresponding
+ * mapping_sizes entries; it is not included in the returned value.
*/
Size
-CalculateShmemSize(void)
+CalculateShmemSize(MemoryMappingSizes *mapping_sizes)
{
Size size;
@@ -102,7 +109,13 @@ CalculateShmemSize(void)
sizeof(ShmemIndexEnt)));
size = add_size(size, dsm_estimate_size());
size = add_size(size, DSMRegistryShmemSize());
- size = add_size(size, BufferManagerShmemSize());
+
+ /*
+ * The buffer manager saves memory estimates for every shared memory
+ * segment it uses in the corresponding MemoryMappingSizes entries; only
+ * the size required from the main shared memory segment is counted here.
+ */
+ size = add_size(size, BufferManagerShmemSize(mapping_sizes));
size = add_size(size, LockManagerShmemSize());
size = add_size(size, PredicateLockShmemSize());
size = add_size(size, ProcGlobalShmemSize());
@@ -144,8 +157,22 @@ CalculateShmemSize(void)
/* include additional requested shmem from preload libraries */
size = add_size(size, total_addin_request);
+ /*
+ * All the shared memory allocations considered so far happen in the main
+ * shared memory segment.
+ */
+ mapping_sizes[MAIN_SHMEM_SEGMENT].shmem_req_size = size;
+ mapping_sizes[MAIN_SHMEM_SEGMENT].shmem_reserved = size;
+
+ size = 0;
/* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
+ for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++)
+ {
+ mapping_sizes[segment].shmem_req_size = add_size(mapping_sizes[segment].shmem_req_size, 8192 - (mapping_sizes[segment].shmem_req_size % 8192));
+ mapping_sizes[segment].shmem_reserved = add_size(mapping_sizes[segment].shmem_reserved, 8192 - (mapping_sizes[segment].shmem_reserved % 8192));
+ /* Compute the total size of all segments */
+ size = size + mapping_sizes[segment].shmem_req_size;
+ }
return size;
}
@@ -191,32 +218,44 @@ CreateSharedMemoryAndSemaphores(void)
{
PGShmemHeader *shim;
PGShmemHeader *seghdr;
- Size size;
+ MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS];
Assert(!IsUnderPostmaster);
- /* Compute the size of the shared-memory block */
- size = CalculateShmemSize();
- elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
+ CalculateShmemSize(mapping_sizes);
- /*
- * Create the shmem segment
- */
- seghdr = PGSharedMemoryCreate(size, &shim);
-
- /*
- * Make sure that huge pages are never reported as "unknown" while the
- * server is running.
- */
- Assert(strcmp("unknown",
- GetConfigOption("huge_pages_status", false, false)) != 0);
-
- InitShmemAccess(seghdr);
+ /* Decide if we use huge pages or regular size pages */
+ PrepareHugePages();
- /*
- * Set up shared memory allocation mechanism
- */
- InitShmemAllocation();
+ for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++)
+ {
+ MemoryMappingSizes *mapping = &mapping_sizes[segment];
+
+ /* Compute the size of the shared-memory block */
+ elog(DEBUG3, "invoking IpcMemoryCreate(segment %s, size=%zu, reserved address space=%zu)",
+ MappingName(segment), mapping->shmem_req_size, mapping->shmem_reserved);
+
+ /*
+ * Create the shmem segment.
+ *
+ * XXX: Are multiple shims needed, one per segment?
+ */
+ seghdr = PGSharedMemoryCreate(mapping, segment, &shim);
+
+ /*
+ * Make sure that huge pages are never reported as "unknown" while the
+ * server is running.
+ */
+ Assert(strcmp("unknown",
+ GetConfigOption("huge_pages_status", false, false)) != 0);
+
+ InitShmemAccessInSegment(seghdr, segment);
+
+ /*
+ * Set up shared memory allocation mechanism
+ */
+ InitShmemAllocationInSegment(segment);
+ }
/* Initialize subsystems */
CreateOrAttachShmemStructs();
@@ -334,7 +373,9 @@ CreateOrAttachShmemStructs(void)
* InitializeShmemGUCs
*
* This function initializes runtime-computed GUCs related to the amount of
- * shared memory required for the current configuration.
+ * shared memory required for the current configuration. The memory required
+ * by each shared memory segment is computed via CalculateShmemSize() into a
+ * local MemoryMappingSizes array.
*/
void
InitializeShmemGUCs(void)
@@ -343,11 +384,13 @@ InitializeShmemGUCs(void)
Size size_b;
Size size_mb;
Size hp_size;
+ MemoryMappingSizes mapping_sizes[NUM_MEMORY_MAPPINGS];
+
/*
* Calculate the shared memory size and round up to the nearest megabyte.
*/
- size_b = CalculateShmemSize();
+ size_b = CalculateShmemSize(mapping_sizes);
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
sprintf(buf, "%zu", size_mb);
SetConfigOption("shared_memory_size", buf,
@@ -356,7 +399,7 @@ InitializeShmemGUCs(void)
/*
* Calculate the number of huge pages required.
*/
- GetHugePageSize(&hp_size, NULL);
+ GetHugePageSize(&hp_size, NULL, NULL);
if (hp_size != 0)
{
Size hp_required;
diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index 0f18beb6ad4..f303a9328df 100644
--- a/src/backend/storage/ipc/shmem.c
+++ b/src/backend/storage/ipc/shmem.c
@@ -76,20 +76,19 @@
#include "utils/builtins.h"
static void *ShmemAllocRaw(Size size, Size *allocated_size);
-static void *ShmemAllocUnlocked(Size size);
+static void *ShmemAllocRawInSegment(Size size, Size *allocated_size,
+ int shmem_segment);
/* shared memory global variables */
-static PGShmemHeader *ShmemSegHdr; /* shared mem segment header */
+ShmemSegment Segments[NUM_MEMORY_MAPPINGS];
-static void *ShmemBase; /* start address of shared memory */
-
-static void *ShmemEnd; /* end+1 address of shared memory */
-
-slock_t *ShmemLock; /* spinlock for shared memory and LWLock
- * allocation */
-
-static HTAB *ShmemIndex = NULL; /* primary index hashtable for shmem */
+/*
+ * Primary index hashtable for shmem. For simplicity we use a single one for
+ * all shared memory segments. That may have performance consequences; an
+ * alternative would be to have one index per shared memory segment.
+ */
+static HTAB *ShmemIndex = NULL;
/* To get reliable results for NUMA inquiry we need to "touch pages" once */
static bool firstNumaTouch = true;
@@ -102,9 +101,17 @@ Datum pg_numa_available(PG_FUNCTION_ARGS);
void
InitShmemAccess(PGShmemHeader *seghdr)
{
- ShmemSegHdr = seghdr;
- ShmemBase = seghdr;
- ShmemEnd = (char *) ShmemBase + seghdr->totalsize;
+ InitShmemAccessInSegment(seghdr, MAIN_SHMEM_SEGMENT);
+}
+
+void
+InitShmemAccessInSegment(PGShmemHeader *seghdr, int shmem_segment)
+{
+ ShmemSegment *seg = &Segments[shmem_segment];
+
+ seg->ShmemSegHdr = seghdr;
+ seg->ShmemBase = (void *) seghdr;
+ seg->ShmemEnd = (char *) seg->ShmemBase + seghdr->totalsize;
}
/*
@@ -115,7 +122,13 @@ InitShmemAccess(PGShmemHeader *seghdr)
void
InitShmemAllocation(void)
{
- PGShmemHeader *shmhdr = ShmemSegHdr;
+ InitShmemAllocationInSegment(MAIN_SHMEM_SEGMENT);
+}
+
+void
+InitShmemAllocationInSegment(int shmem_segment)
+{
+ PGShmemHeader *shmhdr = Segments[shmem_segment].ShmemSegHdr;
char *aligned;
Assert(shmhdr != NULL);
@@ -124,9 +137,9 @@ InitShmemAllocation(void)
* Initialize the spinlock used by ShmemAlloc. We must use
* ShmemAllocUnlocked, since obviously ShmemAlloc can't be called yet.
*/
- ShmemLock = (slock_t *) ShmemAllocUnlocked(sizeof(slock_t));
+ Segments[shmem_segment].ShmemLock = (slock_t *) ShmemAllocUnlockedInSegment(sizeof(slock_t), shmem_segment);
- SpinLockInit(ShmemLock);
+ SpinLockInit(Segments[shmem_segment].ShmemLock);
/*
* Allocations after this point should go through ShmemAlloc, which
@@ -151,16 +164,22 @@ InitShmemAllocation(void)
*/
void *
ShmemAlloc(Size size)
+{
+ return ShmemAllocInSegment(size, MAIN_SHMEM_SEGMENT);
+}
+
+void *
+ShmemAllocInSegment(Size size, int shmem_segment)
{
void *newSpace;
Size allocated_size;
- newSpace = ShmemAllocRaw(size, &allocated_size);
+ newSpace = ShmemAllocRawInSegment(size, &allocated_size, shmem_segment);
if (!newSpace)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of shared memory (%zu bytes requested)",
- size)));
+ errmsg("out of shared memory in segment %s (%zu bytes requested)",
+ MappingName(shmem_segment), size)));
return newSpace;
}
@@ -185,6 +204,12 @@ ShmemAllocNoError(Size size)
*/
static void *
ShmemAllocRaw(Size size, Size *allocated_size)
+{
+ return ShmemAllocRawInSegment(size, allocated_size, MAIN_SHMEM_SEGMENT);
+}
+
+static void *
+ShmemAllocRawInSegment(Size size, Size *allocated_size, int shmem_segment)
{
Size newStart;
Size newFree;
@@ -204,22 +229,22 @@ ShmemAllocRaw(Size size, Size *allocated_size)
size = CACHELINEALIGN(size);
*allocated_size = size;
- Assert(ShmemSegHdr != NULL);
+ Assert(Segments[shmem_segment].ShmemSegHdr != NULL);
- SpinLockAcquire(ShmemLock);
+ SpinLockAcquire(Segments[shmem_segment].ShmemLock);
- newStart = ShmemSegHdr->freeoffset;
+ newStart = Segments[shmem_segment].ShmemSegHdr->freeoffset;
newFree = newStart + size;
- if (newFree <= ShmemSegHdr->totalsize)
+ if (newFree <= Segments[shmem_segment].ShmemSegHdr->totalsize)
{
- newSpace = (char *) ShmemBase + newStart;
- ShmemSegHdr->freeoffset = newFree;
+ newSpace = (char *) Segments[shmem_segment].ShmemBase + newStart;
+ Segments[shmem_segment].ShmemSegHdr->freeoffset = newFree;
}
else
newSpace = NULL;
- SpinLockRelease(ShmemLock);
+ SpinLockRelease(Segments[shmem_segment].ShmemLock);
/* note this assert is okay with newSpace == NULL */
Assert(newSpace == (void *) CACHELINEALIGN(newSpace));
@@ -228,15 +253,16 @@ ShmemAllocRaw(Size size, Size *allocated_size)
}
/*
- * ShmemAllocUnlocked -- allocate max-aligned chunk from shared memory
+ * ShmemAllocUnlockedInSegment
+ * allocate max-aligned chunk from given shared memory segment
*
* Allocate space without locking ShmemLock. This should be used for,
* and only for, allocations that must happen before ShmemLock is ready.
*
* We consider maxalign, rather than cachealign, sufficient here.
*/
-static void *
-ShmemAllocUnlocked(Size size)
+void *
+ShmemAllocUnlockedInSegment(Size size, int shmem_segment)
{
Size newStart;
Size newFree;
@@ -247,19 +273,19 @@ ShmemAllocUnlocked(Size size)
*/
size = MAXALIGN(size);
- Assert(ShmemSegHdr != NULL);
+ Assert(Segments[shmem_segment].ShmemSegHdr != NULL);
- newStart = ShmemSegHdr->freeoffset;
+ newStart = Segments[shmem_segment].ShmemSegHdr->freeoffset;
newFree = newStart + size;
- if (newFree > ShmemSegHdr->totalsize)
+ if (newFree > Segments[shmem_segment].ShmemSegHdr->totalsize)
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("out of shared memory (%zu bytes requested)",
- size)));
- ShmemSegHdr->freeoffset = newFree;
+ errmsg("out of shared memory in segment %s (%zu bytes requested)",
+ MappingName(shmem_segment), size)));
+ Segments[shmem_segment].ShmemSegHdr->freeoffset = newFree;
- newSpace = (char *) ShmemBase + newStart;
+ newSpace = (char *) Segments[shmem_segment].ShmemBase + newStart;
Assert(newSpace == (void *) MAXALIGN(newSpace));
@@ -274,7 +300,13 @@ ShmemAllocUnlocked(Size size)
bool
ShmemAddrIsValid(const void *addr)
{
- return (addr >= ShmemBase) && (addr < ShmemEnd);
+ return ShmemAddrIsValidInSegment(addr, MAIN_SHMEM_SEGMENT);
+}
+
+bool
+ShmemAddrIsValidInSegment(const void *addr, int shmem_segment)
+{
+ return (addr >= Segments[shmem_segment].ShmemBase) && (addr < Segments[shmem_segment].ShmemEnd);
}
/*
@@ -335,6 +367,18 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
int64 max_size, /* max size of the table */
HASHCTL *infoP, /* info about key and bucket size */
int hash_flags) /* info about infoP */
+{
+ return ShmemInitHashInSegment(name, init_size, max_size, infoP, hash_flags,
+ MAIN_SHMEM_SEGMENT);
+}
+
+HTAB *
+ShmemInitHashInSegment(const char *name, /* table string name for shmem index */
+ int64 init_size, /* initial table size */
+ int64 max_size, /* max size of the table */
+ HASHCTL *infoP, /* info about key and bucket size */
+ int hash_flags, /* info about infoP */
+ int shmem_segment) /* in which segment to keep the table */
{
bool found;
void *location;
@@ -351,9 +395,9 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
hash_flags |= HASH_SHARED_MEM | HASH_ALLOC | HASH_DIRSIZE;
/* look it up in the shmem index */
- location = ShmemInitStruct(name,
+ location = ShmemInitStructInSegment(name,
hash_get_shared_size(infoP, hash_flags),
- &found);
+ &found, shmem_segment);
/*
* if it already exists, attach to it rather than allocate and initialize
@@ -386,6 +430,13 @@ ShmemInitHash(const char *name, /* table string name for shmem index */
*/
void *
ShmemInitStruct(const char *name, Size size, bool *foundPtr)
+{
+ return ShmemInitStructInSegment(name, size, foundPtr, MAIN_SHMEM_SEGMENT);
+}
+
+void *
+ShmemInitStructInSegment(const char *name, Size size, bool *foundPtr,
+ int shmem_segment)
{
ShmemIndexEnt *result;
void *structPtr;
@@ -394,7 +445,7 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
if (!ShmemIndex)
{
- PGShmemHeader *shmemseghdr = ShmemSegHdr;
+ PGShmemHeader *shmemseghdr = Segments[shmem_segment].ShmemSegHdr;
/* Must be trying to create/attach to ShmemIndex itself */
Assert(strcmp(name, "ShmemIndex") == 0);
@@ -417,7 +468,7 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
* process can be accessing shared memory yet.
*/
Assert(shmemseghdr->index == NULL);
- structPtr = ShmemAlloc(size);
+ structPtr = ShmemAllocInSegment(size, shmem_segment);
shmemseghdr->index = structPtr;
*foundPtr = false;
}
@@ -434,8 +485,8 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
LWLockRelease(ShmemIndexLock);
ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
- errmsg("could not create ShmemIndex entry for data structure \"%s\"",
- name)));
+ errmsg("could not create ShmemIndex entry for data structure \"%s\" in segment %d",
+ name, shmem_segment)));
}
if (*foundPtr)
@@ -460,7 +511,7 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
Size allocated_size;
/* It isn't in the table yet. allocate and initialize it */
- structPtr = ShmemAllocRaw(size, &allocated_size);
+ structPtr = ShmemAllocRawInSegment(size, &allocated_size, shmem_segment);
if (structPtr == NULL)
{
/* out of memory; remove the failed ShmemIndex entry */
@@ -475,18 +526,18 @@ ShmemInitStruct(const char *name, Size size, bool *foundPtr)
result->size = size;
result->allocated_size = allocated_size;
result->location = structPtr;
+ result->shmem_segment = shmem_segment;
}
LWLockRelease(ShmemIndexLock);
- Assert(ShmemAddrIsValid(structPtr));
+ Assert(ShmemAddrIsValidInSegment(structPtr, shmem_segment));
Assert(structPtr == (void *) CACHELINEALIGN(structPtr));
return structPtr;
}
-
/*
* Add two Size values, checking for overflow
*/
@@ -527,13 +578,14 @@ mul_size(Size s1, Size s2)
Datum
pg_get_shmem_allocations(PG_FUNCTION_ARGS)
{
-#define PG_GET_SHMEM_SIZES_COLS 4
+#define PG_GET_SHMEM_SIZES_COLS 5
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
HASH_SEQ_STATUS hstat;
ShmemIndexEnt *ent;
- Size named_allocated = 0;
+ Size named_allocated[NUM_MEMORY_MAPPINGS] = {0};
Datum values[PG_GET_SHMEM_SIZES_COLS];
bool nulls[PG_GET_SHMEM_SIZES_COLS];
+ int i;
InitMaterializedSRF(fcinfo, 0);
@@ -546,29 +598,40 @@ pg_get_shmem_allocations(PG_FUNCTION_ARGS)
while ((ent = (ShmemIndexEnt *) hash_seq_search(&hstat)) != NULL)
{
values[0] = CStringGetTextDatum(ent->key);
- values[1] = Int64GetDatum((char *) ent->location - (char *) ShmemSegHdr);
- values[2] = Int64GetDatum(ent->size);
- values[3] = Int64GetDatum(ent->allocated_size);
- named_allocated += ent->allocated_size;
+ values[1] = CStringGetTextDatum(MappingName(ent->shmem_segment));
+ values[2] = Int64GetDatum((char *) ent->location - (char *) Segments[ent->shmem_segment].ShmemSegHdr);
+ values[3] = Int64GetDatum(ent->size);
+ values[4] = Int64GetDatum(ent->allocated_size);
+ named_allocated[ent->shmem_segment] += ent->allocated_size;
tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
values, nulls);
}
/* output shared memory allocated but not counted via the shmem index */
- values[0] = CStringGetTextDatum("");
- nulls[1] = true;
- values[2] = Int64GetDatum(ShmemSegHdr->freeoffset - named_allocated);
- values[3] = values[2];
- tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ for (i = 0; i < NUM_MEMORY_MAPPINGS; i++)
+ {
+ values[0] = CStringGetTextDatum("");
+ values[1] = CStringGetTextDatum(MappingName(i));
+ nulls[2] = true;
+ values[3] = Int64GetDatum(Segments[i].ShmemSegHdr->freeoffset - named_allocated[i]);
+ values[4] = values[3];
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
/* output as-of-yet unused shared memory */
- nulls[0] = true;
- values[1] = Int64GetDatum(ShmemSegHdr->freeoffset);
- nulls[1] = false;
- values[2] = Int64GetDatum(ShmemSegHdr->totalsize - ShmemSegHdr->freeoffset);
- values[3] = values[2];
- tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ memset(nulls, 0, sizeof(nulls));
+
+ for (i = 0; i < NUM_MEMORY_MAPPINGS; i++)
+ {
+ PGShmemHeader *shmhdr = Segments[i].ShmemSegHdr;
+ nulls[0] = true;
+ values[1] = CStringGetTextDatum(MappingName(i));
+ values[2] = Int64GetDatum(shmhdr->freeoffset);
+ values[3] = Int64GetDatum(shmhdr->totalsize - shmhdr->freeoffset);
+ values[4] = values[3];
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
+ }
LWLockRelease(ShmemIndexLock);
@@ -593,7 +656,7 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
Size os_page_size;
void **page_ptrs;
int *pages_status;
- uint64 shm_total_page_count,
+ uint64 shm_total_page_count = 0,
shm_ent_page_count,
max_nodes;
Size *nodes;
@@ -628,7 +691,12 @@ pg_get_shmem_allocations_numa(PG_FUNCTION_ARGS)
* this is not very likely, and moreover we have more entries, each of
* them using only fraction of the total pages.
*/
- shm_total_page_count = (ShmemSegHdr->totalsize / os_page_size) + 1;
+ for (int segment = 0; segment < NUM_MEMORY_MAPPINGS; segment++)
+ {
+ PGShmemHeader *shmhdr = Segments[segment].ShmemSegHdr;
+ shm_total_page_count += (shmhdr->totalsize / os_page_size) + 1;
+ }
+
page_ptrs = palloc0(sizeof(void *) * shm_total_page_count);
pages_status = palloc(sizeof(int) * shm_total_page_count);
@@ -751,7 +819,7 @@ pg_get_shmem_pagesize(void)
Assert(huge_pages_status != HUGE_PAGES_UNKNOWN);
if (huge_pages_status == HUGE_PAGES_ON)
- GetHugePageSize(&os_page_size, NULL);
+ GetHugePageSize(&os_page_size, NULL, NULL);
return os_page_size;
}
@@ -761,3 +829,46 @@ pg_numa_available(PG_FUNCTION_ARGS)
{
PG_RETURN_BOOL(pg_numa_init() != -1);
}
+
+/* SQL SRF showing shared memory segments */
+Datum
+pg_get_shmem_segments(PG_FUNCTION_ARGS)
+{
+#define PG_GET_SHMEM_SEGS_COLS 6
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ Datum values[PG_GET_SHMEM_SEGS_COLS];
+ bool nulls[PG_GET_SHMEM_SEGS_COLS];
+ int i;
+
+ InitMaterializedSRF(fcinfo, 0);
+
+ /* output all allocated entries */
+ for (i = 0; i < NUM_MEMORY_MAPPINGS; i++)
+ {
+ ShmemSegment *segment = &Segments[i];
+ PGShmemHeader *shmhdr = segment->ShmemSegHdr;
+ int j;
+
+ if (shmhdr == NULL)
+ {
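+			/* segment not (yet) created in this process: emit an all-NULL row */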
+ for (j = 0; j < PG_GET_SHMEM_SEGS_COLS; j++)
+ nulls[j] = true;
+ }
+ else
+ {
+ memset(nulls, 0, sizeof(nulls));
+ values[0] = Int32GetDatum(i);
+ values[1] = CStringGetTextDatum(MappingName(i));
+ values[2] = Int64GetDatum(shmhdr->totalsize);
+ values[3] = Int64GetDatum(shmhdr->freeoffset);
+ values[4] = Int64GetDatum(segment->shmem_size);
+ values[5] = Int64GetDatum(segment->shmem_reserved);
+ }
+
+ tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+ values, nulls);
+ }
+
+ return (Datum) 0;
+}
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b017880f5e4..c25dd13b63a 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -80,6 +80,8 @@
#include "pg_trace.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "postmaster/postmaster.h"
+#include "storage/pg_shmem.h"
#include "storage/proc.h"
#include "storage/proclist.h"
#include "storage/procnumber.h"
@@ -612,12 +614,15 @@ LWLockNewTrancheId(const char *name)
/*
* We use the ShmemLock spinlock to protect LWLockCounter and
* LWLockTrancheNames.
+ *
+	 * XXX: This looks like the only use of Segments outside of shmem.c; it
+	 * may be worth reshaping this part to hide the Segments structure.
*/
- SpinLockAcquire(ShmemLock);
+ SpinLockAcquire(Segments[MAIN_SHMEM_SEGMENT].ShmemLock);
if (*LWLockCounter - LWTRANCHE_FIRST_USER_DEFINED >= MAX_NAMED_TRANCHES)
{
- SpinLockRelease(ShmemLock);
+ SpinLockRelease(Segments[MAIN_SHMEM_SEGMENT].ShmemLock);
ereport(ERROR,
(errmsg("maximum number of tranches already registered"),
errdetail("No more than %d tranches may be registered.",
@@ -628,7 +633,7 @@ LWLockNewTrancheId(const char *name)
LocalLWLockCounter = *LWLockCounter;
strlcpy(LWLockTrancheNames[result - LWTRANCHE_FIRST_USER_DEFINED], name, NAMEDATALEN);
- SpinLockRelease(ShmemLock);
+ SpinLockRelease(Segments[MAIN_SHMEM_SEGMENT].ShmemLock);
return result;
}
@@ -750,9 +755,9 @@ GetLWTrancheName(uint16 trancheId)
*/
if (trancheId >= LocalLWLockCounter)
{
- SpinLockAcquire(ShmemLock);
+ SpinLockAcquire(Segments[MAIN_SHMEM_SEGMENT].ShmemLock);
LocalLWLockCounter = *LWLockCounter;
- SpinLockRelease(ShmemLock);
+ SpinLockRelease(Segments[MAIN_SHMEM_SEGMENT].ShmemLock);
if (trancheId >= LocalLWLockCounter)
elog(ERROR, "tranche %d is not registered", trancheId);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5cf9e12fcb9..411043ca750 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8576,8 +8576,8 @@
{ oid => '5052', descr => 'allocations from the main shared memory segment',
proname => 'pg_get_shmem_allocations', prorows => '50', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,int8,int8,int8}', proargmodes => '{o,o,o,o}',
- proargnames => '{name,off,size,allocated_size}',
+ proallargtypes => '{text,text,int8,int8,int8}', proargmodes => '{o,o,o,o,o}',
+ proargnames => '{name,segment,off,size,allocated_size}',
prosrc => 'pg_get_shmem_allocations' },
{ oid => '4099', descr => 'Is NUMA support available?',
@@ -8600,6 +8600,14 @@
proargmodes => '{o,o,o}', proargnames => '{name,type,size}',
prosrc => 'pg_get_dsm_registry_allocations' },
+# shared memory segments
+{ oid => '5101', descr => 'shared memory segments',
+ proname => 'pg_get_shmem_segments', prorows => '6', proretset => 't',
+ provolatile => 'v', prorettype => 'record', proargtypes => '',
+ proallargtypes => '{int4,text,int8,int8,int8,int8}', proargmodes => '{o,o,o,o,o,o}',
+ proargnames => '{id,name,size,freeoffset,mapping_size,mapping_reserved_size}',
+ prosrc => 'pg_get_shmem_segments' },
+
# memory context of local backend
{ oid => '2282',
descr => 'information about all memory contexts of local backend',
diff --git a/src/include/portability/mem.h b/src/include/portability/mem.h
index ef9800732d9..40588ff6968 100644
--- a/src/include/portability/mem.h
+++ b/src/include/portability/mem.h
@@ -38,7 +38,7 @@
#define MAP_NOSYNC 0
#endif
-#define PG_MMAP_FLAGS (MAP_SHARED|MAP_ANONYMOUS|MAP_HASSEMAPHORE)
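+/*
+ * MAP_ANONYMOUS is intentionally gone: shared memory segments are now
+ * backed by memfd files mapped into a reserved address range.
+ */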
+#define PG_MMAP_FLAGS (MAP_SHARED|MAP_HASSEMAPHORE)
/* Some really old systems don't define MAP_FAILED. */
#ifndef MAP_FAILED
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index b5f8f3c5d42..3769f4db7dc 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -19,6 +19,7 @@
#include "storage/block.h"
#include "storage/buf.h"
#include "storage/bufpage.h"
+#include "storage/pg_shmem.h"
#include "storage/relfilelocator.h"
#include "utils/relcache.h"
#include "utils/snapmgr.h"
@@ -326,7 +327,7 @@ extern void EvictRelUnpinnedBuffers(Relation rel,
/* in buf_init.c */
extern void BufferManagerShmemInit(void);
-extern Size BufferManagerShmemSize(void);
+extern Size BufferManagerShmemSize(MemoryMappingSizes *mapping_sizes);
/* in localbuf.c */
extern void AtProcExit_LocalBuffers(void);
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 2a8a8f0eabd..d73f1b407db 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -18,6 +18,8 @@
#ifndef IPC_H
#define IPC_H
+#include "storage/pg_shmem.h"
+
typedef void (*pg_on_exit_callback) (int code, Datum arg);
typedef void (*shmem_startup_hook_type) (void);
@@ -77,7 +79,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
-extern Size CalculateShmemSize(void);
+extern Size CalculateShmemSize(MemoryMappingSizes *mapping_sizes);
extern void CreateSharedMemoryAndSemaphores(void);
#ifdef EXEC_BACKEND
extern void AttachSharedMemoryStructs(void);
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index 5f7d4b83a60..beee0a53d2d 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -25,6 +25,13 @@
#define PG_SHMEM_H
#include "storage/dsm_impl.h"
+#include "storage/spin.h"
+
+typedef struct MemoryMappingSizes
+{
+	Size shmem_req_size; /* Required size of the segment */
+	Size shmem_reserved; /* Required size of the reserved address space */
+} MemoryMappingSizes;
typedef struct PGShmemHeader /* standard header for all Postgres shmem */
{
@@ -41,6 +48,27 @@ typedef struct PGShmemHeader /* standard header for all Postgres shmem */
#endif
} PGShmemHeader;
+typedef struct ShmemSegment
+{
+ PGShmemHeader *ShmemSegHdr; /* shared mem segment header */
+ void *ShmemBase; /* start address of shared memory */
+ void *ShmemEnd; /* end+1 address of shared memory */
+ slock_t *ShmemLock; /* spinlock for shared memory and LWLock
+ * allocation */
+ int segment_fd; /* fd for the backing anon file */
+ unsigned long seg_id; /* IPC key */
+ int shmem_segment; /* TODO: Do we really need it? */
+ Size shmem_size; /* Size of the actually used memory */
+ Size shmem_reserved; /* Size of the reserved mapping */
+ Pointer shmem; /* Pointer to the start of the mapped memory */
+ Pointer seg_addr; /* SysV shared memory for the header */
+} ShmemSegment;
+
+/* Number of available segments for anonymous memory mappings */
+#define NUM_MEMORY_MAPPINGS 6
+
+extern PGDLLIMPORT ShmemSegment Segments[NUM_MEMORY_MAPPINGS];
+
/* GUC variables */
extern PGDLLIMPORT int shared_memory_type;
extern PGDLLIMPORT int huge_pages;
@@ -85,10 +113,38 @@ extern void PGSharedMemoryReAttach(void);
extern void PGSharedMemoryNoReAttach(void);
#endif
-extern PGShmemHeader *PGSharedMemoryCreate(Size size,
+extern PGShmemHeader *PGSharedMemoryCreate(MemoryMappingSizes *mapping_sizes, int segment_id,
PGShmemHeader **shim);
extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
extern void PGSharedMemoryDetach(void);
-extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags);
+extern const char *MappingName(int shmem_segment);
+extern void GetHugePageSize(Size *hugepagesize, int *mmap_flags,
+ int *memfd_flags);
+extern void PrepareHugePages(void);
+
+/*
+ * To be able to dynamically resize the largest parts of the data stored in
+ * shared memory, we split it into multiple shared memory segments. Each
+ * segment contains only a certain part of the data, whose size depends on
+ * NBuffers.
+ */
+
+/* The main segment, containing everything except buffer blocks and related data. */
+#define MAIN_SHMEM_SEGMENT 0
+
+/* Buffer blocks */
+#define BUFFERS_SHMEM_SEGMENT 1
+
+/* Buffer descriptors */
+#define BUFFER_DESCRIPTORS_SHMEM_SEGMENT 2
+
+/* Condition variables for buffers */
+#define BUFFER_IOCV_SHMEM_SEGMENT 3
+
+/* Checkpoint BufferIds */
+#define CHECKPOINT_BUFFERS_SHMEM_SEGMENT 4
+
+/* Buffer strategy status */
+#define STRATEGY_SHMEM_SEGMENT 5
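+
+/* The segment IDs above must cover exactly 0 .. NUM_MEMORY_MAPPINGS - 1. */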
#endif /* PG_SHMEM_H */
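
For illustration, here is a minimal standalone Linux sketch of the
reserve/resize pattern that segment_fd, shmem and shmem_reserved describe
(assumptions: Linux with memfd_create and glibc 2.27+; names and sizes are
made up and error handling is trimmed; this is not patch code):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int
    main(void)
    {
        size_t reserved = (size_t) 1 << 30;  /* address space to reserve */
        size_t size = (size_t) 1 << 20;      /* initially usable part */

        /* in-memory file backing the segment */
        int fd = memfd_create("sketch", 0);
        if (fd < 0 || ftruncate(fd, (off_t) size) < 0)
            return 1;

        /*
         * Map the whole reservation read-write up front.  MAP_NORESERVE
         * keeps the unused tail from being charged against memory limits;
         * pages past the file's current size raise SIGBUS if touched.
         */
        char *base = mmap(NULL, reserved, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_NORESERVE, fd, 0);
        if (base == MAP_FAILED)
            return 1;

        strcpy(base, "before");

        /* resize: only the backing file changes, the mapping stays put */
        if (ftruncate(fd, (off_t) (2 * size)) < 0)
            return 1;
        strcpy(base + size, "after");

        printf("%s / %s\n", base, base + size);
        return 0;
    }

Because the whole reservation is mapped up front, growing the segment needs
only the ftruncate; no mmap call is required at resize time.
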
diff --git a/src/include/storage/shmem.h b/src/include/storage/shmem.h
index 70a5b8b172c..c56712555f0 100644
--- a/src/include/storage/shmem.h
+++ b/src/include/storage/shmem.h
@@ -30,14 +30,25 @@ extern PGDLLIMPORT slock_t *ShmemLock;
typedef struct PGShmemHeader PGShmemHeader; /* avoid including
* storage/pg_shmem.h here */
extern void InitShmemAccess(PGShmemHeader *seghdr);
+extern void InitShmemAccessInSegment(PGShmemHeader *seghdr,
+ int shmem_segment);
extern void InitShmemAllocation(void);
+extern void InitShmemAllocationInSegment(int shmem_segment);
extern void *ShmemAlloc(Size size);
+extern void *ShmemAllocInSegment(Size size, int shmem_segment);
extern void *ShmemAllocNoError(Size size);
+extern void *ShmemAllocUnlockedInSegment(Size size, int shmem_segment);
extern bool ShmemAddrIsValid(const void *addr);
+extern bool ShmemAddrIsValidInSegment(const void *addr, int shmem_segment);
extern void InitShmemIndex(void);
extern HTAB *ShmemInitHash(const char *name, int64 init_size, int64 max_size,
HASHCTL *infoP, int hash_flags);
+extern HTAB *ShmemInitHashInSegment(const char *name, int64 init_size,
+									int64 max_size, HASHCTL *infoP,
+									int hash_flags, int shmem_segment);
extern void *ShmemInitStruct(const char *name, Size size, bool *foundPtr);
+extern void *ShmemInitStructInSegment(const char *name, Size size,
+ bool *foundPtr, int shmem_segment);
extern Size add_size(Size s1, Size s2);
extern Size mul_size(Size s1, Size s2);
@@ -59,6 +70,7 @@ typedef struct
void *location; /* location in shared mem */
Size size; /* # bytes requested for the structure */
Size allocated_size; /* # bytes actually allocated */
+ int shmem_segment; /* segment in which the structure is allocated */
} ShmemIndexEnt;
#endif /* SHMEM_H */
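
A hypothetical caller of the new segment-aware API, sketched under the
assumption (per the commit message) that the un-suffixed functions keep
allocating from the main segment; MyCache and the other names are invented:

    #include "postgres.h"
    #include "storage/bufmgr.h"    /* NBuffers */
    #include "storage/pg_shmem.h"  /* segment IDs */
    #include "storage/shmem.h"

    typedef struct MyCacheData
    {
        int nentries;
    } MyCacheData;

    static MyCacheData *MyCache = NULL;

    void
    MyCacheShmemInit(void)
    {
        bool found;

        /* fixed-size state can stay in the main segment, as before */
        MyCache = ShmemInitStruct("My Cache Header",
                                  sizeof(MyCacheData), &found);

        /* NBuffers-dependent data goes into a resizable segment */
        (void) ShmemInitStructInSegment("My Cache Blocks",
                                        NBuffers * (Size) BLCKSZ, &found,
                                        BUFFERS_SHMEM_SEGMENT);
    }
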
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 7c52181cbcb..bd877df5f3b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1765,14 +1765,22 @@ pg_shadow| SELECT pg_authid.rolname AS usename,
LEFT JOIN pg_db_role_setting s ON (((pg_authid.oid = s.setrole) AND (s.setdatabase = (0)::oid))))
WHERE pg_authid.rolcanlogin;
pg_shmem_allocations| SELECT name,
+ segment,
off,
size,
allocated_size
- FROM pg_get_shmem_allocations() pg_get_shmem_allocations(name, off, size, allocated_size);
+ FROM pg_get_shmem_allocations() pg_get_shmem_allocations(name, segment, off, size, allocated_size);
pg_shmem_allocations_numa| SELECT name,
numa_node,
size
FROM pg_get_shmem_allocations_numa() pg_get_shmem_allocations_numa(name, numa_node, size);
+pg_shmem_segments| SELECT id,
+ name,
+ size,
+ freeoffset,
+ mapping_size,
+ mapping_reserved_size
+ FROM pg_get_shmem_segments() pg_get_shmem_segments(id, name, size, freeoffset, mapping_size, mapping_reserved_size);
pg_stat_activity| SELECT s.datid,
d.datname,
s.pid,
--
2.34.1