Thread: Dynamic shared memory areas

Dynamic shared memory areas

From
Thomas Munro
Date:
Hi hackers,

I would like to propose a new subsystem called Dynamic Shared [Memory]
Areas, or "DSA".  It provides an object called a "dsa_area" which can
be used by multiple backends to share data.  Under the covers, a
dsa_area is made up of some number of DSM segments, but it appears to
client code as a single shared memory heap with a simple allocate/free
interface.  Because the memory is mapped at different addresses in
different backends, it introduces a kind of sharable relative pointer
and an operation to convert it to a backend-local pointer.

After you have created or attached to a dsa_area, you can use it much
like MemoryContextAlloc/pfree, except for the extra hoop to jump
through to get the local address:

  dsa_pointer p;
  char *mem;

  p = dsa_allocate(area, 42);
  mem = (char *) dsa_get_address(area, p);
  if (mem != NULL)
  {
      snprintf(mem, 42, "Hello world");
      dsa_free(area, p);
  }

Exposing the dsa_pointer in this way allows client code to build data
structures with internal dsa_pointers that will be usable in all
backends that attach to the dsa_area.
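For instance (a purely illustrative sketch; the struct, field and
function names here are hypothetical, not from the patch), a shared
linked list could keep its links as dsa_pointers and translate them on
each traversal:

  /* A node whose 'next' link is meaningful in every attached backend. */
  typedef struct shared_node
  {
      dsa_pointer next;     /* InvalidDsaPointer (assumed name) ends the list */
      int         value;
  } shared_node;

  static int
  sum_shared_list(dsa_area *area, dsa_pointer head)
  {
      int         sum = 0;
      dsa_pointer p = head;

      while (p != InvalidDsaPointer)
      {
          shared_node *node = (shared_node *) dsa_get_address(area, p);

          sum += node->value;
          p = node->next;
      }
      return sum;
  }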

DSA areas have many potential uses, including shared workspaces for
various kinds of parallel query execution, longer term storage for
in-memory database objects, caches and so forth.  In some cases it may
be useful to use a dsa_area directly, but there could be a library of
useful data structures that know how to use DSA memory.  More on all
of those topics, with patches, soon.

SOME CONTEXT

Currently, Postgres provides three classes of memory:

1.  Backend-local memory, managed with palloc/pfree, and MemoryContext
providing a hierarchy of memory heaps tied to various scopes.
Underneath that, there is of course the C runtime's heap and
allocator.

2.  Traditional non-extensible shared memory mapped into every backend
at the same address.  This works on Unix because child processes
inherit the memory map of the postmaster.  In EXEC_BACKEND builds
(including Windows) it works because you can ask for memory to be
mapped at a specific address and it'll succeed if ASLR is turned off
and the backend hasn't been running very long and the address range
happens to be still free.  This memory is currently managed with an
allocate-only allocator.  There is a small library of data structures
that know how to use (but never free) this memory.

3.  DSM memory, our abstraction for shared memory segments created on
demand in non-postmaster backends.  This memory is mapped at different
addresses in different backends.  Currently its main use is to provide
a chunk of memory for parallel query.  To manage the space inside a
DSM segment, shm_toc ('table-of-contents') can be used as a kind of
allocate-only space manager which allows backends to find the
backend-local address of objects within the segment using integer
keys.

This proposal adds a fourth class, building on the third.  Compared
with the existing memory classes:

* It provides a fully general allocate/free facility, as currently
available only in (1), though it does not have (1)'s directly
dereferenceable pointers.

* It grows automatically and can in theory grow as big as virtual
memory allows, like (1), though it also provides a way to cap total
size so that allocations fail beyond some size.

* It provides something like the throw-it-all-away-at-once clean-up
facility of (1), since DSA areas can be destroyed, are reference
counted, and can optionally be tracked by the resource manager
mechanism (riding on DSM's coat tails).

* It provides the data sharing between backends of (2) and (3), though
doesn't have (2)'s directly dereferenceable pointers.

* Through proposals that will follow this one, it will provide for
basic data structures that build on top of it such as hash tables,
like (2), except that these ones will be able to grow as required and
give memory back.

* Unlike (1) and (2), client code has to deal with incompatible memory
maps.  This involves calling dsa_get_address(area, relative_pointer),
which amounts to a few instructions to perform a base address lookup
and pointer arithmetic (sketched just below this list).
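As a minimal conceptual sketch (hypothetical names and bit split; the
real encoding and lookup are private to dsa.c), the translation is
nothing more than unpacking the pointer and indexing a per-backend
array of mapped base addresses:

  #define EXAMPLE_OFFSET_BITS 40    /* assumed split, for illustration only */

  static inline void *
  example_get_address(char **segment_bases, uint64 dp)
  {
      uint64      segment_index = dp >> EXAMPLE_OFFSET_BITS;
      uint64      offset = dp & ((UINT64CONST(1) << EXAMPLE_OFFSET_BITS) - 1);

      return segment_bases[segment_index] + offset;
  }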

Using processes instead of threads gives Postgres certain advantages,
but requires us to deal with shared memory instead of just using
something like (1) for all our memory needs, as a hypothetical
multi-threaded Postgres fork would presumably do.  This proposal is a
step towards making our shared memory facilities more powerful and
general.

IMPLEMENTATION AND HISTORY

Back in 2014, Robert Haas proposed sb_alloc[1].  It had two layers:

* a 'free page manager' which cuts a piece of memory into 4KB pages
and embeds a btree into the empty pages to track contiguous runs of
pages, so that you can get and put free page ranges

* an allocator which manages a set of backend-private memory regions,
each of which has a free page manager; large allocations are handled
directly with pages from the free page manager in an existing region,
or new regions created as required with malloc; allocations <= 8KB are
handled with pools (called "heaps" in that patch) of various object
sizes ("size classes") that live in 64KB superblocks, which in turn
come from the free page manager

DSA uses Robert's free page manager unchanged, except for some
debugging by me.  It uses the same general approach and much of the
code for the higher level allocator, but I have reworked it
substantially to replace the MemoryContext interface, put it in DSM
segments, introduce the multi-segment relative pointer scheme, and add
concurrency support.
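As a rough sketch of how that lower layer is driven (the
FreePageManagerInitialize/Get/Put signatures here are my reading of
freepage.c; treat them as assumptions rather than documentation):

  static void
  example_fpm_usage(FreePageManager *fpm, char *base)
  {
      Size        first_page;

      /* Hand the manager a run of free pages: pages 10..109. */
      FreePageManagerPut(fpm, 10, 100);

      /* Ask for a contiguous run of 16 pages; false means no such run exists. */
      if (FreePageManagerGet(fpm, 16, &first_page))
      {
          char       *p = base + first_page * FPM_PAGE_SIZE;

          memset(p, 0, 16 * FPM_PAGE_SIZE);           /* use the pages... */
          FreePageManagerPut(fpm, first_page, 16);    /* ...then return them */
      }
  }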

Compared to some well-known malloc implementations from which this code
takes general inspiration, the main differences are obviously the
shared memory nature, the lack of per-core pools (an avenue for future
research that would increase concurrent performance at the cost of
increased fragmentation), and the presence of the lower-level page
manager.
Some other systems go directly to the OS (mmap, sbrk) for superblocks
and large objects.  The equivalent for us would be to throw away the
lower layer and simply create a DSM segment for large allocations and
64KB superblocks, but there are implementation and portability reasons
not to want to create very large numbers of DSM segments.

Compared to palloc/pfree, DSA aims to waste less space.  It has more
finely grained size classes (8, 16, 24, 32, 40, 48, ...; see
dsa_size_classes) and uses a page map costing 8 bytes per 4KB page to
keep track of how to free memory, instead of putting bookkeeping
information in front of every object.
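For example, a 42-byte request would presumably land in the 48-byte
size class and waste 6 bytes, while the page map's overhead of 8 bytes
per 4KB page works out to roughly 0.2% of the managed space, rather
than a header in front of every object.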

Some other notes in no particular order:  It's admittedly slightly
confusing that the patch currently contains two separate relative
pointer concepts: relptr is used by Robert's freepage.c code and
provides for sort-of-type-checked offsets relative to a single base,
while dsa_pointer is used by dsa.c to provide multi-segment relative
pointers that encode a segment index in the higher bits.  The lock
tranche arguments to dsa_create_dynamic are clunky, but I don't
currently have a better idea: since you can't allocate and free
tranche IDs, I don't see how dsa.c can own that problem.  The
"dynamic" part of dsa_create_dynamic's name reflects a desire to have
an alternative "fixed" version where you can provide it with an
already existing piece of memory to manage, such as a pre-existing
DSM segment, but that has not been implemented.  It's desirable to allow
atomic ops on dsa_pointer; I believe Andres Freund plans to make that
happen for 64 bit values on 32 bit systems, but if that turns out to
be problematic I would want to make dsa_pointer 32 bits on 32 bit
systems.

PATCH

First, please apply dsm-unpin-segment-v2.patch[2], and then
dsm-handle-invalid.patch (attached, and also proposed), and finally
dsa-v1.patch.  I have also attached test-dsa.patch, a small module
which exercises the allocator and shows some client code.

Thanks to my colleagues Robert Haas for the sb_alloc code that morphed
into this patch, and John Gorman and Amit Khandekar for feedback and
testing.

I'd be most grateful for any feedback.  Thanks for reading!

[1] https://www.postgresql.org/message-id/flat/CA%2BTgmobkeWptGwiNa%2BSGFWsTLzTzD-CeLz0KcE-y6LFgoUus4A%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEepm%3D29DZeWf44-4fzciAQ14iY5vCVZ6RUJ-KR2yzs3hPzrkw%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Fri, Aug 19, 2016 at 7:07 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> I would like to propose a new subsystem called Dynamic Shared [Memory]
> Areas, or "DSA".  It provides an object called a "dsa_area" which can
> be used by multiple backends to share data.  Under the covers, a
> dsa_area is made up of some number of DSM segments, but it appears to
> client code as a single shared memory heap with a simple allocate/free
> interface.  Because the memory is mapped at different addresses in
> different backends, it introduces a kind of sharable relative pointer
> and an operation to convert it to a backend-local pointer.
>
> [...]
>
> [...]  It's desirable to allow atomic ops on
> dsa_pointer; I believe Andres Freund plans to make that happen for 64
> bit values on 32 bit systems, but if that turns out to be problematic
> I would want to make dsa_pointer 32 bits on 32 bit systems.

Here's a new version that does that.  It provides the type
dsa_pointer_atomic and associated operations, using
PG_HAVE_ATOMIC_U64_SUPPORT to decide which size to use.  The choice of
size is overridable at compile time with USE_SMALL_DSA_POINTER.
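Roughly speaking, the selection looks like this (a sketch of the idea,
not the literal header text):

  /* Pick the width of dsa_pointer and its atomic counterpart at compile time. */
  #if defined(PG_HAVE_ATOMIC_U64_SUPPORT) && !defined(USE_SMALL_DSA_POINTER)
  typedef uint64 dsa_pointer;
  typedef pg_atomic_uint64 dsa_pointer_atomic;
  #define dsa_pointer_atomic_read pg_atomic_read_u64
  #define dsa_pointer_atomic_compare_exchange pg_atomic_compare_exchange_u64
  #else
  typedef uint32 dsa_pointer;
  typedef pg_atomic_uint32 dsa_pointer_atomic;
  #define dsa_pointer_atomic_read pg_atomic_read_u32
  #define dsa_pointer_atomic_compare_exchange pg_atomic_compare_exchange_u32
  #endif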

The other change is that it now creates DSM segments whose sizes don't
grow so large so quickly.  V1 would create 1MB, 2MB, 4MB, ... segments
(geometric growth being necessary because we can't have large numbers
of segments, but we want to support large total sizes).  V2 creates
segments of size 1MB, 1MB, 1MB, 1MB, 2MB, 2MB, 2MB, 2MB, 4MB, 4MB,
4MB, 4MB, ... according to the compile-time constant
DSA_NUM_SEGMENTS_AT_EACH_SIZE.  I'm not sure how to select a good
number for this yet, and the best answer may depend on whether you're
using small pointers.
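Expressed as code, the growth rule described above amounts to
something like this (a sketch, not the exact dsa.c expression):

  /*
   * Size of the nth segment created (0-based), doubling after every
   * DSA_NUM_SEGMENTS_AT_EACH_SIZE segments: 1MB, 1MB, ..., 2MB, 2MB, ...
   */
  static size_t
  example_segment_size(int nth_segment)
  {
      int         doublings = nth_segment / DSA_NUM_SEGMENTS_AT_EACH_SIZE;

      return (size_t) DSA_INITIAL_SEGMENT_SIZE << doublings;
  }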

This version is rebased against master as of today and doesn't depend
on any other patches.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Dilip Kumar
Date:
On Wed, Oct 5, 2016 at 3:00 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a new version that does that.

While testing this patch I found an issue:

+ total_size = DSA_INITIAL_SEGMENT_SIZE;
+ total_pages = total_size / FPM_PAGE_SIZE;
+ metadata_bytes =
+ MAXALIGN(sizeof(dsa_area_control)) +
+ MAXALIGN(sizeof(FreePageManager)) +
+ total_pages * sizeof(dsa_pointer);
+ /* Add padding up to next page boundary. */
+ if (metadata_bytes % FPM_PAGE_SIZE != 0)
+ metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
+ usable_pages =
+ (total_size - metadata_bytes) / FPM_PAGE_SIZE;

+ segment = dsm_create(total_size, 0);
+ dsm_pin_segment(segment);

The problem is that the size of dsa_area_control is bigger than
DSA_INITIAL_SEGMENT_SIZE, but we are allocating a segment of only
DSA_INITIAL_SEGMENT_SIZE.

(gdb) p sizeof(dsa_area_control)
$8 = 67111000
(gdb) p DSA_INITIAL_SEGMENT_SIZE
$9 = 1048576

In dsa-v1 the problem did not exist because DSA_MAX_SEGMENTS was 1024,
but in dsa-v2 I think it is calculated wrongly.

(gdb) p DSA_MAX_SEGMENTS
$10 = 16777216
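(Presumably that 67111000 comes from dsa_area_control embedding a
per-segment handle array sized by DSA_MAX_SEGMENTS: 16777216 entries at
4 bytes per dsm_handle is already 67108864 bytes, which can never fit
in a 1MB initial segment.)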

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Wed, Oct 5, 2016 at 10:04 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Wed, Oct 5, 2016 at 3:00 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Here's a new version that does that.
>
> While testing this patch I found an issue:
>
> + total_size = DSA_INITIAL_SEGMENT_SIZE;
> + total_pages = total_size / FPM_PAGE_SIZE;
> + metadata_bytes =
> + MAXALIGN(sizeof(dsa_area_control)) +
> + MAXALIGN(sizeof(FreePageManager)) +
> + total_pages * sizeof(dsa_pointer);
> + /* Add padding up to next page boundary. */
> + if (metadata_bytes % FPM_PAGE_SIZE != 0)
> + metadata_bytes += FPM_PAGE_SIZE - (metadata_bytes % FPM_PAGE_SIZE);
> + usable_pages =
> + (total_size - metadata_bytes) / FPM_PAGE_SIZE;
>
> + segment = dsm_create(total_size, 0);
> + dsm_pin_segment(segment);
>
> The problem is that the size of dsa_area_control is bigger than
> DSA_INITIAL_SEGMENT_SIZE, but we are allocating a segment of only
> DSA_INITIAL_SEGMENT_SIZE.
>
> (gdb) p sizeof(dsa_area_control)
> $8 = 67111000
> (gdb) p DSA_INITIAL_SEGMENT_SIZE
> $9 = 1048576
>
> In dsa-v1 the problem did not exist because DSA_MAX_SEGMENTS was 1024,
> but in dsa-v2 I think it is calculated wrongly.
>
> (gdb) p DSA_MAX_SEGMENTS
> $10 = 16777216

Oops, right, thanks.  That was a last-minute change to the macro
definition that I stupidly tested only in USE_SMALL_DSA_POINTER mode.
Here is a fix for that, capping DSA_MAX_SEGMENTS as before.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> [dsa-v3.patch]

Here is a new version which just adds CLOBBER_FREED_MEMORY support to dsa_free.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> [dsa-v3.patch]
>
> Here is a new version which just adds CLOBBER_FREED_MEMORY support to dsa_free.

Here is a new version that fixes a bug I discovered in freepage.c today.

Details:  When dsa_free decides to give a whole superblock back to
the free page manager for a segment with FreePageManagerPut, and
there was already exactly one span of exactly one free page in that
segment, and the span being 'put' is not adjacent to that existing
free page, then the singleton format must be converted to a btree with
the existing page as root and the newly put span as the sole leaf.
But in that special case we forgot to add the newly put span to the
appropriate free list.  Not only did we lose track of it, but a future
call to FreePageManagerPut might try to merge it with another adjacent
span, which will try to manipulate the freelist it expects it to
be in and blow up.  The fix is just to add a call to
FreePagePushSpanLeader in this corner case before the early return.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Thu, Nov 10, 2016 at 6:37 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here is a new version that fixes a bug I discovered in freepage.c today.

And here is a new version rebased on top of commit
b40b4dd9e10ea701c8d47ccba9407fc32ed384e5.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Robert Haas
Date:
On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> [dsa-v3.patch]
>>
>> Here is a new version which just adds CLOBBER_FREED_MEMORY support to dsa_free.
>
> Here is a new version that fixes a bug I discovered in freepage.c today.
>
> Details:  When dsa_free decides to give a whole superblock back to
> the free page manager for a segment with FreePageManagerPut, and
> there was already exactly one span of exactly one free page in that
> segment, and the span being 'put' is not adjacent to that existing
> free page, then the singleton format must be converted to a btree with
> the existing page as root and the newly put span as the sole leaf.
> But in that special case we forgot to add the newly put span to the
> appropriate free list.  Not only did we lose track of it, but a future
> call to FreePageManagerPut might try to merge it with another adjacent
> span, which will try to manipulate the freelist it expects it to
> be in and blow up.  The fix is just to add a call to
> FreePagePushSpanLeader in this corner case before the early return.

Since a lot of the design of this patch is mine - from my earlier work
on sb_alloc - I don't expect to have a ton of objections to it. And
I'd like to get it committed, because other parallelism work depends
on it (bitmap heap scan and parallel hash join in particular), and
because it's got other uses as well.  However, I don't want to be
perceived as slamming my code (or that of my colleagues) into the tree
without due opportunity for other people to provide feedback, so if
anyone has questions, comments, concerns, or review to offer, please
do.

I think we should develop versions of this that (1) allocate from the
main shared memory segment and (2) allocate from backend-private
memory.  Per my previous benchmarking results, allocating from
backend-private memory would be a substantial win for tuplesort.c
because this allocator is substantially more memory-efficient for
large memory contexts than aset.c, and Tomas Vondra tested it out and
found that it is also faster for logical decoding than the approach he
proposed.  Perhaps that's not an argument for holding up his proposed
patches for that problem, but I think it IS a good argument for
pressing forward with a backend-private version of this allocator.
I'm not saying that should be part of the initial commit of this code,
but I think it's a good direction to pursue.

One question that we need to resolve is where the code should live in
the source tree.  When I wrote the original patches upon which this
work was based, I think that I put all the code in
src/backend/utils/mmgr, since it's all memory-management code.  In
this patch, Thomas left the free page manager code there, but put the
allocator itself in src/backend/storage/ipc.  There's a certain logic
to that because dynamic shared areas (dsa.c) sit right next to dynamic
shared memory (dsm.c) but it feels a bit schizophrenic to have half of
the code in storage/ipc and the other half in utils/mmgr.  I guess my
view is that utils/mmgr is a better fit, because I think that this is
basically memory management code that happens to use shared memory,
rather than basically IPC that happens to be an allocator.  If we
decide that this stuff goes in storage/ipc then that's basically
saying that everything that uses dynamic shared memory is going to end
up in that directory, which seems like a mess.  The fact that the
free-page manager, at least, could be used for allocations not based
on DSM strengthens that argument in my view. Other opinions?

The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
which I believe I'm responsible, is ugly.  There is probably a
compiler out there that has __typeof__ but not
__builtin_types_compatible_p, and we could cater to that by adding a
separate configure test for __typeof__.  A little browsing of the
documentation at https://gcc.gnu.org/onlinedocs/ seems to suggest
that __builtin_types_compatible_p didn't exist before GCC 3.1, but
__typeof__ seems to be there even in 2.95.3.  That's not very
interesting, honestly, because 3.1 came out in 2002, but there might
be non-GCC compilers that implement __typeof__ but not
__builtin_types_compatible_p.  I am inclined not to worry about this
unless somebody feels otherwise, but it's not beautiful.

I wonder if it wouldn't be a good idea to allow the dsa_area_control
to be stored wherever instead of insisting that it's got to be located
inside the first DSM segment backing the pool itself.  For example,
you could make dsa_create_dynamic() take a third argument which is a
pointer to enough space for a dsa_area_control, and it would
initialize it in place.  Then you could make dsa_attach_dynamic() take
a pointer to that same structure instead of taking a dsa_handle.
Actually, I think dsa_handle goes away: it becomes the caller's
responsibility to figure out the correct pointer address to pass in
the second process.  The advantage of this design change is that you
could stuff the dsa_area_control into the existing parallel DSM and
only create additional DSMs if anything is actually allocated.  What
would be even cooler is to allow a little bit of space inside the
parallel DSM that gets used first, and then, when that overflows, we
start creating new DSMs.  Say, 64kB.  Am I sounding greedy yet?  It
just seems like a good idea not to needlessly multiply the number of
DSMs.

+               /* Unlink span. */
+               /* TODO: Does it even need to be linked in in the first place? */
+               LWLockAcquire(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE),
+                                         LW_EXCLUSIVE);
+               unlink_span(area, span);
+               LWLockRelease(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE));

In answer to the TODO, I think this isn't strictly necessary, but it
seems like a good idea to do it anyway for debuggability.  If we
didn't do this, the space occupied by a large object wouldn't be
"known" in any way other than by having disappeared from the free page
map, whereas this way it's linked into the DSA's list of allocated
chunks like anything else, so for example dsa_dump() can print it.  I
recommend removing this TODO.

+       /*
+        * TODO: We could take Max(fpm->contiguous_pages, result of
+        * FreePageBtreeCleanup) and give it to FreePageManagerUpdatLargest as a
+        * starting point for its search, potentially avoiding a bunch of work,
+        * since there is no way the largest contiguous run is bigger than that.
+        */

Typo: Updat.

+       /*
+        * TODO: Figure out how to avoid setting this every time. It may not be as
+        * simple as it looks.
+        */

Something isn't right with this function, because it takes the trouble
to calculate a value for contiguous_pages that it then doesn't use for
anything.  I think the original idea here was that if we calculated a
value for contiguous_pages that was less than fpm->contiguous_pages,
there was no need to dirty it.  If we can't get away with that for
some reason, then there's no point in calculating the value in the
first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Wed, Nov 16, 2016 at 2:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> ... my
> view is that utils/mmgr is a better fit, ...

OK, changed.

> The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
> which I believe I'm responsible, is ugly.  There is probably a
> compiler out there that has __typeof__ but not
> __builtin_types_compatible_p, and we could cater to that by adding a
> separate configure test for __typeof__.  A little browsing of the
> documentation at https://gcc.gnu.org/onlinedocs/ seems to suggest
> that __builtin_types_compatible_p didn't exist before GCC 3.1, but
> __typeof__ seems to be there even in 2.95.3.  That's not very
> interesting, honestly, because 3.1 came out in 2002, but there might
> be non-GCC compilers that implement __typeof__ but not
> __builtin_types_compatible_p.  I am inclined not to worry about this
> unless somebody feels otherwise, but it's not beautiful.

+1

> I wonder if it wouldn't be a good idea to allow the dsa_area_control
> to be stored wherever instead of insisting that it's got to be located
> inside the first DSM segment backing the pool itself.  For example,
> you could make dsa_create_dynamic() take a third argument which is a
> pointer to enough space for a dsa_area_control, and it would
> initialize it in place.  Then you could make dsa_attach_dynamic() take
> a pointer to that same structure instead of taking a dsa_handle.
> Actually, I think dsa_handle goes away: it becomes the caller's
> responsibility to figure out the correct pointer address to pass in
> the second process.  The advantage of this design change is that you
> could stuff the dsa_area_control into the existing parallel DSM and
> only create additional DSMs if anything is actually allocated.  What
> would be even cooler is to allow a little bit of space inside the
> parallel DSM that gets used first, and then, when that overflows, we
> start creating new DSMs.  Say, 64kB.  Am I sounding greedy yet?  It
> just seems like a good idea not to needlessly multiply the number of
> DSMs.

Alternatively we could stop using DSM directly for parallel query and
just use a DSA area for all the shmem needs of a parallel query
execution as I mentioned elsewhere[1].   That would involve changing a
bunch of stuff including the FDW interface, so that's probably a bad
idea at this point.  So I tried this in-place idea out today.  See the
attached version which provides:

  dsa_area *dsa_create(...);
  dsa_area *dsa_attach(dsa_handle handle);

Those replace the functions that previously had _dynamic in the name.
Then I have new variants:

  dsa_area *dsa_create_in_place(void *place, size_t size, ...);
  dsa_area *dsa_attach_in_place(void *place);

Those let you create an area in existing memory (in a DSM segment, or
traditional inherited shmem).  The in-place versions will still create
DSM segments on demand as required, though I suppose if you wanted to
prevent that you could with dsa_set_size_limit(area, size).  One
complication is that of course the automatic detach feature doesn't
work if you're in some random piece of memory.  I have exposed
dsa_on_dsm_detach, so that there is a way to hook it up to the detach
hook for a pre-existing DSM segment, but that's the caller's
responsibility.  This is important because although the first
'segment' is created in place, if other segments have been created we
still have to manage those.  It gets tricky if you are the last
attached process for the area, but do not have a particular segment
mapped in currently because you've never accessed it.  That works with
a regular dsa_create area, because everyone has the control segment
mapped in, so we use that one's dsm_on_detach hook and from there we
can do the cleanup we need to do, but in this new case there is no
such thing.  You can see an example of manual detach hook installation
in dsa-area-for-executor-v2.patch, which I'll now go and post over in
that other thread.
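Roughly, the manual hookup is a single callback registration (a sketch
only; it assumes dsa_on_dsm_detach follows the usual
on_dsm_detach_callback convention of taking the area's in-place
address as its Datum argument):

  /* 'seg' is the pre-existing DSM segment holding the in-place area at 'place'. */
  on_dsm_detach(seg, dsa_on_dsm_detach, PointerGetDatum(place));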

> +               /* Unlink span. */
> +               /* TODO: Does it even need to be linked in in the first place? */
> +               LWLockAcquire(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE),
> +                                         LW_EXCLUSIVE);
> +               unlink_span(area, span);
> +               LWLockRelease(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE));
>
> In answer to the TODO, I think this isn't strictly necessary, but it
> seems like a good idea to do it anyway for debuggability.  If we
> didn't do this, the space occupied by a large object wouldn't be
> "known" in any way other than by having disappeared from the free page
> map, whereas this way it's linked into the DSA's list of allocated
> chunks like anything else, so for example dsa_dump() can print it.  I
> recommend removing this TODO.

Removed.

> +       /*
> +        * TODO: We could take Max(fpm->contiguous_pages, result of
> +        * FreePageBtreeCleanup) and give it to FreePageManagerUpdatLargest as a
> +        * starting point for its search, potentially avoiding a bunch of work,
> +        * since there is no way the largest contiguous run is bigger than that.
> +        */
>
> Typo: Updat.

Fixed.

> +       /*
> +        * TODO: Figure out how to avoid setting this every time. It may not be as
> +        * simple as it looks.
> +        */
>
> Something isn't right with this function, because it takes the trouble
> to calculate a value for contiguous_pages that it then doesn't use for
> anything.  I think the original idea here was that if we calculated a
> value for contiguous_pages that was less than fpm->contiguous_pages,
> there was no need to dirty it.  If we can't get away with that for
> some reason, then there's no point in calculating the value in the
> first place.

Yeah.  Will come back on this point.

The attached patch is just for discussion only...  I need to resolve
that contiguous_pages question and do some more testing.

[1] https://www.postgresql.org/message-id/CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Thu, Nov 24, 2016 at 1:07 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> The attached patch is just for discussion only...  I need to resolve
> that contiguous_pages question and do some more testing.

As Dilip discovered, there was a problem with resource cleanup for DSA
areas created inside pre-existing DSM segments, which I've now sorted
out in the attached version.  I also updated the copyright messages,
introduced a couple of the new 'unlikely' macros in the address
decoding path, and introduced high_segment_index to avoid scanning
bigger segment arrays than is necessary sometimes.

As for contiguous_pages_dirty, I see what was missing from earlier
attempts at more subtle invalidation: we had failed to set the flag in
cases where FreePageManagerGetInternal was called during a
FreePageManagerPut operation.  What do you think about the logic in
this patch... do you see any ways for contiguous_pages to get out of
date?  There is a new assertion that contiguous_pages matches the
state of the freelists at the end of FreePageManagerGet and
FreePageManagerPut, enabled if you define FPM_EXTRA_ASSERTS, and this
passes my random allocation pattern testing.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Robert Haas
Date:
On Wed, Nov 23, 2016 at 7:07 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Those let you create an area in existing memory (in a DSM segment, or
> traditional inherited shmem).  The in-place versions will still create
> DSM segments on demand as required, though I suppose if you wanted to
> prevent that you could with dsa_set_size_limit(area, size).  One
> complication is that of course the automatic detach feature doesn't
> work if you're in some random piece of memory.  I have exposed
> dsa_on_dsm_detach, so that there is a way to hook it up to the detach
> hook for a pre-existing DSM segment, but that's the caller's
> responsibility.

shm_mq_attach() made the opposite decision about how to solve this
problem, and frankly I think that API is a lot more convenient: if the
first argument to shm_mq_attach() happens to be located inside of a
DSM, you can pass the DSM as the second argument and it registers the
on_dsm_detach() hook for you.  If not, you can pass NULL and deal with
it in some other way.  But this makes the common case very simple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic shared memory areas

From
Robert Haas
Date:
More review:

+ * For large objects, we just stick all of the allocations in fullness class
+ * 0. Since we can just return the space directly to the free page manager,
+ * we don't really need them on a list at all, except that if someone wants
+ * to bulk release everything allocated using this BlockAreaContext, we
+ * have no other way of finding them.

This comment is out-of-date.

+               /*
+                * If this is the only span, and there is no active span, then maybe
+                * we should probably move this span to fullness class 1.  (Otherwise
+                * if you allocate exactly all the objects in the only span, it moves
+                * to class 3, then you free them all, it moves to 2, and then is
+                * given back, leaving no active span).
+                */

"maybe we should probably" seems to have one more doubt-expressing
word than it needs.

+               if (size_class == DSA_SCLASS_SPAN_LARGE)
+                       /* Large object frees give back segments aggressively already. */
+                       continue;

We generally use braces in this kind of case.

+                * Search the fullness class 1 only.  That is where we expect to find

extra "the"

+               /* Call for effect (we don't need the result). */
+               get_segment_by_index(area, index);
...
+       return area->segment_maps[index].mapped_address + offset;

It isn't guaranteed that area->segment_maps[index].mapped_address will
be non-NULL on return from get_segment_by_index, and then this
function will return a completely bogus pointer to the caller.  I
think you should probably elog() instead.

+               elog(ERROR, "dsa: can't attach to area handle %u", handle);

Avoid ":" in elog messages.   You don't really need to - and it isn't
project style to - tag these with "dsa:"; that's what \errverbose or
\set VERBOSITY verbose is for.  In this particular case, I might just
adopt the formulation from parallel.c:
                ereport(ERROR,
                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                         errmsg("could not map dynamic shared memory segment")));

+                       elog(FATAL,
+                                "dsa couldn't find run of pages: fpm_largest out of sync");

Here I'd go with "dsa could not find %u free pages".

+               elog(ERROR, "dsa_pin: area already pinned");

"dsa_area already pinned"

+               elog(ERROR, "dsa_unpin: area not pinned");

"dsa_area not pinned"

+               if (segment == NULL)
+                       elog(ERROR, "dsa: can't attach to segment");

As above.

+static dsa_segment_map *
+get_segment_by_index(dsa_area *area, dsa_segment_index index)
+{
+       if (unlikely(area->segment_maps[index].mapped_address == NULL))
+       {
+               dsm_handle      handle;
+               dsm_segment *segment;
+               dsa_segment_map *segment_map;
+
+               handle = area->control->segment_handles[index];

Don't you need to acquire the lock for this?

+               /* Check all currently mapped segments to find what's been freed. */
+               for (i = 0; i <= area->high_segment_index; ++i)
+               {
+                       if (area->segment_maps[i].header != NULL &&
+                               area->segment_maps[i].header->freed)
+                       {
+                               dsm_detach(area->segment_maps[i].segment);
+                               area->segment_maps[i].segment = NULL;
+                               area->segment_maps[i].header = NULL;
+                               area->segment_maps[i].mapped_address = NULL;
+                       }
+               }
+               area->freed_segment_counter = freed_segment_counter;

And this?

+/*
+ * Release a DSA area that was produced by dsa_create_in_place or
+ * dsa_attach_in_place.  It is preferable to use one of the 'dsa_on_XXX'
+ * callbacks so that this is managed automatically, because failure to release
+ * an area created in-place leaks its segments permanently.
+ */
+void
+dsa_release_in_place(void *place)
+{
+       decrement_reference_count((dsa_area_control *) place);
+}

Since this seems to be the only caller of decrement_reference_count,
you could just put the logic here.  The contract for this function is
also a bit unclear from the header comment.  I initially thought that
it was your intention that this should be called from every process
that has either created or attached the segment.  But that doesn't
seem like it will work, because decrement_reference_count calls
dsm_unpin_segment on every segment, and a segment can only be pinned
once, so everybody else would fail.  So maybe the idea is that ANY ONE
process has to call dsa_release_in_place.  But then that could lead to
failures in other backends inside get_segment_by_index(), because
segments they don't have mapped might already be gone.  OK, third try:
maybe the idea is that the LAST process out has to call
dsa_release_in_place().  But how do the various cooperating processes
know which one that is?

I've also realized another thing that's not so good about this:
superblocks are 64kB, so allocating 64kB of initial space probably
just wastes most of it.  I think we want to either allocate just
enough space to hold the control information, or else that much space
plus space for at least a few superblocks.  I'm inclined to go the
first way, because it seems a bit overenthusiastic to allocate 256kB
or 512kB just on the off chance we might need it.  On the other hand,
including a few bytes in the control segment so that we don't need to
allocate a 1MB segment that we might not need sounds pretty sharp.
Maybe DSA can expose an API that returns the number of bytes that will
be needed for the control structure, and then the caller can arrange
for that space to be available during the Estimate phase...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Wed, Nov 30, 2016 at 4:35 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> More review:

Thanks!

> + * For large objects, we just stick all of the allocations in fullness class
> + * 0. Since we can just return the space directly to the free page manager,
> + * we don't really need them on a list at all, except that if someone wants
> + * to bulk release everything allocated using this BlockAreaContext, we
> + * have no other way of finding them.
>
> This comment is out-of-date.

Removed.

> +               /*
> +                * If this is the only span, and there is no active span, then maybe
> +                * we should probably move this span to fullness class 1.  (Otherwise
> +                * if you allocate exactly all the objects in the only span, it moves
> +                * to class 3, then you free them all, it moves to 2, and then is
> +                * given back, leaving no active span).
> +                */
>
> "maybe we should probably" seems to have one more doubt-expressing
> word than it needs.

Fixed.

> +               if (size_class == DSA_SCLASS_SPAN_LARGE)
> +                       /* Large object frees give back segments aggressively already. */
> +                       continue;
>
> We generally use braces in this kind of case.

Fixed.

> +                * Search the fullness class 1 only.  That is where we expect to find
>
> extra "the"

Fixed.

> +               /* Call for effect (we don't need the result). */
> +               get_segment_by_index(area, index);
> ...
> +       return area->segment_maps[index].mapped_address + offset;
>
> It isn't guaranteed that area->segment_maps[index].mapped_address will
> be non-NULL on return from get_segment_by_index, and then this
> function will return a completely bogus pointer to the caller.  I
> think you should probably elog() instead.

Hmm.  Right.  In fact it's never OK to ask for a segment by index when
that segment is gone, since that implies an access-after-free, so there
is no reason for NULL to be handled by callers.  I have changed
get_segment_by_index to raise an error.

> +               elog(ERROR, "dsa: can't attach to area handle %u", handle);
>
> Avoid ":" in elog messages.   You don't really need to - and it isn't
> project style to - tag these with "dsa:"; that's what \errverbose or
> \set VERBOSITY verbose is for.  In this particular case, I might just
> adopt the formulation from parallel.c:
>
>                 ereport(ERROR,
>                         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                          errmsg("could not map dynamic shared memory segment")));

Fixed.

> +                       elog(FATAL,
> +                                "dsa couldn't find run of pages: fpm_largest out of sync");
>
> Here I'd go with "dsa could not find %u free pages".

Fixed.

> +               elog(ERROR, "dsa_pin: area already pinned");
>
> "dsa_area already pinned"

Fixed.

> +               elog(ERROR, "dsa_unpin: area not pinned");
>
> "dsa_area not pinned"

Fixed.

> +               if (segment == NULL)
> +                       elog(ERROR, "dsa: can't attach to segment");
>
> As above.

Fixed.

> +static dsa_segment_map *
> +get_segment_by_index(dsa_area *area, dsa_segment_index index)
> +{
> +       if (unlikely(area->segment_maps[index].mapped_address == NULL))
> +       {
> +               dsm_handle      handle;
> +               dsm_segment *segment;
> +               dsa_segment_map *segment_map;
> +
> +               handle = area->control->segment_handles[index];
>
> Don't you need to acquire the lock for this?

No.  I've updated the comments to explain, and refactored a bit.

I'll explain in different words here:  This is memory, you are a
C programmer, and as with malloc/free, referencing memory that has
been freed invokes undefined behaviour possibly including but not
limited to demons flying out of your nose.  When you call
dsa_get_address(some_dsa_pointer) or dsa_free(some_dsa_pointer) you
are asserting that the address points to memory allocated with
dsa_allocate from this area that has not yet been freed.  Given that
assertion, area->control->segment_handles[index] (where index is
extracted from the address) must be valid and cannot change under your
feet.  control->segment_handles[index] can only change after
everything allocated from that whole segment has been freed; you can
think of it as 'locked' as long as any live object exists in the
segment it corresponds to.

In general I'm trying not to do anything too clever in the first
version of DSA: it uses plain old LWLock for each size-class's pool
and then an area-wide LWLock for segment operations.  But in the
particular case of dsa_get_address, I think it's really important for
the viability of DSA for these address translations to be fast in the
likely path, hence my desire to figure out a protocol for lock-free
address translation even though segments come and go.

> +               /* Check all currently mapped segments to find what's been freed. */
> +               for (i = 0; i <= area->high_segment_index; ++i)
> +               {
> +                       if (area->segment_maps[i].header != NULL &&
> +                               area->segment_maps[i].header->freed)
> +                       {
> +                               dsm_detach(area->segment_maps[i].segment);
> +                               area->segment_maps[i].segment = NULL;
> +                               area->segment_maps[i].header = NULL;
> +                               area->segment_maps[i].mapped_address = NULL;
> +                       }
> +               }
> +               area->freed_segment_counter = freed_segment_counter;
>
> And this?

Hmm.  I had a theory for why that didn't need to be locked, though it
admittedly lacked a necessary barrier -- d'oh.  But I'll spare you the
details and just lock it because this is not a hot path and it's much
simpler that way.

I've also refactored that code into a new static function
check_for_freed_segments, because I realised that dsa_free needs the
same treatment as dsa_get_address.  The checking for freed segments
was also happening at the wrong time, which I've now straightened out
-- that must happen before you arrive into a get_segment_index, as
described in the copious new comments.  Thoughts?

> +/*
> + * Release a DSA area that was produced by dsa_create_in_place or
> + * dsa_attach_in_place.  It is preferable to use one of the 'dsa_on_XXX'
> + * callbacks so that this is managed automatically, because failure to release
> + * an area created in-place leaks its segments permanently.
> + */
> +void
> +dsa_release_in_place(void *place)
> +{
> +       decrement_reference_count((dsa_area_control *) place);
> +}
>
> Since this seems to be the only caller of decrement_reference_count,
> you could just put the logic here.

Ok, done.

> The contract for this function is
> also a bit unclear from the header comment.  I initially thought that
> it was your intention that this should be called from every process
> that has either created or attached the segment.

That is indeed my intention.

> But that doesn't
> seem like it will work, because decrement_reference_count calls
> dsm_unpin_segment on every segment, and a segment can only be pinned
> once, so everybody else would fail.  So maybe the idea is that ANY ONE
> process has to call dsa_release_in_place.  But then that could lead to
> failures in other backends inside get_segment_by_index(), because
> segments they don't have mapped might already be gone.  OK, third try:
> maybe the idea is that the LAST process out has to call
> dsa_release_in_place().  But how do the various cooperating processes
> know which one that is?

It decrements the reference count for the area, but only unpins the
segments if the reference count reaches zero:

  Assert(control->refcnt > 0);
  if (--control->refcnt == 0)
  {
      /* ... unpin all the segments ... */
  }

> I've also realized another thing that's not so good about this:
> superblocks are 64kB, so allocating 64kB of initial space probably
> just wastes most of it.  I think we want to either allocate just
> enough space to hold the control information, or else that much space
> plus space for at least a few superblocks.  I'm inclined to go the
> first way, because it seems a bit overenthusiastic to allocate 256kB
> or 512kB just on the off chance we might need it.  On the other hand,
> including a few bytes in the control segment so that we don't need to
> allocate a 1MB segment that we might not need sounds pretty sharp.
> Maybe DSA can expose an API that returns the number of bytes that will
> be needed for the control structure, and then the caller can arrange
> for that space to be available during the Estimate phase...

Yeah, I also thought about that, but didn't try to do better before
because I couldn't see how to make a nice macro for this without
dragging a ton of internal stuff out into the header.  I have written
a new function dsa_minimum_size().  The caller can use that number
directly to get a minimal in-place area that will immediately create
an extra DSM segment as soon as you call dsa_allocate.  Unfortunately
you can't really add more to that number with predictable results
unless you know some internal details and your future allocation
pattern: to avoid extra segment creation, you'd need to add 4KB for a
block of spans and then 64KB for each size class you plan to allocate,
and of course that might change.  But at least it allows us to create
an in-place DSA area for every parallel query cheaply, and then defer
creation of the first DSM segment until the first time someone tries
to allocate, which seems about right to me.
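For example, a parallel-query caller could do something along these
lines during the Estimate/Initialize steps (a sketch using the usual
shm_toc estimator pattern; MY_DSA_KEY is a made-up key):

  char   *place;

  /* Estimate phase: reserve room for the in-place area's control data. */
  shm_toc_estimate_chunk(&pcxt->estimator, dsa_minimum_size());
  shm_toc_estimate_keys(&pcxt->estimator, 1);

  /* After the parallel DSM is created: carve out that space and record it. */
  place = shm_toc_allocate(pcxt->toc, dsa_minimum_size());
  shm_toc_insert(pcxt->toc, MY_DSA_KEY, place);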

And in response to your earlier email:

On Tue, Nov 29, 2016 at 7:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> shm_mq_attach() made the opposite decision about how to solve this
> problem, and frankly I think that API is a lot more convenient: if the
> first argument to shm_mq_attach() happens to be located inside of a
> DSM, you can pass the DSM as the second argument and it registers the
> on_dsm_detach() hook for you.  If not, you can pass NULL and deal with
> it in some other way.  But this makes the common case very simple.

Ok, I've now done the same.

I feel like some more general destructor callback for objects in a
containing object is wanted here, rather than sticking dsm_segment *
into various constructor-like functions, but I haven't thought
seriously about that and I'm not arguing that case now.

Please find attached dsa-v8.patch, and also a small test module for
running random allocate/free exercises and dumping the internal
allocator state.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Haribabu Kommi
Date:


On Thu, Dec 1, 2016 at 10:33 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

Re: Dynamic shared memory areas

From
Robert Haas
Date:
On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Please find attached dsa-v8.patch, and also a small test module for
> running random allocate/free exercises and dumping the internal
> allocator state.

OK, I've committed the main patch.  As for test-dsa.patch, can we
tie that into make check-world so that committing it delivers some
buildfarm coverage for this code?  Of course, the test settings would
have to be fairly conservative given that some buildfarm machines have
very limited resources, but it still seems worth doing.  test_shm_mq
might provide some useful precedent.

Note that you don't need the prototype if you've already used
PG_FUNCTION_INFO_V1.

I'm not sure that using the same random seed every time is a good
idea.  Maybe you should provide a way to set the seed as part of
starting the test, or to not do that (pass NULL?) and then elog(LOG,
...) the seed that's chosen.  Then if the BF crashes, we can see what
seed was in use for that particular test.
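Something like this, perhaps (a sketch of the suggestion, not the
existing test module code; the function name and argument handling are
hypothetical):

  PG_FUNCTION_INFO_V1(test_dsa_random);

  Datum
  test_dsa_random(PG_FUNCTION_ARGS)
  {
      unsigned int seed;

      /* Use the caller's seed if given, otherwise pick one and log it. */
      if (PG_ARGISNULL(0))
          seed = (unsigned int) (MyProcPid ^ (unsigned int) time(NULL));
      else
          seed = (unsigned int) PG_GETARG_INT32(0);

      elog(LOG, "test_dsa: using random seed %u", seed);
      srandom(seed);

      /* ... run the random allocate/free exercise here ... */

      PG_RETURN_VOID();
  }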

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic shared memory areas

From
Robert Haas
Date:
On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Please find attached dsa-v8.patch, and also a small test module for
>> running random allocate/free exercises and dumping the internal
>> allocator state.
>
> OK, I've committed the main patch.

...but the buildfarm isn't very happy about it.

tern complains:

In file included from dsa.c:58:0:
../../../../src/include/utils/dsa.h:59:1: error: unknown type name
'pg_atomic_uint64'
 typedef pg_atomic_uint64 dsa_pointer_atomic;

...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
to be true.  And that should always be true unless
PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
pg_atomic_uint64?  That doesn't seem right.

The failures on several other BF members appear to be similar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic shared memory areas

From
Robert Haas
Date:
On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Please find attached dsa-v8.patch, and also a small test module for
>>> running random allocate/free exercises and dumping the internal
>>> allocator state.
>>
>> OK, I've committed the main patch.
>
> ...but the buildfarm isn't very happy about it.
>
> tern complains:
>
> In file included from dsa.c:58:0:
> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
> 'pg_atomic_uint64'
>  typedef pg_atomic_uint64 dsa_pointer_atomic;
>
> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
> to be true.  And that should always be true unless
> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
> pg_atomic_uint64?  That doesn't seem right.

No, that's not the problem.  Just a garden variety thinko in dsa.h.
Will push a fix presently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Dynamic shared memory areas

From
Thomas Munro
Date:
On Sat, Dec 3, 2016 at 9:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 2, 2016 at 2:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Dec 2, 2016 at 1:21 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Thu, Dec 1, 2016 at 6:33 AM, Thomas Munro
>>> <thomas.munro@enterprisedb.com> wrote:
>>>> Please find attached dsa-v8.patch, and also a small test module for
>>>> running random allocate/free exercises and dumping the internal
>>>> allocator state.
>>>
>>> OK, I've committed the main patch.
>>
>> ...but the buildfarm isn't very happy about it.
>>
>> tern complains:
>>
>> In file included from dsa.c:58:0:
>> ../../../../src/include/utils/dsa.h:59:1: error: unknown type name
>> 'pg_atomic_uint64'
>>  typedef pg_atomic_uint64 dsa_pointer_atomic;
>>
>> ...but that code is only compiled if #if DSA_POINTER_SIZEOF == 4 fails
>> to be true.  And that should always be true unless
>> PG_HAVE_ATOMIC_U64_SUPPORT is defined.  So apparently tern claims to
>> PG_HAVE_ATOMIC_U64_SUPPORT but doesn't actually define
>> pg_atomic_uint64?  That doesn't seem right.
>
> No, that's not the problem.  Just a garden variety thinko in dsa.h.
> Will push a fix presently.

Here's a patch to provide the right format string for dsa_pointer to
printf-like functions, which clears a warning coming from dsa_dump (a
debugging function) on 32 bit systems.

--
Thomas Munro
http://www.enterprisedb.com

Attachment

Re: Dynamic shared memory areas

From
Robert Haas
Date:
On Fri, Dec 2, 2016 at 3:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a patch to provide the right format string for dsa_pointer to
> printf-like functions, which clears a warning coming from dsa_dump (a
> debugging function) on 32 bit systems.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Dynamic shared memory areas

From
Peter Geoghegan
Date:
On Tue, Nov 15, 2016 at 5:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think we should develop versions of this that (1) allocate from the
> main shared memory segment and (2) allocate from backend-private
> memory.  Per my previous benchmarking results, allocating from
> backend-private memory would be a substantial win for tuplesort.c
> because this allocator is substantially more memory-efficient for
> large memory contexts than aset.c, and Tomas Vondra tested it out and
> found that it is also faster for logical decoding than the approach he
> proposed.

The approach that I'd prefer to take with tuplesort.c is to have a
buffer for caller tuples that is written to sequentially, and
repalloc()'d as needed, much like the memtuples array. It would be
slightly tricky to make this work when memtuples needs to be a heap
(I'm mostly thinking of top-N heapsorts here). That has perhaps
unbeatable efficiency, while also helping cases with significant
physical/logical correlation in their input, which is pretty common.
Creating an index on a serial PK within pg_restore would probably get
notably faster if we went this way.

-- 
Peter Geoghegan