Re: things I learned from working on memory allocation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: things I learned from working on memory allocation
Date
Msg-id CAM3SWZQkyurkShnsL_vQ6D2stCx-f6XxPvTc776Yidn_9DDhAw@mail.gmail.com
Whole thread Raw
In response to things I learned from working on memory allocation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: things I learned from working on memory allocation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Jul 7, 2014 at 12:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 5. It's tempting to look at other ways of solving the parallel sort
> problem that don't need an allocator - perhaps by simply packing all
> the tuples into a DSM one after the next.  But this is not easy to do,
> or at least it's not easy to do efficiently.  Tuples enter tuplesort.c
> through one of the tuplesort_put*() functions, most of which end up
> calling one of the copytup_*() functions.  But you can't easily just
> change those functions to stick the tuple someplace else, because they
> don't necessarily get the address of the space to be used from palloc
> directly.  In particular, copytup_heap() calls
> ExecCopySlotMinimalTuple(), and copytup_cluster() calls
> heap_copytuple().  heap_copytuple() is simple enough that you might be
> able to finagle a new API that would make it work, but I'm not sure
> what we could really do about ExecCopySlotMinimalTuple() except call
> it and then copy the result.  Perhaps that'd be OK for a first
> version.

Perhaps. If my experience with sort support for text is anything to go
on, the cost of copying over the tuples isn't really that high
relative to the cost of performing the sort, particularly when there
are many tuples to sort. OTOH, I'd be concerned about the cost of main
memory accesses for pass-by-reference types during parallel tuple
sorts in particular. The same concern applies to cases where the tuple
proper must be accessed, although presumably that occurs less
frequently.

The LLVM project's new libc++ library passes remark on a similar issue
in their rationale for yet another C++ standard library: "For example,
it is generally accepted that building std::string using the "short
string optimization" instead of using Copy On Write (COW) is a
superior approach for multicore machines...". It seems likely that
they're mostly referencing the apparent trend of memory bandwidth per
core stagnation [1], [2]. To get the real benefit of a parallel sort,
we must do anything we can to avoid having cores/workers remain idle
during the sort while waiting on memory access. I believe that that's
the bigger problem.

[1] http://www.admin-magazine.com/HPC/Articles/Finding-Memory-Bottlenecks-with-Stream
[2] https://software.intel.com/en-us/articles/detecting-memory-bandwidth-saturation-in-threaded-applications

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Pg_upgrade and toast tables bug discovered
Next
From: Tom Lane
Date:
Subject: Re: pg_dump reporing version of server & pg_dump as comments in the output