Re: same-address mappings vs. relative pointers - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: same-address mappings vs. relative pointers
Msg-id: CA+TgmoYQnaBi=3gzvHVgV6ZAdqvA4m=6XXKQsn7g_RmB2gz-0Q@mail.gmail.com
In response to: Re: same-address mappings vs. relative pointers (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: same-address mappings vs. relative pointers
List: pgsql-hackers
On Thu, Dec 5, 2013 at 9:44 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Why?  Lots of people have written lots of programs that do just that.
>
> Well, but we're a database, not a generic programming library ;)

I think we're arguably both.

> But what's your alternative if you have a shared_palloc() like thingy?
> The overhead is going to be crazy if you allocate so granular that you
> can easily grow, and if you allocate on a coarse level, this isn't going
> to buy you much?

I don't know if you've noticed, but the overhead of palloc is pretty
much insane for large memory contexts right now.  Consider a bunch of
100-byte tuples that you want to sort, on a 64-bit machine.  IIUC,
we're going to round the request up to 128 bytes and then add 16 bytes
of overhead for two pointers before the beginning of the chunk, so each
100-byte tuple actually occupies 144 bytes.  That's 44% overhead.  For
small memory contexts like the ones that store
syscache entries it's really hard to do any better, but there are much
better algorithms (which I am investigating) for cases where you
expect to allocate a large amount of memory.  44% overhead on a 1kB
allocation is no big deal; on 1GB it's a big deal.
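
Just to spell out the arithmetic, here's a toy calculation under the
assumptions above (power-of-2 rounding plus a 16-byte chunk header).  It's
purely illustrative, not the actual aset.c logic:

#include <stdio.h>

int
main(void)
{
    size_t request = 100;           /* one 100-byte tuple */
    size_t rounded = 1;

    while (rounded < request)
        rounded <<= 1;              /* round up to the next power of 2: 128 */

    size_t total = rounded + 16;    /* plus the per-chunk header: 144 */

    printf("%zu-byte request occupies %zu bytes (%.0f%% overhead)\n",
           request, total, 100.0 * (total - request) / request);
    return 0;
}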

But the question of how efficient palloc, or a hypothetical
shared_palloc(), might be is really separate from whether or not we
ought to have one.  All allocators have some overhead, and people use
them anyway because doing so drastically simplifies programming.

>> And suppose we want to do a parallel external sort with, say, the
>> worker backend pushing tuples into a heap stored in the dynamic shared
>> memory segment and other backends writing them out to tapes and
>> removing them from the heap.  You can't do that without some kind of
>> space management, i.e. an allocator.
>
> Hm. I'd have thought one would implement that by pushing fixed size
> amounts of tuples to individual workers, let them sort each chunk in
> memory and then write the result out to disk.

I haven't really investigated external sort algorithms at this point,
so it is possible that you are right.  However, that sounds like the
equivalent of constructing runs by loading a batch of tuples,
quicksorting them, writing them out, and then moving on to the next
batch, a modification to tuplesort.c which I have tried and found to
be significantly inferior to the present algorithm.  As Tom has been
at pains to point out, our current algorithm produces much longer
runs, especially if the input data has some intrinsic ordering, which
is not an uncommon situation.  So while I don't know for sure, I think
it's probably unwise to assume that the algorithm we pick won't
require the ability to remove tuples from the dsm incrementally, and I
don't want the choice of algorithm to be constrained by overly-rigid
architectural decisions from the outset.

> So, for a) what if we, instead of storing plain offsets have something like:
> typedef struct sptr
> {
>     Offset sptr_self;
>     Offset sptr_offset;
> } sptr;

I like the name Offset, but I'm firmly convinced that a relative
pointer needs to be the same size as a regular pointer.  8 bytes for a
pointer is already painfully large in some contexts; 16 is just too
much.
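
To make the contrast concrete, the bare-offset representation I'm thinking
of stores nothing but the distance from the segment's base address, so it
stays pointer-sized.  A minimal sketch, with made-up names, assuming the
base address is passed in wherever the pointer gets dereferenced:

/*
 * Size is our existing size_t typedef, so RelPtr is 8 bytes on a 64-bit
 * machine, the same as a native pointer.
 */
typedef Size RelPtr;

static inline RelPtr
relptr_make(void *base, void *ptr)
{
    return (char *) ptr - (char *) base;
}

static inline void *
relptr_resolve(void *base, RelPtr rp)
{
    return (char *) base + rp;
}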

>> And then I thought, boy, it sucks
>> not to be able to declare what kind of a thing we're pointing *at*
>> here, but apart from using C++ I see no solution to that problem.  I
>> guess we could do something like this:
>>
>> #define relptr(type)   Size
>>
>> So the compiler wouldn't enforce anything, but at least notationally
>> we'd know what sort of object we were supposedly referencing.
>
> There might be some ugly compiler dependent magic we could do. Depending
> on how we decide to declare offsets. Like (very, very roughly)
>
> #define relptr(type, struct_name, varname) union struct_name##_##varname{ \
>     type relptr_type; \
>     Offset relptr_off;
> }
>
> And then, for accessing have:
> #define relptr_access(seg, off) \
>   typeof(off.relptr_type)* (((char *)seg->base_address) + off.relptr_off)
>
> But boy, that's ugly.

It's clever, though, and probably worth further experimentation.
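
Something along these lines might be a workable version of that notation,
leaning on GCC-style __typeof__ in the accessor.  All of the names below
are hypothetical, and this is only a sketch of the idea, not a worked-out
proposal:

/*
 * The relptr_type member is never stored or read; it exists only so the
 * accessor macro can recover the pointed-to type.  The union is still
 * just the size of its Size member, i.e. pointer-sized.
 */
#define relptr(type) \
    union { type *relptr_type; Size relptr_off; }

#define relptr_access(base, rp) \
    ((__typeof__((rp).relptr_type)) ((char *) (base) + (rp).relptr_off))

#define relptr_store(base, rp, val) \
    ((rp).relptr_off = (char *) (val) - (char *) (base))

/* Usage might look something like this: */
typedef struct SortChunk
{
    relptr(struct SortChunk) next;      /* still pointer-sized */
    Size        ntuples;
} SortChunk;

/* SortChunk *c = relptr_access(seg_base, chunk->next); */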

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


