Thread: same-address mappings vs. relative pointers
During development of the dynamic shared memory facility, Noah and I spent a lot of time arguing about whether it was practical to ensure that a dynamic shared memory segment got mapped at the same address in every backend that used it. The argument went something like this:

Me: We'll never be able to make that work reliably.
Noah: But if we can't use pointers in the dynamic shared memory segment, our lives will suck.
Me: Well, we probably don't really NEED to use data structures that contain pointers all THAT much.
Noah: Are you nuts? Of course we need pointers. They're ubiquitous and essential.
Me: Meh.

I felt somewhat vindicated when I finished the dynamic shared memory message queuing patch (which I'm still hoping someone will review sometime soon...?) since that constitutes a useful chunk of functionality that doesn't care about pointers at all. And there are surely other examples that fall into the same category; for example, an lwlock doesn't contain any pointers today, so storing one in a dynamic shared memory segment at an address that might vary from one process to another ought to work OK, too. I think there was actually a patch a few years ago that made everything use LWLock * rather than LWLockId, which would allow considerably more flexibility in laying out lwlocks in memory - so you could, for example, try to put them in the same cache lines as the data they protect, or in different cache lines than other hot lwlocks - and would probably also be almost enough to allow placing them in a dynamic shared memory segment, which would be useful.

But I'm also learning painfully that this kind of thing only goes so far. For example, I spent some time looking at what it would take to provide a dynamic shared memory equivalent of palloc/pfree, a facility that I feel fairly sure would attract a few fans.
Well, for regular palloc, we store a pointer to the memory context before the beginning of the chunk, so that when you call pfree you can find the memory context and stick the chunk on one of its free lists. So there are two pointers there: the pointer to the context is a pointer, of course, but so is the free list. Heh, heh.

As I see it, if we want to have facilities like this, we'll have to either (1) make same-address mappings work for as many architectures as possible and not support these facilities on the remainder, or (2) use relative pointers instead of absolute pointers within dynamic shared memory segments, which means a loss of performance, notational clarity, and type-safety. We can also (3) adopt both approaches - some facilities can use relative pointers, which will be portable everywhere but annoying otherwise, and others can work only when same-address mappings are supported. Or we can (4) adopt neither approach, and confine ourselves to data structures that don't use pointers.

I still have mixed feelings about the idea of same-address mappings. On the one hand, on 64-bit architectures, there's a huge amount of address space available. Assuming the OS does something even vaguely sensible in terms of laying out the text, heap, stack, and shared library mappings, there ought to be many petabytes of address space that never gets touched, and it's hard to see why we couldn't find some place in there to stick our stuff. But that could require quite a bit of OS-specific knowledge about how memory gets laid out.

One idea that I think is originally Noah's, though I may be mutilating it, is to create a very large PROT_NONE mapping in the postmaster and then overwrite that mapping with the mapping for any dynamic shared memory segments we subsequently want to create. In that way, we essentially reserve the address space we want to use before the child is forked and things start to diverge (due to memory allocations, additional shared library loads, etc.).
But I bet that on at least some operating systems that will actually allocate memory, or at least count toward the system's notion of overcommit, and that will be a problem. So I don't have any really good idea for how to implement this cleanly.

Now, on the other hand, as far as dynamic shared memory allocation and freeing is concerned, there aren't really THAT many places where I need a pointer, so using Size or uint64 or something to store an offset instead is annoying, but I have an idea how to do this that only uses pointers in a couple of places, so I think it can be made to work. I am not sure how much complaint that will provoke, though. And even if I do it, the first poor slob that wants to store a linked list or so in dynamic shared memory is going to be unhappy if they can't get a same-address mapping. Maybe that's OK; using linked lists in shared memory might not be a great idea in the first place. I'm sure there will be more than one person who wants to do it, though.

Any thoughts on what the least painful compromise is here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Robert,

On 2013-12-04 23:32:27 -0500, Robert Haas wrote:
> But I'm also learning painfully that this kind of thing only goes so
> far.  For example, I spent some time looking at what it would take to
> provide a dynamic shared memory equivalent of palloc/pfree, a facility
> that I feel fairly sure would attract a few fans.

I have to admit, I don't fully see the point in a real palloc() equivalent for shared memory. If you're allocating shared memory in a granular fashion at somewhat high frequency, you're doing something *wrong*. And the result is not going to be scalable.

> Well, for regular
> palloc, we store a pointer to the memory context before the beginning
> of the chunk, so that when you call pfree you can find the memory
> context and stick the chunk on one of its free lists.  So there are
> two pointers there: the pointer to the context is a pointer, of
> course, but so is the free list.  Heh, heh.

This is one of those times where life really, really would be easier if we were using C++. Just have a small smart-pointer wrapper, and we wouldn't need to worry too much.

> I still have mixed feelings about the idea of same-address mappings.
> On the one hand, on 64-bit architectures, there's a huge amount of
> address space available.  Assuming the OS does something even vaguely
> sensible in terms of laying out the text, heap, stack, and shared
> library mappings, there ought to be many petabytes of address space
> that never gets touched, and it's hard to see why we couldn't find
> some place in there to stick our stuff.

It's actually quite a bit away from petabytes. Most x86-64 CPUs have 48 bits of virtual memory, with the OS eating up a fair bit of that (so it doesn't need to tear down TLBs in system calls). I think cross-platform you end up with something around 8TB, up to 64TB on other OSs. Now, there are some CPUs coming out with bigger virtual memory sizes, but it's going to be longer than I am willing to wait till we are there.
> Now, on the other hand, as far as dynamic shared memory allocation and
> freeing is concerned, there aren't really THAT many places where I
> need a pointer, so using Size or uint64 or something to store an
> offset instead is annoying, but I have an idea how to do this that
> only uses pointers in a couple of places, so I think it can be made to
> work.  I am not sure how much complaint that will provoke, though.
> And even if I do it, the first poor slob that wants to store a linked
> list or so in dynamic shared memory is going to be unhappy if they
> can't get a same-address mapping.  Maybe that's OK; using linked lists
> in shared memory might not be a great idea in the first place.  I'm
> sure there will be more than one person who wants to do it, though.

Just because people want it, doesn't make it worthwhile to provide it ;)

> Any thoughts on what the least painful compromise is here?

I think a reasonable route is having some kind of smart pointer on the C level that abstracts away the offset math and allows using pointers locally. Something like void *sptr_deref(sptr *); where the returned pointer can be used as long as it is purely in memory. And sptr_update(sptr *, void *); which allows an sptr to point elsewhere in the same segment. Plus lots of smarts.

I continue to believe that a) using pointers in dynamically allocated segments is going to end up with lots of pain, and b) the pain from not having real pointers is manageable.

Greetings,

Andres Freund

-- 
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 5, 2013 at 4:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi Robert,
>
> On 2013-12-04 23:32:27 -0500, Robert Haas wrote:
>> But I'm also learning painfully that this kind of thing only goes so
>> far.  For example, I spent some time looking at what it would take to
>> provide a dynamic shared memory equivalent of palloc/pfree, a facility
>> that I feel fairly sure would attract a few fans.
>
> I have to admit, I don't fully see the point in a real palloc()
> equivalent for shared memory. If you're allocating shared memory in a
> granular fashion at somewhat high frequency you're doing something
> *wrong*. And the result is not going to be scalable.

Why? Lots of people have written lots of programs that do just that. I agree it's overdone, but saying it should never happen seems like an over-rotation in the opposite direction.

In any case, the application that led me to this was actually parallel internal sort. Let's suppose we had an implementation of parallel internal sort. You'd load all the tuples you want to sort into dynamic shared memory (just as we now load them into palloc'd chunks), start a bunch of workers, and the original backend and the newly-started workers would cooperate to sort your data. Win.

It's fairly clear that, technically speaking, you do NOT need a real allocator for this. You can just shove the first tuple that you store into shared memory right up against the end of the region. Then you can place the next tuple immediately before it and the following one just before that, and so on. This is very simple to implement and also extremely memory-efficient, so superficially it's quite an appealing approach.

But now let's suppose the input data was estimated to fit in work_mem but in the end did not, and therefore we need to instead do an external sort. That means we're going to start evicting tuples from memory to make room for new tuples we read from the input stream.
Well, now our strategy of tight-packing everything does not look so good, because we have no way of tracking which tuples we no longer need and reusing that space for new tuples. If external sort is not something we know how to do in parallel, we can potentially copy all of the tuples out of the dynamic shared memory segment and back to backend-private memory in palloc'd chunks, and then deallocate the dynamic shared memory segment... but that's potentially expensive if work_mem is large, and it temporarily uses double what we're allowed by work_mem.

And suppose we want to do a parallel external sort with, say, the worker backend pushing tuples into a heap stored in the dynamic shared memory segment and other backends writing them out to tapes and removing them from the heap. You can't do that without some kind of space management, i.e. an allocator.

> This is one of those times where life really, really would be easier if
> we were using C++. Just have a small smart-pointer wrapper, and we
> wouldn't need to worry too much.

Yes, a template class would be perfect here.

>> I still have mixed feelings about the idea of same-address mappings.
>> On the one hand, on 64-bit architectures, there's a huge amount of
>> address space available.  Assuming the OS does something even vaguely
>> sensible in terms of laying out the text, heap, stack, and shared
>> library mappings, there ought to be many petabytes of address space
>> that never gets touched, and it's hard to see why we couldn't find
>> some place in there to stick our stuff.
>
> It's actually quite a bit away from petabytes. Most x86-64 CPUs have
> 48 bits of virtual memory, with the OS eating up a fair bit of that (so
> it doesn't need to tear down TLBs in system calls). I think
> cross-platform you end up with something around 8TB, up to 64TB on
> other OSs. Now, there are some CPUs coming out with bigger virtual
> memory sizes, but it's going to be longer than I am willing to wait
> till we are there.

Oh.
Well, that's a shame.

>> Any thoughts on what the least painful compromise is here?
>
> I think a reasonable route is having some kind of smart-pointer on C
> level that abstracts away the offset math and allows to use pointers
> locally. Something like void *sptr_deref(sptr *); where the returned
> pointer can be used as long as it is purely in memory. And
> sptr_update(sptr *, void *); which allows a sptr to point elsewhere in
> the same segment.
> + lots of smarts

Can you elaborate on this a little bit? I'm not sure I understand what you're getting at here.

> I continue to believe that a) using pointers in dynamically allocated
> segments is going to end up with lots of pain. b) the pain from not
> having real pointers is manageable.

Fair opinion, but I think we will certainly need to pass around memory offsets in some form for certain things. Even for purely internal parallel sort, the Tuplesort objects need to store the offset of the actual tuple within the segment. My first guess as to how to do that was to assume that sizeof(Size) == sizeof(void *) and just store offsets as a Size, but then I thought maybe it'd be better to do something like typedef Size relptr. And then I thought, boy, it sucks not to be able to declare what kind of a thing we're pointing *at* here, but apart from using C++ I see no solution to that problem. I guess we could do something like this:

#define relptr(type) Size

So the compiler wouldn't enforce anything, but at least notationally we'd know what sort of object we were supposedly referencing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 12/05/2013 06:32 AM, Robert Haas wrote:
> During development of the dynamic shared memory facility, Noah and I
> spent a lot of time arguing about whether it was practical to ensure
> that a dynamic shared memory segment got mapped at the same address in
> every backend that used it.

My vote goes for not trying to map at the same address. I don't see how you could do that reliably, and I don't see much need for it anyway.

That said, it naturally depends on what you're going to use the dynamic shared memory facility for. It's the same problem I have with reviewing the already-committed DSM patch and the message queue patch. The patches look fine as far as they go, but I have the nagging feeling that there are a bunch of big patches coming up later that use the facilities, and I can't tell if the facilities are over-engineered for what's actually needed, or not sufficient.

As a side-note, I've been thinking that we don't really need same-address mapping for shared_buffers either. Getting rid of it wouldn't buy us anything right now, but if we wanted e.g. to make shared_buffers changeable without a restart, that would be useful.

- Heikki
On 2013-12-05 15:57:22 +0200, Heikki Linnakangas wrote:
> As a side-note, I've been thinking that we don't really need same-address
> mapping for shared_buffers either. Getting rid of it wouldn't buy us
> anything right now, but if we wanted e.g. to make shared_buffers changeable
> without a restart, that would be useful.

I doubt it's that easy to get rid of atm (at least in !EXEC_BACKEND), but if we ever want to properly support ASLR in EXEC_BACKEND environments, we might need to go there. The hacks Windows does around it are already quite ugly.

Greetings,

Andres Freund

-- 
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-05 07:44:27 -0500, Robert Haas wrote:
> On Thu, Dec 5, 2013 at 4:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Hi Robert,
> >
> > On 2013-12-04 23:32:27 -0500, Robert Haas wrote:
> >> But I'm also learning painfully that this kind of thing only goes so
> >> far.  For example, I spent some time looking at what it would take to
> >> provide a dynamic shared memory equivalent of palloc/pfree, a facility
> >> that I feel fairly sure would attract a few fans.
> >
> > I have to admit, I don't fully see the point in a real palloc()
> > equivalent for shared memory. If you're allocating shared memory in a
> > granular fashion at somewhat high frequency you're doing something
> > *wrong*. And the result is not going to be scalable.

> Why? Lots of people have written lots of programs that do just that.

Well, but we're a database, not a generic programming library ;)

> I agree it's overdone, but saying it should never happen seems like an
> over-rotation in the opposite direction.

That might be true.

> But now let's suppose the input data was estimated to fit in work_mem
> but in the end did not, and therefore we need to instead do an
> external sort.  That means we're going to start evicting tuples from
> memory to make room for new tuples we read from the input stream.
> Well, now our strategy of tight-packing everything does not look so
> good, because we have no way of tracking which tuples we no longer
> need and reusing that space for new tuples.  If external sort is not
> something we know how to do in parallel, we can potentially copy all
> of the tuples out of the dynamic shared memory segment and back to
> backend-private memory in palloc'd chunks, and then deallocate the
> dynamic shared memory segment... but that's potentially expensive if
> work_mem is large, and it temporarily uses double what we're allowed
> by work_mem.

But what's your alternative if you have a shared_palloc() like thingy?
The overhead is going to be crazy if you allocate so granularly that you can easily grow, and if you allocate on a coarse level, this isn't going to buy you much.

Essentially, I am not arguing against a facility to dynamically allocate memory from shared memory, but I think the tradeoffs are different enough that palloc()/mcxt.c isn't necessarily the thing to model it after. But then, I don't really have an idea what it should look like, not having thought about it much.

> And suppose we want to do a parallel external sort with, say, the
> worker backend pushing tuples into a heap stored in the dynamic shared
> memory segment and other backends writing them out to tapes and
> removing them from the heap.  You can't do that without some kind of
> space management, i.e. an allocator.

Hm. I'd have thought one would implement that by pushing fixed-size amounts of tuples to individual workers, letting them sort each chunk in memory and then write the result out to disk.

> >> Any thoughts on what the least painful compromise is here?
> >
> > I think a reasonable route is having some kind of smart-pointer on C
> > level that abstracts away the offset math and allows to use pointers
> > locally. Something like void *sptr_deref(sptr *); where the returned
> > pointer can be used as long as it is purely in memory. And
> > sptr_update(sptr *, void *); which allows a sptr to point elsewhere in
> > the same segment.
> > + lots of smarts
>
> Can you elaborate on this a little bit?  I'm not sure I understand
> what you're getting at here.

So, my thought is that we really want something that acts like a pointer, just in shared memory, so we don't have to do too much offset math. And I think for performance and readability we want to use pointers when operating locally in a backend. So, I guess there are situations in which we essentially want pointers: a) pointing somewhere in the same segment, and b) pointing at an offset in *another* segment.
So, for a) what if we, instead of storing plain offsets, have something like:

typedef struct sptr
{
    Offset sptr_self;
    Offset sptr_offset;
} sptr;

and always store that in dynamically allocated shared memory instead of raw offsets. That allows us to define a function like:

void *
sptr_deref(sptr *sp)
{
    return ((char *) sp) - sp->sptr_self + sp->sptr_offset;
}

to get what we point to, without having to reference the segment base, at the price of having to store two offsets. To update the pointer we could have:

void
sptr_set(sptr *sp, void *p)
{
    char *segment_base;

    segment_base = ((char *) sp) - sp->sptr_self;

    /* make sure pointer doesn't point below the beginning of the segment */
    Assert(segment_base <= (char *) p);
    /* make sure pointer doesn't point beyond the segment */
    Assert((char *) p < segment_base + GetSegmentSize(segment_base));

    sp->sptr_offset = ((char *) p) - segment_base;
}

To initialize we'd have:

void
sptr_init(dsm_segment *segment, sptr *sp, void *p)
{
    sp->sptr_self = ((char *) sp) - (char *) segment->mapped_address;
    sptr_set(sp, p);
}

For b) we could have:

typedef struct sptr_far
{
    dsm_handle far_handle;
    sptr far_ptr;
} sptr_far;

and equivalent sptr_far_deref/set/init, although that would probably require a more efficient manner to resolve a dsm_handle to the dsm_segment.

Makes at least some sense?

> > I continue to believe that a) using pointers in dynamically allocated
> > segments is going to end up with lots of pain. b) the pain from not
> > having real pointers is manageable.
>
> Fair opinion, but I think we will certainly need to pass around memory
> offsets in some form for certain things.

Absolutely - I am not sure where the "but" is coming from ;)

> And then I thought, boy, it sucks
> not to be able to declare what kind of a thing we're pointing *at*
> here, but apart from using C++ I see no solution to that problem.
> I guess we could do something like this:
>
> #define relptr(type) Size
>
> So the compiler wouldn't enforce anything, but at least notationally
> we'd know what sort of object we were supposedly referencing.

There might be some ugly compiler-dependent magic we could do, depending on how we decide to declare offsets. Like (very, very roughly):

#define relptr(type, struct_name, varname) union struct_name##_##varname { \
    type relptr_type; \
    Offset relptr_off; \
}

And then, for accessing, have:

#define relptr_access(seg, off) \
    ((typeof(off.relptr_type) *) (((char *) seg->base_address) + off.relptr_off))

But boy, that's ugly.

Greetings,

Andres Freund

-- 
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-05 15:44:34 +0100, Andres Freund wrote:
> On 2013-12-05 07:44:27 -0500, Robert Haas wrote:
> > And then I thought, boy, it sucks
> > not to be able to declare what kind of a thing we're pointing *at*
> > here, but apart from using C++ I see no solution to that problem.  I
> > guess we could do something like this:
> >
> > #define relptr(type) Size
> >
> > So the compiler wouldn't enforce anything, but at least notationally
> > we'd know what sort of object we were supposedly referencing.
>
> There might be some ugly compiler dependent magic we could do. Depending
> on how we decide to declare offsets. Like (very, very roughly)
>
> #define relptr(type, struct_name, varname) union struct_name##_##varname { \
>     type relptr_type; \
>     Offset relptr_off; \
> }
>
> And then, for accessing have:
> #define relptr_access(seg, off) \
>     typeof(off.relptr_type)* (((char *)seg->base_address) + off.relptr_off)
>
> But boy, that's ugly.

On second thought - there's probably no reason to name the union, making it somewhat less ugly.

Greetings,

Andres Freund

-- 
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 5, 2013 at 8:57 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 12/05/2013 06:32 AM, Robert Haas wrote:
>> During development of the dynamic shared memory facility, Noah and I
>> spent a lot of time arguing about whether it was practical to ensure
>> that a dynamic shared memory segment got mapped at the same address in
>> every backend that used it.
>
> My vote goes for not trying to map at same address. I don't see how you
> could do that reliably, and I don't see much need for it anyway.
>
> That said, it naturally depends on what you're going to use the dynamic
> shared memory facility for. It's the same problem I have with reviewing the
> already-committed DSM patch and the message queue patch. The patches look
> fine as far as they go, but I have the nagging feeling that there are a
> bunch of big patches coming up later that use the facilities, and I can't
> tell if the facilities are over-engineered for what's actually needed, or
> not sufficient.

Sure, well, that's one of the challenges of doing any sort of large software development project. One rarely knows at the outset all of the problems that one will encounter before finishing said project. For small projects, you can usually predict these things pretty well, but as the scope of the project grows, it becomes more and more difficult to know whether you've made the right initial steps. That having been said, I'm pretty confident in the steps taken thus far - but if you're imagining that I have all the answers completely worked out and am choosing to reveal them only bit by bit, it's not like that. If you want to see the overall vision for this project, see https://wiki.postgresql.org/wiki/Parallel_Sort .

Here, I expect to use dynamic shared memory for three purposes. First, I expect to use a shared memory message queue to propagate any error or warning messages generated in the workers back to the user backend.
That's the point of introducing that infrastructure now, though I hope it will eventually also be suitable for streaming tuples between backends, so that you can run one part of the query tree in one backend and stream the output to a different backend that picks it up and processes it further. We could surely contrive a simpler solution just for error messages, but I viewed that as short-sighted. Second, I expect to store the SortTuple array, or some analogue of it, in dynamic shared memory. Third, I expect to store the tuples themselves in dynamic shared memory. Down the road, I imagine wanting to put hash tables in shared memory, so we can parallelize things like hash joins and hash aggregates.

> As a side-note, I've been thinking that we don't really need same-address
> mapping for shared_buffers either. Getting rid of it wouldn't buy us
> anything right now, but if we wanted e.g. to make shared_buffers changeable
> without a restart, that would be useful.

Very true. One major obstacle to that is that changing the size of shared_buffers also means resizing the LWLock array and the buffer descriptor array. If we got rid of the idea of having lwlocks in their own data structure and moved the buffer lwlocks into the buffer descriptors, that would get us down to two segments, but that still feels like one too many. There's also the problem of the buffer mapping hash table, which would need to grow somehow as well. Technically, the size of the fsync absorb queue also depends on shared_buffers, but we could decide not to care about that, I think.

The other problem here is that once you do implement all this, a reference to a buffer beyond what your backend has mapped will necessitate an unmap and remap of the shared-buffers segment. If the remap fails, and you hold any buffer pins, you will have to PANIC.
There could be some performance overhead from inserting bounds checks in all the right places too, although there might not be enough places to matter, since a too-high buffer number can only come from a limited number of places - either a lookup in the buffer mapping table, or a buffer allocation event.

I don't mention any of these things to discourage you from working on the problem, but rather because I've thought about it too - and the aforementioned problems are the things that have stumped me so far. If you've got ideas about how to solve them, or better yet want to implement something, great!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Dec 5, 2013 at 9:44 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Why? Lots of people have written lots of programs that do just that.
>
> Well, but we're a database, not a generic programming library ;)

I think we're arguably both.

> But what's your alternative if you have a shared_palloc() like thingy?
> The overhead is going to be crazy if you allocate so granular that you
> can easily grow, and if you allocate on a coarse level, this isn't going
> to buy you much?

I don't know if you've noticed, but the overhead of palloc is pretty much insane for large memory contexts right now. Consider a bunch of 100-byte tuples that you want to sort, on a 64-bit machine. IIUC, we're going to round the request up to 128 bytes and then add 16 bytes of overhead for 2 pointers before the beginning of the chunk. That's 44% overhead. For small memory contexts like the ones that store syscache entries it's really hard to do any better, but there are much better algorithms (which I am investigating) for cases where you expect to allocate a large amount of memory. 44% overhead on a 1kB context is no big deal; on 1GB it's a big deal.

But the question of how efficient palloc, or a hypothetical shared_palloc(), is, is really separate from whether or not we ought to have one. All allocators have some overhead, and people use them anyway because it drastically simplifies programming.

>> And suppose we want to do a parallel external sort with, say, the
>> worker backend pushing tuples into a heap stored in the dynamic shared
>> memory segment and other backends writing them out to tapes and
>> removing them from the heap.  You can't do that without some kind of
>> space management, i.e. an allocator.
>
> Hm. I'd have thought one would implement that by pushing fixed size
> amounts of tuples to individual workers, let them sort each chunk in
> memory and then write the result out to disk.
I haven't really investigated external sort algorithms at this point, so it is possible that you are right. However, that sounds like the equivalent of constructing runs by loading a batch of tuples, quicksorting them, writing them out, and then moving on to the next batch - a modification to tuplesort.c which I have tried and found to be significantly inferior to the present algorithm. As Tom has been at pains to point out, our current algorithm produces much longer runs, especially if the input data has some intrinsic ordering, which is not an uncommon situation. So while I don't know for sure, I think it's probably unwise to assume that the algorithm we pick won't require the ability to remove tuples from the dsm incrementally, and I don't want the choice of algorithm to be constrained by overly-rigid architectural decisions from the outset.

> So, for a) what if we, instead of storing plain offsets have something like:
> typedef struct sptr
> {
>     Offset sptr_self;
>     Offset sptr_offset;
> } sptr;

I like the name Offset, but I'm firmly convinced that a relative pointer needs to be the same size as a regular pointer. 8 bytes for a pointer is already painfully large in some contexts; 16 is just too much.

>> And then I thought, boy, it sucks
>> not to be able to declare what kind of a thing we're pointing *at*
>> here, but apart from using C++ I see no solution to that problem.  I
>> guess we could do something like this:
>>
>> #define relptr(type) Size
>>
>> So the compiler wouldn't enforce anything, but at least notationally
>> we'd know what sort of object we were supposedly referencing.
>
> There might be some ugly compiler dependent magic we could do. Depending
> on how we decide to declare offsets.
> Like (very, very roughly)
>
> #define relptr(type, struct_name, varname) union struct_name##_##varname { \
>     type relptr_type; \
>     Offset relptr_off; \
> }
>
> And then, for accessing have:
> #define relptr_access(seg, off) \
>     typeof(off.relptr_type)* (((char *)seg->base_address) + off.relptr_off)
>
> But boy, that's ugly.

It's clever, though, and probably worth further experimentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-12-05 10:17:18 -0500, Robert Haas wrote:
> On Thu, Dec 5, 2013 at 9:44 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Why? Lots of people have written lots of programs that do just that.
> >
> > Well, but we're a database, not a generic programming library ;)
>
> I think we're arguably both.

Fair enough.

> I don't know if you've noticed, but the overhead of palloc is pretty
> much insane for large memory contexts right now.

Yea, I have more than noticed. I actually still have a partial replacement of aset.c lying around. It didn't even have 16 bytes of overhead, although that required some ugly hacks to continue allowing several parallel context implementations.

Btw, it's even worse if you have lots of allocations above ALLOC_CHUNK_LIMIT: we walk a linear list on reset, and we reset damn frequently - partially because we think it's cheap.

> Consider a bunch of 100-byte tuples that you want to sort, on a 64-bit
> machine. IIUC, we're going to round the request up to 128 bytes and
> then add 16 bytes of overhead for 2 pointers before the beginning of
> the chunk.

I think we add the 16-byte overhead before rounding up, but I might misremember.

> That's 44% overhead. For small memory contexts like the ones that
> store syscache entries it's really hard to do any better, but there
> are much better algorithms (which I am investigating) for cases where
> you expect to allocate a large amount of memory.

What I was working on was a slab allocator where, in addition to default sizes like the current ones, you could create specific slabs for sizes you knew would be frequently used, which would then a) be faster to allocate from, b) have lower space overhead, and c) be guaranteed not to cause fragmentation. While I agree that the memory consumption is one concern, I am more worried about the time it eats up in many workloads.
> > So, for a) what if we, instead of storing plain offsets, have something like:
> >
> > typedef struct sptr
> > {
> >     Offset sptr_self;
> >     Offset sptr_offset;
> > } sptr;
>
> I like the name Offset, but I'm firmly convinced that a relative
> pointer needs to be the same size as a regular pointer. 8 bytes for a
> pointer is already painfully large in some contexts; 16 is just too
> much.

I don't find that all that convincing. In 90% of the cases only a few pointers will be needed; for those, something like the above will be helpful and allow bounds checking and all. For the other 10% of cases, we can still open-code stuff. Quite possibly we would want to avoid using 8 bytes there anyway.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Dec5, 2013, at 15:44 , Andres Freund <andres@2ndquadrant.com> wrote:
> There might be some ugly compiler-dependent magic we could do,
> depending on how we decide to declare offsets. Like (very, very roughly):
>
> #define relptr(type, struct_name, varname) union struct_name##_##varname { \
>     type relptr_type; \
>     Offset relptr_off; \
> }
>
> And then, for accessing, have:
>
> #define relptr_access(seg, off) \
>     typeof(off.relptr_type)* (((char *)seg->base_address) + off.relptr_off)
>
> But boy, that's ugly.

Well, ugliness we can live with, especially if it's less ugly than the alternatives. But I'm afraid it's also unportable - typeof() is a GCC extension, not part of ANSI C, no?

best regards,
Florian Pflug
On 2013-12-11 11:42:25 +0100, Florian Pflug wrote:
> On Dec5, 2013, at 15:44 , Andres Freund <andres@2ndquadrant.com> wrote:
> > There might be some ugly compiler-dependent magic we could do,
> > depending on how we decide to declare offsets. Like (very, very roughly):
> >
> > #define relptr(type, struct_name, varname) union struct_name##_##varname { \
> >     type relptr_type; \
> >     Offset relptr_off; \
> > }
> >
> > And then, for accessing, have:
> >
> > #define relptr_access(seg, off) \
> >     typeof(off.relptr_type)* (((char *)seg->base_address) + off.relptr_off)
> >
> > But boy, that's ugly.
>
> Well, ugliness we can live with, especially if it's less ugly than the
> alternatives. But I'm afraid it's also unportable - typeof() is a GCC
> extension, not part of ANSI C, no?

Yes (although there's C11 stuff to do equivalent things, afair) - I was thinking of doing the dark magic only for compilers we know support it, and falling back to returning a void* for the others. We'll probably miss a cast or two required on !gcc that way, but it's still likely to be less error-prone.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Dec11, 2013, at 11:47 , Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-12-11 11:42:25 +0100, Florian Pflug wrote:
>> Well, ugliness we can live with, especially if it's less ugly than the
>> alternatives. But I'm afraid it's also unportable - typeof() is a GCC
>> extension, not part of ANSI C, no?
>
> Yes (although there's C11 stuff to do equivalent things, afair) - I was
> thinking of doing the dark magic only for compilers we know support it,
> and falling back to returning a void* for the others. We'll probably
> miss a cast or two required on !gcc that way, but it's still likely to
> be less error-prone.

Would it? For this to catch type mismatches, you'd both need to develop on a typeof-supporting compiler *and* not cast the result of relptr_access(). But you can't really do that, because the code will then fail on compilers which don't support typeof()...

What we could do, I guess, is to pass the type to relptr_access() and to relptr(), and let the compiler verify that they are the same. Something like:

#define relptr(type) union { \
    type relptr_type; \
    Offset relptr_off; \
}

#define relptr_access(type, seg, rptr) \
    (type)( \
        (rptr.relptr_type - (type)0), \
        ((char*)seg->base_address) + rptr.relptr_off \
    )

And, yes, ouch ;-)

best regards,
Florian Pflug
On 2013-12-11 12:37:56 +0100, Florian Pflug wrote:
> On Dec11, 2013, at 11:47 , Andres Freund <andres@2ndquadrant.com> wrote:
> > Yes (although there's C11 stuff to do equivalent things, afair) - I was
> > thinking of doing the dark magic only for compilers we know support it,
> > and falling back to returning a void* for the others. We'll probably
> > miss a cast or two required on !gcc that way, but it's still likely to
> > be less error-prone.
>
> Would it? For this to catch type mismatches, you'd both need to develop
> on a typeof-supporting compiler *and* not cast the result of
> relptr_access(). But you can't really do that, because the code will
> then fail on compilers which don't support typeof()...

Yea, right.

> What we could do, I guess, is to pass the type to relptr_access() and
> to relptr(), and let the compiler verify that they are the same.

Tom and I actually added a macro that's helpful for that recently: AssertVariableIsOfType(). With that we should be able to get something reasonable, failing at compile time with a useful error message even ;)

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services