Re: Experimental dynamic memory allocation of postgresql shared memory - Mailing list pgsql-hackers
From | Aleksey Demakov |
---|---|
Subject | Re: Experimental dynamic memory allocation of postgresql shared memory |
Date | |
Msg-id | CAFCwUrCZfbSCxbFZv3GiNRPZTTtEmNqvd6H=hA6xkxKyCJY1hQ@mail.gmail.com |
In response to | Re: Experimental dynamic memory allocation of postgresql shared memory (Aleksey Demakov <ademakov@gmail.com>) |
List | pgsql-hackers |
Sorry for the unclear language. Late Friday evening in my place is to blame.

On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov <ademakov@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>>>> I expect that to be useful for parallel query and anything else where
>>>> processes need to share variable-size data. However, that's different
>>>> from this because ours can grow to arbitrary size and shrink again by
>>>> allocating and freeing with DSM segments. We also do everything with
>>>> relative pointers since DSM segments can be mapped at different
>>>> addresses in different processes, whereas this would only work with
>>>> memory carved out of the main shared memory segment (or some new DSM
>>>> facility that guaranteed identical placement in every address space).
>>>>
>>>
>>> I believe it would be perfectly okay to allocate a huge amount of
>>> address space with mmap on startup. If the pages are not touched,
>>> the OS VM subsystem will not commit them.
>>
>> In my opinion, that's not going to fly. If I thought otherwise, I
>> would not have developed the DSM facility in the first place.
>>
>> First, the behavior in this area is highly dependent on the choice of
>> operating system and configuration parameters. We've had plenty of
>> experience with requiring non-default configuration parameters to run
>> PostgreSQL, and it's all bad. I don't really want to have to tell
>> users that they must run with a particular value of
>> vm.overcommit_memory in order to run the server. Nor do I want to
>> tell users of other operating systems that their ability to run
>> PostgreSQL is dependent on the behavior their OS has in this area. I
>> had a MacBook Pro up until a year or two ago where a sufficiently
>> large shared memory request would cause a kernel panic.
>> That bug will probably be fixed at some point if it hasn't been
>> already, but probably by returning an error rather than making it
>> work.
>>
>> Second, there's no way to give memory back once you've touched it. If
>> you decide to do a hash join on a 250GB inner table using a shared
>> hash table, you're going to have 250GB in swap-backed pages floating
>> around when you're done. If the user has swap configured (and more
>> and more people don't), the operating system will eventually page
>> those out, but until that happens those pages are reducing the amount
>> of page cache that's available, and after it happens they're using up
>> swap. In either case, the space consumed is consumed to no purpose.
>> You don't care about that hash table any more once the query
>> completes; there's just no way to tell the operating system that. If
>> your workload follows an entirely predictable pattern and you always
>> have about the same amount of usage of this facility then you can just
>> reuse the same pages and everything is fine. But if your usage
>> fluctuates I believe it will be a big problem. With DSM, we can and
>> do explicitly free the memory back to the OS as soon as we don't need
>> it any more - and that's a big benefit.
>>
>
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth it.
>
> You are right that this might depend heavily on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you have touched it. It is possible in many cases with additional
> measures, that is, with additional control over the memory mapping.
> Surprisingly, in this case Windows has the most straightforward
> solution: VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT flags.
> On various Unix flavours it is possible to play with the mmap
> MAP_NORESERVE flag and the madvise syscall.
> Finally, it's possible to repeatedly mmap and munmap portions of a
> contiguous address space, providing a given addr argument to both of
> them. The last option is, of course, susceptible to hijacking of that
> portion of the address space by an inadvertent caller of mmap with a
> NULL addr argument. But probably this could be avoided by imposing a
> disciplined use of mmap in the PostgreSQL core and extensions.
>
> Thus providing a single contiguous shared address space is doable.
> The other question is how much it would buy. As for the development
> time of an allocator, it is a clear win. In terms of easily passing
> direct memory pointers between backends, this is a clear win again.
>
> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. You have a shared hash table. You
> cannot keep pointers there. You need to store offsets against the
> base address. Any reference would involve additional arithmetic.
> When these things add up, the net effect might become noticeable.
>
> Regards,
> Aleksey