Thread: Shared memory and memory context question
Dear all,

I am writing a C-language shared-object file which is dynamically linked with
postgres and uses the various SPI functions to execute queries from numerous
trigger functions. My question is this: what is the best method for a
dynamically linked object to share memory with the same object running in
other backends? Am I right in thinking that if I allocate memory in the
"upper executor context" with SPI_palloc(), it is not shared with the other
processes?

I have thought of a few ways of doing this (please forgive me if these appear
idiotic, as I am fairly new to postgres):

1. Change memory context to TopMemoryContext and palloc everything there.
   (However, I believe this still isn't shared between processes?)

2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
   chunk of shared memory and use that (although I would like to avoid
   writing my own memory manager to carve up the space).

3. Somehow create shared memory using the shmem functions, and set a memory
   context to live *inside* this shared memory, which my trigger functions
   can then switch to. Then use palloc() and pfree() without worrying.

Please let me know if this problem has been solved before; I have searched
through the mailing lists and through the source, but am not sure which is
the best way to resolve it. Thanks for your help.

Regards,
Richard
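For reference, option 1 boils down to something like the following sketch
(untested; save_rule_text is a made-up helper). Memory allocated this way
outlives the query, but as the replies below confirm, it is still private to
the one backend:

    #include "postgres.h"
    #include "utils/memutils.h"

    /* Copy a string into TopMemoryContext so it survives past the end
     * of the query -- but it remains visible only to this backend. */
    static char *
    save_rule_text(const char *text)
    {
        MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
        char *copy = pstrdup(text);

        MemoryContextSwitchTo(oldcxt);
        return copy;
    }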
On Sun, Feb 05, 2006 at 02:03:59PM +0000, richard@playford.net wrote:
> 1. Change memory context to TopMemoryContext and palloc everything there.
> (However, I believe this still isn't shared between processes?)

Not shared, correct.

> 2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
> chunk of shared memory and use that (although I would like to avoid
> writing my own memory manager to carve up the space).

This is the generally accepted method. Please remember that when sharing
structures you have to worry about concurrency, so you need locking.

> 3. Somehow create shared memory using the shmem functions, and set a memory
> context to live *inside* this shared memory, which my trigger functions
> can then switch to. Then use palloc() and pfree() without worrying.

Nope, palloc/pfree don't deal with concurrency.

> Please let me know if this problem has been solved before; I have searched
> through the mailing lists and through the source, but am not sure which is
> the best way to resolve it. Thanks for your help.

Most people allocate chunks of shared memory and don't use palloc/pfree.
What are you doing that requires such management? Most shared structures in
PostgreSQL are allocated once and never freed...

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
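The ShmemInitStruct() pattern Martijn describes looks roughly like the sketch
below (untested; the struct, the "my_ext shared state" key, and the
spinlock-protected counter are invented for illustration). Note that the
segment size is fixed at postmaster start, so an add-in has to arrange for
its space to be reserved then (in releases that support it, via
RequestAddinShmemSpace()):

    #include "postgres.h"
    #include "storage/shmem.h"
    #include "storage/spin.h"

    typedef struct MySharedState
    {
        slock_t mutex;          /* protects the fields below */
        int     counter;
    } MySharedState;

    static MySharedState *state = NULL;

    /* Create our chunk of shared memory, or attach to it if another
     * backend created it first. */
    static void
    my_ext_attach(void)
    {
        bool found;

        state = ShmemInitStruct("my_ext shared state",
                                sizeof(MySharedState), &found);
        if (!found)
        {
            SpinLockInit(&state->mutex);
            state->counter = 0;
        }
    }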
richard@playford.net writes:
> 1. Change memory context to TopMemoryContext and palloc everything there.
> (However, I believe this still isn't shared between processes?)

Nope.

> 2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
> chunk of shared memory and use that (although I would like to avoid
> writing my own memory manager to carve up the space).
>
> 3. Somehow create shared memory using the shmem functions, and set a memory
> context to live *inside* this shared memory, which my trigger functions
> can then switch to. Then use palloc() and pfree() without worrying.

You'd have to do one of the above, but #2 is probably out because all shared
memory is allocated to various purposes at startup and there is none free at
runtime (as I understand it).

For #3, how do you plan to have a memory context shared by multiple backends
with no synchronization? If two backends try to do allocation or deallocation
at the same time you will get corruption, as I don't think palloc() and
pfree() do any locking (they currently never allocate from shared memory).

You should probably think very carefully about whether you can get along
without using additional shared memory, because it's not that easy to do.

-Doug
On Sun February 5 2006 14:11, Martijn van Oosterhout wrote:
> This is the generally accepted method. Please remember that when
> sharing structures you have to worry about concurrency, so you need
> locking.

Of course - I have already implemented locking with semaphores (I may simply
use one big lock and carefully avoid reentry).

> Nope, palloc/pfree don't deal with concurrency.

Indeed, although if I lock the shared memory then I can palloc() and pfree()
without worrying. The problem I see is that new memory contexts have their
memory assigned to them when they are created; I can't tell them "go here!"

> Most people allocate chunks of shared memory and don't use
> palloc/pfree. What are you doing that requires such management? Most
> shared structures in PostgreSQL are allocated once and never freed...

I have a number of functions which modify tables based on complex rules
stored in script files. I wrote a parser for these files as a separate
program first, before incorporating it as a shared object; subsequently it
loads and executes rules from memory. As anything can be read from the files,
and rules can be unloaded later, I was hoping for flexibility in allocating
memory to store it all.

Another option is to load the files but store the rules within the database,
which should be possible, but appears to be a slightly messy way of doing it.
Then again, messing about with shared memory allocation may be messier.
Asking as a fairly inexperienced postgres person, what would you suggest?
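A process-shared "big lock" of the kind Richard mentions could be as simple
as this sketch (untested, using plain POSIX semaphores rather than postgres's
own semaphore layer; SharedRegion stands in for whatever actually lives in
the segment, which must itself already be shared, e.g. via shmget or mmap):

    #include <semaphore.h>

    typedef struct SharedRegion
    {
        sem_t biglock;          /* one lock guarding the whole region */
        char  data[1];          /* rest of the segment follows */
    } SharedRegion;

    /* Called once, by whichever process creates the segment. */
    static void
    region_init(SharedRegion *r)
    {
        sem_init(&r->biglock, 1, 1);    /* pshared = 1, initially free */
    }

    /* Run fn with the whole region locked against other processes. */
    static void
    with_region_locked(SharedRegion *r, void (*fn)(SharedRegion *))
    {
        sem_wait(&r->biglock);
        fn(r);
        sem_post(&r->biglock);
    }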
On Sun, Feb 05, 2006 at 02:31:23PM +0000, Richard Hills wrote:
> I have a number of functions which modify tables based on complex rules
> stored in script files. I wrote a parser for these files as a separate
> program first, before incorporating it as a shared object; subsequently it
> loads and executes rules from memory. As anything can be read from the
> files, and rules can be unloaded later, I was hoping for flexibility in
> allocating memory to store it all.

So what you load are the already processed rules? In that case you could
probably use the buffer management system. Ask it to load the blocks and
they'll be in the buffer cache. As long as you have the buffer pinned they'll
stay there. That's pretty much a read-only approach.

If you're talking about things that don't come from disk, well, hmm... If you
want you could use a file on disk as backing and mmap() it into each
process's address space...

> Another option is to load the files but store the rules within the
> database, which should be possible, but appears to be a slightly messy way
> of doing it. Then again, messing about with shared memory allocation may
> be messier. Asking as a fairly inexperienced postgres person, what would
> you suggest?

The real question is, does it need to be shared-writable. Shared-readonly is
much easier (i.e. one writer, multiple readers). Using a file as backing
store for mmap() may be the easiest....

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
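The file-backed mmap() idea, in its simplest form, is sketched below
(untested; the path /tmp/rules.map and the 1 MB size are placeholders, and
real code would want proper error reporting and a saner file location).
Every backend that maps the same file with MAP_SHARED sees the same bytes:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_BYTES (1024 * 1024)     /* fixed-size region */

    static void *
    attach_rule_map(void)
    {
        void *base;
        int   fd = open("/tmp/rules.map", O_RDWR | O_CREAT, 0600);

        if (fd < 0)
            return NULL;
        if (ftruncate(fd, MAP_BYTES) != 0)  /* ensure backing store */
        {
            close(fd);
            return NULL;
        }
        base = mmap(NULL, MAP_BYTES, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
        close(fd);                      /* mapping survives the close */
        return (base == MAP_FAILED) ? NULL : base;
    }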
On Sun February 5 2006 14:43, Martijn van Oosterhout wrote:
> So what you load are the already processed rules? In that case you
> could probably use the buffer management system. Ask it to load the
> blocks and they'll be in the buffer cache. As long as you have the
> buffer pinned they'll stay there. That's pretty much a read-only
> approach.
>
> If you're talking about things that don't come from disk, well, hmm...
> If you want you could use a file on disk as backing and mmap() it into
> each process's address space...
<...>
> The real question is, does it need to be shared-writable.
> Shared-readonly is much easier (i.e. one writer, multiple readers). Using
> a file as backing store for mmap() may be the easiest....

I load the rules from a script and parse them, storing them in a forest of
linked malloced structures. These structures are created by one writer but
then read by a number of readers, and later may be removed by the original
writer. So, as you can imagine, I could store the forest in the db, although
it might be a mess.

First I will look through the buffer management system and see if that will
do the job.

Thanks for your help,

Regards,
Richard
Martijn van Oosterhout <kleptog@svana.org> writes:
> So what you load are the already processed rules? In that case you
> could probably use the buffer management system. Ask it to load the
> blocks and they'll be in the buffer cache. As long as you have the
> buffer pinned they'll stay there.

... until you get to the end of the transaction, where the buffer manager
will barf because somebody forgot an unpin. Long-term buffer pins are really
not acceptable anyway --- you'd essentially be asserting that your little
facility is more important than any other use of shared buffers, and I'm
sorry but that ain't so.

AFAICT the data structures you are worried about don't have any readily
predictable size, which means there is no good way to keep them in shared
memory --- we can't dynamically resize shared memory. So I think storing the
rules in a table and loading into private memory at need is really the only
reasonable solution. Storing them in a table has a lot of other advantages
anyway, mainly that you can manipulate them from SQL.

You can find some prior discussion of similar issues in the archives; IIRC
the idea of a shared plan cache was being kicked around for awhile some
years back.

			regards, tom lane
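Loading a rule from a table into backend-private memory, as Tom suggests,
would look roughly like this (a sketch only: the rules table, its columns,
and load_rule are hypothetical, and production code would escape or
parameterize rule_name instead of pasting it into the query):

    #include "postgres.h"
    #include "executor/spi.h"

    /* Fetch one rule body on demand.  The copy is made with SPI_palloc
     * so it survives SPI_finish, but it lives in this backend's private
     * memory, not in shared memory. */
    static char *
    load_rule(const char *rule_name)
    {
        char  query[256];
        char *result = NULL;

        snprintf(query, sizeof(query),
                 "SELECT body FROM rules WHERE name = '%s'", rule_name);

        if (SPI_connect() != SPI_OK_CONNECT)
            elog(ERROR, "SPI_connect failed");

        if (SPI_execute(query, true, 1) == SPI_OK_SELECT &&
            SPI_processed > 0)
        {
            char *val = SPI_getvalue(SPI_tuptable->vals[0],
                                     SPI_tuptable->tupdesc, 1);

            if (val != NULL)
            {
                result = SPI_palloc(strlen(val) + 1);
                strcpy(result, val);
            }
        }

        SPI_finish();
        return result;
    }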
On Sun February 5 2006 16:16, Tom Lane wrote:
> AFAICT the data structures you are worried about don't have any readily
> predictable size, which means there is no good way to keep them in
> shared memory --- we can't dynamically resize shared memory. So I think
> storing the rules in a table and loading into private memory at need is
> really the only reasonable solution. Storing them in a table has a lot
> of other advantages anyway, mainly that you can manipulate them from
> SQL.

I have come to the conclusion that storing the rules and various other bits
in tables is the best solution, although this will require a much more
complex db structure than I had originally planned. Trying to allocate and
free memory in shared memory is fairly straightforward, but likely to become
incredibly messy.

Seeing as some of the rules already include load-value-from-db-on-demand, it
should be fairly straightforward to extend it to load-rule-from-db-on-demand.

Thanks for all your help,

Regards,
Richard
On Sun, 2006-02-05 at 14:03 +0000, richard@playford.net wrote:
> 3. Somehow create shared memory using the shmem functions, and set a memory
> context to live *inside* this shared memory, which my trigger functions
> can then switch to. Then use palloc() and pfree() without worrying.

This has been done before, by the TelegraphCQ folks: they implemented a
shared memory MemoryContext on top of OSSP MM[1]. The code is in the v0.2
TelegraphCQ tarball[2] -- see shmctx.c and shmset.c in
src/backend/utils/mmgr/. I'm not aware of an independent distribution, but
you could probably separate it out without too much pain.

(Of course, the comments elsewhere in the thread about using an alternative
are probably still true...)

-Neil

[1] http://www.ossp.org/pkg/lib/mm/
[2] http://telegraph.cs.berkeley.edu/downloads/TelegraphCQ-0.2.tar.gz
Hi!!

I was just browsing the messages and saw yours. I have actually written a
shared memory system for PostgreSQL. I've done some basic bench testing and
it seems to work, but I haven't given it the big QA push yet. My company,
Mohawk Software, is going to release a bunch of PostgreSQL extensions for
text search, shared memory, interfacing, etc.

Here's the source for the shared memory module. Mind you, it has not been
through rigorous QA yet!!! Also, this is the UNIX/Linux SHM version; the
Win32 version has not been written yet.

http://www.mohawksoft.org
> On Sun February 5 2006 16:16, Tom Lane wrote:
>> AFAICT the data structures you are worried about don't have any readily
>> predictable size, which means there is no good way to keep them in
>> shared memory --- we can't dynamically resize shared memory. So I think
>> storing the rules in a table and loading into private memory at need is
>> really the only reasonable solution. Storing them in a table has a lot
>> of other advantages anyway, mainly that you can manipulate them from
>> SQL.
>
> I have come to the conclusion that storing the rules and various other
> bits in tables is the best solution, although this will require a much
> more complex db structure than I had originally planned. Trying to
> allocate and free memory in shared memory is fairly straightforward, but
> likely to become incredibly messy.
>
> Seeing as some of the rules already include load-value-from-db-on-demand,
> it should be fairly straightforward to extend it to
> load-rule-from-db-on-demand.

I posted some source to a shared memory sort of thing to the group, as well
as to you, I believe.

For variables and values that change very infrequently, using the DB is the
right idea. PostgreSQL, as well as most databases, crumbles under a highly
changing database. By changing, I mean a lot of UPDATEs and DELETEs; inserts
are not so bad. PostgreSQL has a fairly poor (IMHO) UPDATE behaviour. Most
transaction-aware databases do, but PostgreSQL seems quite bad.

For an example, if you are doing a scoreboard sort of thing for a website,
updating a single variable in a table 20 times a second will quickly make
that simple and normally fast update/query take a very long time. You have
to run VACUUM a whole lot.

The next example is a session table for a website: you may have a few hundred
or a few thousand active session rows, but each row may get many updates, and
you may have tens of thousands of sessions which are inactive. Unless you
vacuum very frequently, you are doing a lot of disk I/O for every session,
because the query has to walk the table file to find a valid row. A database
is a BAD system to manage data like sessions in an active website. It is a
good tool for most all, but if you are implementing an eBay or Yahoo, you'll
swamp your DB quickly.

The issue with a shared memory system is that you don't get the data
security that you do with disk storage.
On Mon February 6 2006 05:17, Mark Woodward wrote:
> I posted some source to a shared memory sort of thing to the group, as
> well as to you, I believe.

Indeed, and it looks rather interesting. I'll have a look through it when I
have a chance...

So, after more discussion and experimentation, the possible methods in order
of +elegance/-difficulty/-complexity are:

=1. OSSP-supported shared mem, possibly with a pg memory context or Mark's
    shared memory manager.
=1. Separate application which the postgres backends talk to over tcp (which
    actually turns out to be quite a clean way of doing it).
3.  Storing rules in the db and reloading them each time (which turns out to
    be an utter bastard to do).
4.  Shared memory with my own memory manager.

I am *probably* going to go for the separate network application, as I
believe this is easy and relatively clean, as the required messages should
be fairly straightforward. Each postgres backend opens a connection to the
single separate "rules-server", which sends back a series of commands
(probably SQL), before the connection is closed again. A sketch of what the
backend side might look like follows below.

If this is Clearly Insane - please let me know!

Regards,
Richard
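The backend side of that rules-server conversation could be as small as the
sketch below (plain BSD sockets; the port number and the one-shot
request/reply protocol are invented for illustration, and error handling is
minimal):

    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Send one request to a rules-server on localhost and read back a
     * single reply, NUL-terminating it for the caller. */
    static ssize_t
    ask_rules_server(const char *request, char *reply, size_t replylen)
    {
        struct sockaddr_in addr;
        ssize_t n = -1;
        int     fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5555);            /* made-up port */
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
            write(fd, request, strlen(request)) >= 0)
            n = read(fd, reply, replylen - 1);

        if (n >= 0)
            reply[n] = '\0';
        close(fd);
        return n;
    }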
> On Mon February 6 2006 05:17, Mark Woodward wrote:
>> I posted some source to a shared memory sort of thing to the group, as
>> well as to you, I believe.
>
> Indeed, and it looks rather interesting. I'll have a look through it when
> I have a chance...
>
> So, after more discussion and experimentation, the possible methods in
> order of +elegance/-difficulty/-complexity are:
>
> =1. OSSP-supported shared mem, possibly with a pg memory context or Mark's
>     shared memory manager.
> =1. Separate application which the postgres backends talk to over tcp
>     (which actually turns out to be quite a clean way of doing it).

If you hop on over to http://www.mohawksoft.org, you'll see a server
application called "MCache." MCache is written to handle *exactly* the sort
of information you are looking to manage. Its primary duty is to manage
highly concurrent/active sessions for a large web cluster. I have also been
working on a PostgreSQL extension for it. It needs to be fleshed out and,
again, some heavy-duty QA, but it "works on my machine."

I alluded to releasing an extension module for PostgreSQL; I'm actually
working on a much larger set of projects intended to tightly integrate
PostgreSQL, web servers (PHP right now), and a set of service applications
including search and recommendations. In another thread I wanted to add an
extension, "xmldbx," to postgresql's contrib dir. Anyway, I digress. If
anyone is interested in lending a hand in QA, examples, and so on, I'd be
glad to take this off line.

> 3.  Storing rules in the db and reloading them each time (which turns out
>     to be an utter bastard to do).
> 4.  Shared memory with my own memory manager.

If you have the time and the inclination to do so, it is a fun sort of thing
to write.

> I am *probably* going to go for the separate network application, as I
> believe this is easy and relatively clean, as the required messages should
> be fairly straightforward. Each postgres backend opens a connection to the
> single separate "rules-server", which sends back a series of commands
> (probably SQL), before the connection is closed again.
>
> If this is Clearly Insane - please let me know!

It isn't a bad idea at all. For MCache, I leave the socket connection open
for the next use of the PostgreSQL session. Web environments usually keep a
cache of active database connections to save the overhead of connecting each
time. You just need to be careful when you clean up.