Thread: Win32 shared memory speed
I've seen several comments about shared memory under Windows being "slow", but I haven't had much luck finding info in the archives. What are the details of this? How was it determined and is there a straightforward test/benchmark?
IIRC, there hasn't been any direct benchmark for it (I've wanted to do one but haven't had the time), but it's been the only real explanation put forward for the behaviour we've seen. And it does make sense given the thread-centric view of the Windows memory manager.

/Magnus

> ------- Original Message -------
> From: "Trevor Talbot" <quension@gmail.com>
> To: pgsql-hackers@postgresql.org
> Sent: 07-11-11, 00:31:59
> Subject: [HACKERS] Win32 shared memory speed
>
> I've seen several comments about shared memory under Windows being
> "slow", but I haven't had much luck finding info in the archives.
>
> What are the details of this? How was it determined and is there a
> straightforward test/benchmark?
Magnus Hagander wrote:
> IIRC, there hasn't been any direct benchmark for it (though I've wanted to do that but had no time), but it's been the only real explanation put forward for the behaviour we've seen. And it does make sense given the thread-centric view of the Windows mm.
>
> /Magnus

How is it supposed to be slow once it's mapped into your process? There's no OS interaction at all then.

If you are suggesting that the inter-process synchronization objects are slow, then that may be so: just use an interlocked increment and a spin lock in place of a mutex, and use an associated event to wake up waiters if necessary.

You don't have to use a named kernel mutex, though it may be handy while setting up the shared memory.

If you are repeatedly changing the mappings, well, that may be something that needs optimisation.

James
James Mansion wrote:
> Magnus Hagander wrote:
>> IIRC, there hasn't been any direct benchmark for it (though I've
>> wanted to do that but had no time), but it's been the only real
>> explanation put forward for the behaviour we've seen. And it does make
>> sense given the thread-centric view of the Windows mm.
>>
>> /Magnus
> How is it supposed to be slow, once it's mapped into your process?
> There's no OS interaction at all then.

Not entirely sure. I didn't think that theory up, I'm just echoing it. My guess is that it has something to do with the very expensive context switches between processes.

> If you are suggesting that the inter-process synch objects are slow,
> then that may be so: just use interlocked
> increment and a spin lock in place of a mutex and use an associated
> event to wake up if necessary.
>
> You don't have to use a named kernel mutex, though it may be handy while
> setting up the shared memory.

We already use the interlocked functions for our spinlocks with the MSVC build. With the GCC build, we use custom assembler.

> If you are repeatedly changing the mappings - well, that may be
> something that needs optimisation.

We're not. The postmaster creates the segment, and each backend attaches to it just once.

//Magnus
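
For readers following the thread, the "spin lock in place of a mutex, with an event as a fallback" idea can be sketched roughly as below. This is only an illustration in portable C11 atomics, not PostgreSQL's actual spinlock code. The type and function names (`shm_spinlock`, `shm_spin_try_lock`, and so on) are hypothetical, and the sketch assumes that `atomic_int` is lock-free and usable from several processes when placed in a shared mapping. On Win32, the atomic exchange would correspond to something like `InterlockedExchange`, and the "give up spinning" path is where a caller would wait on an event object instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type; it is assumed to live inside the shared segment. */
typedef struct {
    atomic_int locked;          /* 0 = free, 1 = held */
} shm_spinlock;

/* Put the lock into the "free" state. */
static void shm_spin_init(shm_spinlock *l)
{
    atomic_init(&l->locked, 0);
}

/* One atomic exchange, no kernel call; returns true if we took the lock. */
static bool shm_spin_try_lock(shm_spinlock *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

/*
 * Spin at most max_spins times. Returns false if the lock is still held,
 * so that the caller can fall back to a blocking wait (for example on an
 * event object) instead of burning CPU.
 */
static bool shm_spin_lock_bounded(shm_spinlock *l, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (shm_spin_try_lock(l))
            return true;
    }
    return false;
}

/* Release the lock; a real implementation would also wake any waiter. */
static void shm_spin_unlock(shm_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The uncontended path here is a single atomic instruction on memory that is already mapped, which is the point made above: once the segment is attached, taking the lock needs no OS interaction unless the caller has to block.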