Thread: Re: [PERFORM] Hanging queries on dual CPU windows
> > If so, we could perhaps recode that part using a Mutex instead of a
> > critical section - since it's not a performance critical path, the
> > difference shouldn't be large. If I code up a patch for that, can you
> > re-apply SP1 and test it? Or is this a production system you can't
> > really touch?
>
> I can do whatever the hell I want with it, so if you could
> cook up a patch that would be great.
>
> As a BTW: I reinstalled SP1 and turned stats collection off.
> That also seems to work, but is not really a solution since
> we want to use autovacuuming.

Ok, I've coded up a patch that changes the code to use a mutex instead.
Patch attached. You can get a precompiled postgres.exe at
http://www.hagander.net/download/postgres.exe_mutex.zip. You need to
copy this file to postmaster.exe as well - they are supposed to be
identical. It's based off a snapshot of 8.1-stable.

Looking at my system while testing this, it still looked like it was
hanging on that place in the code, even though I saw no problems. So I'm
not convinced we can actually trust the stack trace from the non-default
threads. So I don't think this patch will actually work :-( But it's
worth a try.

(Oh, and I moved the thread over to -hackers, seems more correct at this
time)

//Magnus
Attachment
On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> > > If so, we could perhaps recode that part using a Mutex instead of a
> > > critical section - since it's not a performance critical path, the
> > > difference shouldn't be large. If I code up a patch for that, can you
> > > re-apply SP1 and test it? Or is this a production system you can't
> > > really touch?
> >
> > I can do whatever the hell I want with it, so if you could
> > cook up a patch that would be great.
> >
> > As a BTW: I reinstalled SP1 and turned stats collection off.
> > That also seems to work, but is not really a solution since
> > we want to use autovacuuming.
>
> Ok, I've coded up a patch that changes the code to use a mutex instead.
> Patch attached. You can get a precompiled postgres.exe at
> http://www.hagander.net/download/postgres.exe_mutex.zip. You need to
> copy this file to postmaster.exe as well - they are supposed to be
> identical. It's based off a snapshot of 8.1-stable.
>
> Looking at my system while testing this, it still looked like it was
> hanging on that place in the code, even though I saw no problems. So I'm
> not convinced we can actually trust the stack trace from the non-default
> threads. So I don't think this patch will actually work :-( But it's
> worth a try.
>
> (Oh, and I moved the thread over to -hackers, seems more correct at this
> time)

Thanks Magnus, I'll try tomorrow. Will let you know ASAP (8:30 EST I guess :).

If this doesn't work, how do we progress?

> //Magnus

jan

--
--------------------------------------------------------------
Jan de Visser                     jdevisser@digitalfairway.com

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------
""Magnus Hagander"" <mha@sollentuna.net> wrote > Ok, I've coded up a patch that changes the code to use a mutex instead. Are we asserting the problem is caused by the spinlock random wake-up order? I am not sure why this would fix the problem. If my memory serves, a critical section might be a problem if one process aborts unexpected while it is inside. Other waiting processes can never have a chance to enter it (also have no chance to handle SIGQUIT) -- so this patch may solve this. There is another suspect in http://www.devisser-siderius.com/stack1.jpg, i.e., process 3 does shmctl. I once filed a server core dump bug in win32 of reporting WSAEWOULDBLOCK. (http://archives.postgresql.org/pgsql-bugs/2006-02/msg00185.php). AFAICS, it is actually an mistranslated EINTR. There seems some relation between these issues, but I didn't come up with a complete theory of it. Regards, Qingqing
On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> Looking at my system while testing this, it still looked like it was
> hanging on that place in the code, even though I saw no problems. So I'm
> not convinced we can actually trust the stack trace from the non-default
> threads. So I don't think this patch will actually work :-( But it's
> worth a try.

I'm afraid you're right. Hangs again :(

jan
On Monday 13 March 2006 09:26, Jan de Visser wrote:
> On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> > Looking at my system while testing this, it still looked like it was
> > hanging on that place in the code, even though I saw no problems. So I'm
> > not convinced we can actually trust the stack trace from the non-default
> > threads. So I don't think this patch will actually work :-( But it's
> > worth a try.
>
> I'm afraid you're right. Hangs again :(

I now have the toolchain set up, so if you want me to try stuff, please let
me know. Resolving this is important to us.

On a whim, I replaced InitializeCriticalSection with
InitializeCriticalSectionAndSpinCount, since MSDN told me that would be
better for SMP. No joy.

jan