Re: win32 _dosmaperr() - Mailing list pgsql-hackers

From Qingqing Zhou
Subject Re: win32 _dosmaperr()
Date
Msg-id dcuj99$1hfr$1@news.hub.org
In response to Re: win32 _dosmaperr()  ("Magnus Hagander" <mha@sollentuna.net>)
Responses Re: win32 _dosmaperr()  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
"Magnus Hagander" <mha@sollentuna.net> writes:
>
> I suggest you try using Process Explorer from www.sysinternals.com to
> figure out who has the file open. Most of the time it should be able to
> tell you exactly who has locked the file - at least as long as it's done
> from userspace. I'm not 100% sure on how it deals with kernel level
> locks.
>

After running the PG win32 (8.0.1) server for a while with a mix of heavy
operations such as checkpoint and vacuum, I encountered another problem that
should be in the same category. PG reports:
   "could not unlink 0000xxxx, continuing to try"

at dirmod.c/pgunlink() and loops there forever. Using the PE tool you
mentioned, I found that only 3 processes hold a handle to the problematic xlog
segment, and all of them are postgres backends. Using the FileMon tool from the
same website, I found that bgwriter tried to OPEN the xlog segment with ALL
ACCESS but failed with result DELETE PEND.

That is to say, under some conditions, even if I opened the file with the
FILE_SHARE_DELETE flag, I may not be able to remove the file while it is open?
I did some tests, but every time I deleted/renamed an open file, it succeeded.
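
To make the question concrete, here is a minimal sketch of the delete-pending
behaviour as the Win32 DeleteFile documentation describes it: deleting a file
that still has open handles only marks it for deletion, the name goes away when
the last handle is closed, and new opens fail in the meantime. This is a toy
model in plain C, not real Win32 calls. ModelFile, model_open, model_delete and
model_close are hypothetical names, every handle is assumed to allow shared
delete, and the result of a second delete on a pending file is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one file's state. These are NOT Win32 calls. */
typedef struct {
    bool exists;          /* the name is still present in the directory */
    bool delete_pending;  /* delete was requested while handles were open */
    int  open_handles;    /* handles currently open (all allow shared delete) */
} ModelFile;

typedef enum { M_OK, M_DENIED, M_NOT_FOUND } ModelResult;

ModelFile model_create(void)
{
    ModelFile f = { true, false, 0 };
    return f;
}

/* Opening fails once a delete is pending (what FileMon shows as DELETE PEND). */
ModelResult model_open(ModelFile *f)
{
    if (!f->exists)
        return M_NOT_FOUND;
    if (f->delete_pending)
        return M_DENIED;
    f->open_handles++;
    return M_OK;
}

/* Deleting succeeds at once, but only marks the file if handles are open. */
ModelResult model_delete(ModelFile *f)
{
    if (!f->exists)
        return M_NOT_FOUND;
    if (f->delete_pending)
        return M_DENIED;        /* assumption: a repeated delete is refused */
    if (f->open_handles == 0)
        f->exists = false;      /* nobody has it open: gone immediately */
    else
        f->delete_pending = true;
    return M_OK;
}

/* Closing the last handle completes a pending delete. */
void model_close(ModelFile *f)
{
    assert(f->open_handles > 0);
    f->open_handles--;
    if (f->open_handles == 0 && f->delete_pending) {
        f->exists = false;
        f->delete_pending = false;
    }
}
```

In this model the delete call itself returns M_OK, which matches the tests
where deleting an open file appeared to work, while a later model_open returns
M_DENIED until every earlier handle is closed, which matches the failed OPEN
seen from bgwriter. It does not explain why the unlink itself keeps failing.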

Things could get worse, because the whole database cluster may stop working
while waiting for the buffer the bgwriter is working on, but bgwriter is itself
waiting (in the loop in pgunlink) for those postgres backends to move on (so
that they can close the problematic xlog segment), which is a deadlock.
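
The hang described above comes from retrying the unlink without any limit. As
an illustration only, here is a sketch of a retry loop with a cap, so that a
file that never becomes removable produces an error instead of an endless
wait. This is not the code in dirmod.c. unlink_bounded, unlink_fn, always_busy
and busy_calls are hypothetical names, the unlink operation is passed in as a
function pointer so the loop can be exercised without a real locked file, and
the pause between attempts is left out.

```c
#include <assert.h>
#include <errno.h>

/* Signature of the operation being retried (same shape as unlink()). */
typedef int (*unlink_fn)(const char *path);

/*
 * Hypothetical helper: try do_unlink() at most max_tries times.
 * Retries only while the failure is EACCES; any other error is returned
 * immediately. Returns 0 on success, -1 with errno set on failure.
 */
int unlink_bounded(const char *path, unlink_fn do_unlink, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (do_unlink(path) == 0)
            return 0;
        if (errno != EACCES)
            return -1;
        /* a real caller would pause here before the next attempt */
    }
    errno = EACCES;             /* gave up: the file stayed busy */
    return -1;
}

/* Stand-in for a file that is never released; counts the attempts made. */
int busy_calls = 0;

int always_busy(const char *path)
{
    (void) path;
    busy_calls++;
    errno = EACCES;
    return -1;
}
```

Calling unlink_bounded("0000xxxx", always_busy, 5) makes five attempts and then
returns -1 with errno set to EACCES, so the caller gets control back and can
decide what to do with the segment.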

Regards,
Qingqing







