Thread: [Win32] Problem with rename()

[Win32] Problem with rename()

From
"Peter Brant"
Date:
Hi all,

In the last couple of days, we've been bitten (a couple of times, on
different servers) by an apparent glitch or bad interaction in the
Windows implementation of rename().

The relevant log message is:

[2006-04-17 16:49:22.583 ] 2252 LOG:  could not rename file
"pg_xlog/000000010000010A000000BD" to
"pg_xlog/000000010000010A000000D7", continuing to try

It apparently just keeps on looping indefinitely.  The "completed
rename" message from port/dirmod.c never shows up.
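
For readers decoding the two file names in that log line, here is a minimal
sketch. It assumes the 8.1-era naming scheme, in which a pg_xlog file name is
24 upper-case hex digits made of three 8-digit fields (timeline ID, log file
ID, segment number). The struct and function names are hypothetical, not
PostgreSQL's own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three fields of a WAL segment file name. */
typedef struct
{
    unsigned int tli;   /* timeline ID    (hex digits 1-8)   */
    unsigned int log;   /* log file ID    (hex digits 9-16)  */
    unsigned int seg;   /* segment number (hex digits 17-24) */
} WalSegName;

/*
 * Parse a bare file name such as "000000010000010A000000BD"
 * (no directory prefix).  Returns 0 on success, -1 if the name is
 * not exactly 24 upper-case hex digits.
 */
int
parse_wal_seg_name(const char *name, WalSegName *out)
{
    assert(name != NULL && out != NULL);

    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return -1;
    if (sscanf(name, "%8X%8X%8X", &out->tli, &out->log, &out->seg) != 3)
        return -1;
    return 0;
}
```

Applied to the log line above, both names parse to timeline 1 and log file
0x10A; the old name is segment 0xBD and the new name is segment 0xD7, so the
target name is 0xD7 - 0xBD = 26 segment numbers later.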

Shortly thereafter, Postgres becomes unresponsive.  Attempts to make a
new connection just block.  Autovacuums block.  A "pg_ctl ... stop -m
fast" doesn't work.  Only "pg_ctl ... stop -m immediate" does.

With the last occurrence, I saved off the output of "handle -a" and
"pslist -x" in case that's helpful.

Any thoughts on what might be going wrong?  If it happens again, what
other clues should I be looking for?

Pete

Re: [Win32] Problem with rename()

From
Bruce Momjian
Date:
Peter Brant wrote:
> Hi all,
>
> In the last couple of days, we've been bitten (a couple of times, on
> different servers) by an apparent glitch or bad interaction in the
> Windows implementation of rename().
>
> The relevant log message is:
>
> [2006-04-17 16:49:22.583 ] 2252 LOG:  could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
>
> It apparently just keeps on looping indefinitely.  The "completed
> rename" message from port/dirmod.c never shows up.
>
> Shortly thereafter, Postgres becomes unresponsive.  Attempts to make a
> new connection just block.  Autovacuums block.  A "pg_ctl ... stop -m
> fast" doesn't work.  Only "pg_ctl ... stop -m immediate" does.
>
> With the last occurrence, I saved off the output of "handle -a" and
> "pslist -x" in case that's helpful.
>
> Any thoughts on what might be going wrong?  If it happens again, what
> other clues should I be looking for?

Yes, the comment I added to dirmod.c gives a hint:

    /*
     * We need these loops because even though PostgreSQL uses flags that
     * allow rename while the file is open, other applications might have
     * these files open without those flags.
     */

so someone else has the file opened, but didn't use the required flags.
As to what could have it open, I don't know.

--
  Bruce Momjian   http://candle.pha.pa.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [Win32] Problem with rename()

From
"Qingqing Zhou"
Date:
""Peter Brant"" <Peter.Brant@wicourts.gov>
>
> In the last couple of days, we've been bitten (a couple of times, on
> different servers) by an apparent glitch or bad interaction in the
> Windows implementation of rename().
>
> The relevant log message is:
>
> [2006-04-17 16:49:22.583 ] 2252 LOG:  could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
>
> It apparently just keeps on looping indefinitely.  The "completed
> rename" message from port/dirmod.c never shows up.
>

Similar problems have been reported before -- which PG version are you
running, and do you have any anti-virus software installed?

Regards,
Qingqing

Re: [Win32] Problem with rename()

From
"Magnus Hagander"
Date:
> Hi all,
>
> In the last couple of days, we've been bitten (a couple of
> times, on different servers) by an apparent glitch or bad
> interaction in the Windows implementation of rename().
>
> The relevant log message is:
>
> [2006-04-17 16:49:22.583 ] 2252 LOG:  could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
>
> It apparently just keeps on looping indefinitely.  The
> "completed rename" message from port/dirmod.c never shows up.
>
> Shortly thereafter, Postgres becomes unresponsive.  Attempts
> to make a new connection just block.  Autovacuums block.  A
> "pg_ctl ... stop -m fast" doesn't work.  Only "pg_ctl ...
> stop -m immediate" does.
>
> With the last occurrence, I saved off the output of "handle
> -a" and "pslist -x" in case that's helpful.
>
> Any thoughts on what might be going wrong?  If it happens
> again, what other clues should I be looking for?

It would be interesting to see which processes have handle(s) open to
either of these two names. "handle -a" should give that, I assume?

//Magnus

Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
Unfortunately, it's not that simple.  It would be straightforward to
track down if it were.

In response to other questions:

It's Postgres 8.1.3 running on Windows 2003 Server.  No anti-virus
software is installed.  The servers are essentially bare except for the
OS and Postgres.

We have "handle -a" output from two occurrences (different servers):

For the first one:

LOG:  could not rename file "pg_xlog/000000010000010A000000BD" to
"pg_xlog/000000010000010A000000D7", continuing to try

Only one process (postgres.exe) is holding a handle to
pg_xlog/000000010000010A000000BD:

  F84: Event         \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(3306) BIND
  FF4: File          G:\pgsql\data\pg_xlog\000000010000010A000000BD

Nothing has the target file open.

The second is similar, except that two postgres.exe processes (and
nothing else) have the file open:

LOG:  could not rename file "pg_xlog/000000010000010A0000006E" to
"pg_xlog/000000010000010A00000087", continuing to try

#1:
  F84: Event         \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(2367) SELECT
  EFC: File          G:\pgsql\data\pg_xlog\000000010000010A0000006E

#2:
  F84: Event         \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(2420) SELECT
  FF4: File          G:\pgsql\data\pg_xlog\000000010000010A0000006E

Nothing has the target file open.

Pete

>>> Bruce Momjian <pgman@candle.pha.pa.us> 04/18/06 2:58 am >>>
Yes, the comment I added to dirmod.c gives a hint:

    /*
     * We need these loops because even though PostgreSQL uses flags that
     * allow rename while the file is open, other applications might have
     * these files open without those flags.
     */

so someone else has the file opened, but didn't use the required flags.

As to what could have it open, I don't know.

Re: [Win32] Problem with rename()

From
"Harald Armin Massa"
Date:
Peter,

> G:\pgsql\data\pg_xlog\000000010000010A000000BD

Probably a very stupid question: "G" - is that really a LOCAL drive at that
server, or rather some NAS or similar?

I had the same error in one logfile one time, but there were a large number
of possible culprits (virus scanner, login script changing permissions,
backups, access control software...) and we could not reproduce the error.

Harald
--
GHUM Harald Massa
persuadere et programmare

Re: [Win32] Problem with rename()

From
"Magnus Hagander"
Date:
Ok. So we're obviously blocking ourselves out.

Which process was the stalled one? Was it the same one that held the
file open, or a different one?


Looking at our code, we have the comment:
    /* These flags allow concurrent rename/unlink */
                    (FILE_SHARE_READ |
                     FILE_SHARE_WRITE | FILE_SHARE_DELETE),

But I'm not sure that those flags actually guarantee that. They do allow
concurrent unlink, but not necessarily rename. I read elsewhere that it
should work, but can't find backing docs on MSDN. Seems it works in most
cases, but perhaps there are some where it doesn't?
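
One way to probe that question on a given Windows version is a small
stand-alone test: open a file with all three share flags, then try to rename
it while the handle is still open. The sketch below is only that, a probe.
The function name rename_while_open is hypothetical; CreateFileA, MoveFileExA,
GetLastError and CloseHandle are the ordinary Win32 calls. The non-Windows
branch exists only so the file also builds elsewhere, and says nothing about
Windows behaviour.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

/*
 * Hold a read handle on "from", opened with every sharing flag, and try
 * to rename the file to "to" while that handle is still open.
 * Returns 0 if the rename succeeded, -1 otherwise.
 */
int
rename_while_open(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);
#ifdef _WIN32
    HANDLE h = CreateFileA(from, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (h == INVALID_HANDLE_VALUE)
        return -1;

    BOOL ok = MoveFileExA(from, to, MOVEFILE_REPLACE_EXISTING);

    /* Read the error code before any other API call can overwrite it. */
    if (!ok)
        fprintf(stderr, "MoveFileEx failed, GetLastError() = %lu\n",
                (unsigned long) GetLastError());
    CloseHandle(h);
    return ok ? 0 : -1;
#else
    int fd = open(from, O_RDONLY);

    if (fd < 0)
        return -1;

    int rc = rename(from, to);

    close(fd);
    return (rc == 0) ? 0 : -1;
#endif
}
```

Running it against a scratch file on the affected server, and noting the
GetLastError() value if it fails, would show whether the share flags are
enough for the "handle on the file-to-be-renamed" case.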


Is there any way we can force our own other backends to close a file?
That would be an easy fix - have the postmaster tell all other backends
to close all files and reopen...

/Magnus


Re: [Win32] Problem with rename()

From
Tom Lane
Date:
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
> LOG:  could not rename file "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
> ...
> Only one process (postgres.exe) is holding a handle to
> pg_xlog/000000010000010A000000BD:
> ...
> The second is similar, except that two postgres.exe processes (and
> nothing else) have the file open:

Hmm, could these be backends that have been sitting idle for some time?
I'd expect a backend to be holding open a handle for whichever WAL
segment it last wrote to.  If the backend sits idle for a couple of
checkpoints while others are advancing the end of WAL, then that segment
could become a target for renaming.

The only workable fix I can think of is to allow the checkpointer to
simply fail to rename this segment and go on about its business,
figuring that we'll be able to rename/delete the WAL segment in some
future checkpoint cycle.  Not sure how messy that would be to implement.
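
A minimal sketch of that "give up and try again at a later checkpoint" idea,
as opposed to looping forever. This is not the dirmod.c code: the names
rename_bounded, rename_fn and always_busy are hypothetical, errno == EACCES
stands in for whatever "file is in use" error the platform reports, and
nanosleep() assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Signature shared by rename() and the stand-in below. */
typedef int (*rename_fn) (const char *from, const char *to);

enum { RENAME_OK = 0, RENAME_DEFERRED = 1, RENAME_FAILED = -1 };

/*
 * Try the rename at most max_attempts times, pausing delay_ms between
 * tries.  A "busy" failure (EACCES) is retried; any other error is
 * reported at once.  If the file is still busy after the last try, the
 * caller is told to defer the work instead of being blocked forever.
 */
int
rename_bounded(rename_fn do_rename, const char *from, const char *to,
               int max_attempts, long delay_ms)
{
    struct timespec delay;
    int         attempt;

    assert(do_rename != NULL && max_attempts > 0 && delay_ms >= 0);
    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (do_rename(from, to) == 0)
            return RENAME_OK;
        if (errno != EACCES)
            return RENAME_FAILED;   /* not a sharing problem */
        if (attempt < max_attempts)
            nanosleep(&delay, NULL);
    }
    return RENAME_DEFERRED;         /* leave it for a later checkpoint */
}

/* Stand-in that behaves like a file some other process keeps open. */
int
always_busy(const char *from, const char *to)
{
    (void) from;
    (void) to;
    errno = EACCES;
    return -1;
}
```

With the standard rename() passed as do_rename, a caller would treat
RENAME_DEFERRED as "skip this segment for now" and move on, which is the
part that needs care in the real checkpoint code.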

            regards, tom lane

Re: [Win32] Problem with rename()

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
> Looking at our code, we have the comment:
>     /* These flags allow concurrent rename/unlink */
>                     (FILE_SHARE_READ |
>                      FILE_SHARE_WRITE | FILE_SHARE_DELETE),

> But I'm not sure that those flags actually guarantee that. They do allow
> concurrent unlink, but not necessarily rename. I read elsewhere that it
> should work, but can't find backing docs on MSDN. Seems it works in most
> cases, but perhaps there are some where it doesn't?

I think there are two different cases involved in rename:

1. Someone has a handle for the file-to-be-renamed;
2. Someone has a handle for the file that is to be deleted (ie currently
   has the name being renamed to).

If #2 doesn't work then we've got serious problems.  I think though that
#1 can only occur in the context of WAL segment recycling, so we can
probably work around it if that doesn't work.

            regards, tom lane

Re: [Win32] Problem with rename()

From
"Magnus Hagander"
Date:
> > Looking at our code, we have the comment:
> >     /* These flags allow concurrent rename/unlink */
> >                     (FILE_SHARE_READ |
> >                      FILE_SHARE_WRITE | FILE_SHARE_DELETE),
>
> > But I'm not sure that those flags actually guarantee that. They do allow
> > concurrent unlink, but not necessarily rename. I read elsewhere that it
> > should work, but can't find backing docs on MSDN. Seems it works in most
> > cases, but perhaps there are some where it doesn't?
>
> I think there are two different cases involved in rename:
>
> 1. Someone has a handle for the file-to-be-renamed;
> 2. Someone has a handle for the file that is to be deleted (ie currently
>    has the name being renamed to).
>
> If #2 doesn't work then we've got serious problems.  I think though that
> #1 can only occur in the context of WAL segment recycling, so we can
> probably work around it if that doesn't work.

The problem reported here was 1. Nobody had handles to the new filename.
I don't think I've seen any reports of issue 2, but most were never
researched to this depth (because most were just a case of
uninstalling-the-antivirus-to-make-it-work).

//Magnus

Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
They are local.

Pete

>>> "Harald Armin Massa" <haraldarminmassa@gmail.com> 04/18/06 4:35 pm >>>
"G" - is that really a LOCAL drive at that server, or rather some NAS
or similar?

Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
It's definitely possible.  Both failures occurred around the end of the
business day as update traffic would have been coasting to a stop.  The
middle tier never closes a connection unless it's forced to (e.g. as a
result of a query error, connection going away, etc.)

Pete

>>> Tom Lane <tgl@sss.pgh.pa.us> 04/18/06 4:50 pm >>>
Hmm, could these be backends that have been sitting idle for some
time?

Re: [Win32] Problem with rename()

From
Tom Lane
Date:
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
> [2006-04-17 16:49:22.583 ] 2252 LOG:  could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
> It apparently just keeps on looping indefinitely.  The "completed
> rename" message from port/dirmod.c never shows up.

> Shortly thereafter, Postgres becomes unresponsive.  Attempts to make a
> new connection just block.  Autovacuums block.  A "pg_ctl ... stop -m
> fast" doesn't work.  Only "pg_ctl ... stop -m immediate" does.

BTW, whatever we decide to do about the rename problem, I'd say that the
second point represents an independent bug.  The rename loop would hang
up the bgwriter, which would probably cause performance to tank, but the
rest of the system shouldn't become completely unresponsive because of
an incomplete checkpoint.  The checkpoint operation shouldn't be holding
any critical locks at this point.

Can you find out anything about what the other processes are blocking on?

            regards, tom lane

Re: [Win32] Problem with rename()

From
Tom Lane
Date:
I wrote:
> "Peter Brant" <Peter.Brant@wicourts.gov> writes:
>> Shortly thereafter, Postgres becomes unresponsive.  Attempts to make a
>> new connection just block.  Autovacuums block.  A "pg_ctl ... stop -m
>> fast" doesn't work.  Only "pg_ctl ... stop -m immediate" does.

> BTW, whatever we decide to do about the rename problem, I'd say that the
> second point represents an independent bug.  The rename loop would hang
> up the bgwriter, which would probably cause performance to tank, but the
> rest of the system shouldn't become completely unresponsive because of
> an incomplete checkpoint.  The checkpoint operation shouldn't be holding
> any critical locks at this point.

I looked into this and found out that in fact, InstallXLogFileSegment
holds the ControlFileLock while trying to rename the WAL segment file.
It does this specifically as an interlock against someone else trying
to create the same new WAL segment name.  So once the system runs out
of already-created WAL segments, XLogFileInit hangs up on the lock,
and then anything that wants to generate WAL entries is blocked.

It's possible that we could avoid using a lock here, but it would
require accepting some errors in creation/renaming of WAL segments as
being expected rather than fatal conditions.  That seems a bit risky to
me, particularly for the Windows port where I have zero confidence that
I understand what errors Windows might report :-(.  Maybe such a cure
is worse than the disease, since we intend to do something about fixing
the rename problem anyway.  Any comments?

            regards, tom lane

Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
Does that also explain why an attempt to make a new connection just
hangs?

One other thing regarding that is that the connection attempt seems to
kinda, sorta succeed.  It never makes it as far as a command prompt, but
on the "stop -m immediate", psql does print the "HINT:  In a moment you
should be able to reconnect to the database and repeat your command.",
etc. log messages.

Pete

>>> Tom Lane <tgl@sss.pgh.pa.us> 04/18/06 8:03 pm >>>
I looked into this and found out that in fact, InstallXLogFileSegment
holds the ControlFileLock while trying to rename the WAL segment file.
It does this specifically as an interlock against someone else trying
to create the same new WAL segment name.  So once the system runs out
of already-created WAL segments, XLogFileInit hangs up on the lock,
and then anything that wants to generate WAL entries is blocked.

Re: [Win32] Problem with rename()

From
Tom Lane
Date:
"Peter Brant" <Peter.Brant@wicourts.gov> writes:
> Does that also explain why an attempt to make a new connection just
> hangs?

Actually, I was just wondering about that --- seems like a bare
connection attempt should not generate any WAL entries.  Do you have any
nondefault actions in ~/.psqlrc or something like that?

            regards, tom lane

Re: [Win32] Problem with rename()

From
Tom Lane
Date:
I wrote:
> "Peter Brant" <Peter.Brant@wicourts.gov> writes:
>> Does that also explain why an attempt to make a new connection just
>> hangs?

> Actually, I was just wondering about that --- seems like a bare
> connection attempt should not generate any WAL entries.  Do you have any
> nondefault actions in ~/.psqlrc or something like that?

I just repeated the hangup scenario here, and confirmed that I can still
start and stop a plain-vanilla psql session (no ~/.psqlrc, no special
per-user or per-database settings) without it hanging.  I can also do
simple read-only SELECTs.  So I'm thinking your hang must involve some
additional non-read-only actions.

[ thinks for awhile longer ... ]  No, I take that back.  Once you'd
exhausted the current pg_clog page (32K transactions), even read-only
transactions would be blocked by the need to create a new pg_clog page
(which is a WAL-logged action).  A read-only transaction never actually
makes a WAL entry, but it does still consume an XID and hence a slot on
the current pg_clog page.  So I just hadn't tried enough transactions.
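
The "32K transactions" figure can be reproduced with a little arithmetic.
The sketch below assumes the default 8192-byte page size and two status bits
per transaction; the constant names follow clog.c as best I recall and should
be treated as illustrative, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ               8192   /* assumed default page size, in bytes */
#define CLOG_BITS_PER_XACT   2      /* commit status takes two bits        */
#define CLOG_XACTS_PER_BYTE  (8 / CLOG_BITS_PER_XACT)
#define CLOG_XACTS_PER_PAGE  (BLCKSZ * CLOG_XACTS_PER_BYTE)

/*
 * How many XIDs, starting with next_xid, can still be handed out before
 * one of them falls on the first slot of a pg_clog page that has not
 * been created yet (which is the WAL-logged step that blocks).
 */
uint32_t
xids_before_new_clog_page(uint32_t next_xid)
{
    uint32_t    slot = next_xid % CLOG_XACTS_PER_PAGE;

    /* 8192 bytes * 4 transactions per byte: the "32K" in the text. */
    assert(CLOG_XACTS_PER_PAGE == 32768);

    return (slot == 0) ? 0 : CLOG_XACTS_PER_PAGE - slot;
}
```

For example, if the next XID sits exactly halfway through a page, 16384 more
transactions (read-only ones included) can start before the hang becomes
visible; on average that is about how much slack is left when the rename
gets stuck.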

            regards, tom lane

Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
This is probably somewhat superfluous, but we had another one of these
incidents last night whose details confirm your explanation here.

[2006-04-21 00:22:19.500 ] 2452 LOG:  could not rename file
"pg_xlog/000000010000011A0000004C" to
"pg_xlog/000000010000011A00000071", continuing to try

the autovacuums (which wouldn't actually have been vacuuming anything
since update traffic would have stopped by then) continued until:

[2006-04-21 01:57:35.968 ] 4048 LOG:  autovacuum: processing database
"bigbird"

and the Web site first started noticing timeouts at 01:31:42,827.

Overnight traffic is so light that 70 minutes to work through 32K / 2
transactions is probably about right.

Pete

>>> Tom Lane <tgl@sss.pgh.pa.us> 04/18/06 9:01 pm >>>
[ thinks for awhile longer ... ]  No, I take that back.  Once you'd
exhausted the current pg_clog page (32K transactions), even read-only
transactions would be blocked by the need to create a new pg_clog page
(which is a WAL-logged action).  A read-only transaction never actually
makes a WAL entry, but it does still consume an XID and hence a slot on
the current pg_clog page.  So I just hadn't tried enough transactions.

Re: [Win32] Problem with rename()

From
Bruce Momjian
Date:
I am assuming this problem and the other rash of Win32 problems reported
in March are now all fixed in 8.1.4.  If not, please let me know.

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > "Peter Brant" <Peter.Brant@wicourts.gov> writes:
> >> Does that also explain why an attempt to make a new connection just
> >> hangs?
>
> > Actually, I was just wondering about that --- seems like a bare
> > connection attempt should not generate any WAL entries.  Do you have any
> > nondefault actions in ~/.psqlrc or something like that?
>
> I just repeated the hangup scenario here, and confirmed that I can still
> start and stop a plain-vanilla psql session (no ~/.psqlrc, no special
> per-user or per-database settings) without it hanging.  I can also do
> simple read-only SELECTs.  So I'm thinking your hang must involve some
> additional non-read-only actions.
>
> [ thinks for awhile longer ... ]  No, I take that back.  Once you'd
> exhausted the current pg_clog page (32K transactions), even read-only
> transactions would be blocked by the need to create a new pg_clog page
> (which is a WAL-logged action).  A read-only transaction never actually
> makes a WAL entry, but it does still consume an XID and hence a slot on
> the current pg_clog page.  So I just hadn't tried enough transactions.
>
>             regards, tom lane


Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
Really?  If there was a patch, I missed it.

My recollection is that there was general agreement about this
particular problem (see, for example,
http://archives.postgresql.org/pgsql-bugs/2006-04/msg00189.php ), but
things kind of trailed off after that without a resolution.

As far as the complete list of Win32 problems which affected us:
  - The stats collector crashing should indeed be fixed in 8.1.4
  - Missing stats caused by Windows PID recycling is fixed in 8.2
  - Various semaphore problems are probably all fixed with the new
Win32 semaphore implementation in 8.2
  - The stuck log rename problem mentioned above is still an issue
  - The "permission denied on fsync" (or something like that) problem
is still an issue.  Unfortunately, IIRC, we could never really nail down
the underlying problem.

None of these problems affect us any more: the production servers now
run Linux.  Great to have options! (and we were moving that direction
anyway)

Pete

>>> Bruce Momjian <pgman@candle.pha.pa.us> 16.06.2006 22:05 >>>

I am assuming this problem and the other rash of Win32 problems reported
in March are now all fixed in 8.1.4.  If not, please let me know.


Re: [Win32] Problem with rename()

From
Bruce Momjian
Date:
Peter Brant wrote:
> Really?  If there was a patch, I missed it.
>
> My recollection is that there was general agreement about this
> particular problem (see, for example,
> http://archives.postgresql.org/pgsql-bugs/2006-04/msg00189.php ), but
> things kind of trailed off after that without a resolution.

Yea.  Were you using WAL archiving?  We will have a fix in 8.1.5 to
prevent multiple archivers from starting.  Perhaps that was a cause.

> As far as the complete list of Win32 problems which affected us:
>   - The stats collector crashing should indeed be fixed in 8.1.4
>   - Missing stats caused by Windows PID recycling is fixed in 8.2
>   - Various semaphore problems are probably all fixed with the new
> Win32 semaphore implementation in 8.2
>   - The stuck log rename problem mentioned above is still an issue

Yep.  What has me baffled is why no one else is seeing the problem.
We had a rash of reports, and now all is quiet.

>   - The "permission denied on fsync" (or something like that) problem
> is still an issue.  Unfortunately, IIRC, we could never really nail down
> the underlying problem.

Yes, I just reread that thread.  I also am confused where to go from
here.

> None of these problems affect us any more: the production servers now
> run Linux.  Great to have options! (and we were moving that direction
> anyway)

Were you the only one using Win32 in heavy usage?  You were on Win2003.
Were there some bugs in the OS that got fixed later?

Yea, stumped.  Guess we will have to wait for more reports.  I don't
even see how to document this as a TODO.


Re: [Win32] Problem with rename()

From
"Peter Brant"
Date:
>>> On 16.06.2006 at 23:21:21, in message
<200606162121.k5GLLLw13054@candle.pha.pa.us>, Bruce Momjian
<pgman@candle.pha.pa.us> wrote:
> Yea.  Were you using WAL archiving?  We will have a fix in 8.1.5 to
> prevent multiple archivers from starting.  Perhaps that was a cause.
>
Not at the time, no.  The rename in question was just a regular WAL
segment rename.

> Yes, I just reread that thread.  I also am confused where to go from
> here.
>
Yeah, it's unfortunate that our best theory (a _commit on a deleted
file) just didn't seem to be supported by the evidence.  Although the
servers which see a heavy SELECT load are now Linux, we still have a
couple of Windows servers receiving the normal replication traffic.  We
still get regular fsync errors after the scheduled CLUSTERs so if you do
find a fix (or come up with a new theory), there's a test bed there (at
least for now).

> Were you the only one using Win32 in heavy usage?  You were on Win2003.
> Were there some bugs in the OS that got fixed later?
...
> Yep.  What has me baffled is why no one else is seeing the problem.
> We had a rash of reports, and now all is quiet.
>
We might be somewhat more susceptible than most too.  Due to the way
our middle tier parcels out queries, some connections might sit idle for
a long time.  Per Tom's explanation in the original thread, this is an
important factor.  Ultimately if a concurrent rename isn't possible in
Windows (and that looks likely), it's going to be a problem as things
stand now.

Pete