Thread: BUG #5628: 9.0beta4 failed automatic crash recovery

BUG #5628: 9.0beta4 failed automatic crash recovery

From
"Itagaki Takahiro"
Date:
The following bug has been logged online:

Bug reference:      5628
Logged by:          Itagaki Takahiro
Email address:      itagaki.takahiro@gmail.com
PostgreSQL version: 9.0b4 (32bit)
Operating system:   Windows 7 (64bit)
Description:        9.0beta4 failed automatic crash recovery
Details:

9.0beta4 seems to fail automatic crash recovery after
some of backend processes crashed, though 8.2 recovers
successfully. This is a rare error case, but some shared
memory logic might have been broken between versions.

I crashed a backend as a test manually with "pg_ctl kill":
  pg_ctl kill QUIT <backend-pid>

The 9.0 server shut down with the following log output:
----
WARNING:  terminating connection because of crash of another server process
...
LOG:  all server processes terminated; reinitializing
FATAL:  pre-existing shared memory block is still in use
HINT:  Check if there are any old server processes still running, and
terminate them.
----

But 8.2 can recover as expected:
----
WARNING:  terminating connection because of crash of another server process
...
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at <timestamp>
----

Re: BUG #5628: 9.0beta4 failed automatic crash recovery

From
Tom Lane
Date:
"Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
> 9.0beta4 seems to fail automatic crash recovery after
> some of backend processes crashed,

Works for me, and always has worked for me (and I crash backend
processes regularly ;-)).  Maybe something Windows-specific?

            regards, tom lane

Re: BUG #5628: 9.0beta4 failed automatic crash recovery

From
Itagaki Takahiro
Date:
On Tue, Aug 24, 2010 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
>> 9.0beta4 seems to fail automatic crash recovery after
>> some of backend processes crashed,
>
> Works for me, and always has worked for me (and I crash backend
> processes regularly ;-)).

Me too!

> Maybe something Windows-specific?

Sure. I didn't see any problems on Linux machine.
There might be issues to detach/reattach shared memory on Windows.

--
Itagaki Takahiro

Re: BUG #5628: 9.0beta4 failed automatic crash recovery

From
Magnus Hagander
Date:
On Tue, Aug 24, 2010 at 2:59 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> On Tue, Aug 24, 2010 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
>>> 9.0beta4 seems to fail automatic crash recovery after
>>> some of backend processes crashed,
>>
>> Works for me, and always has worked for me (and I crash backend
>> processes regularly ;-)).
>
> Me too!
>
>> Maybe something Windows-specific?
>
> Sure. I didn't see any problems on Linux machine.
> There might be issues to detach/reattach shared memory on Windows.

We've seen this on and off before. Are you saying it's fully reproducible?

I don't recall if we did any specific changes around this for 9.0, did we?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: BUG #5628: 9.0beta4 failed automatic crash recovery

From
Itagaki Takahiro
Date:
On Tue, Aug 24, 2010 at 5:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> There might be issues to detach/reattach shared memory on Windows.
> We've seen this on and off before. Are you saying it's fully reproducible?
>
> I don't recall if we did any specific changes around this for 9.0, did we?

Yes, it is reproducible. I tested 8.3 and 8.4, and found 8.3 and newer
versions failed to recover.
  * 8.2.17 => OK
  * 8.3.11, 8.4.4, 9.0b4 => FAILED

Same error messages were logged on failed cases:
  FATAL:  pre-existing shared memory block is still in use
  HINT:  Check if there are any old server processes still running,
and terminate them.

Changes for the issue might be introduced between 8.2 and 8.3,
or in bugfixes only applied to 8.3 or newer versions.

--
Itagaki Takahiro

Re: BUG #5628: 9.0beta4 failed automatic crash recovery

From
Magnus Hagander
Date:
On Tue, Aug 24, 2010 at 11:01 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> On Tue, Aug 24, 2010 at 5:25 PM, Magnus Hagander <magnus@hagander.net>
> wrote:
>>> There might be issues to detach/reattach shared memory on Windows.
>> We've seen this on and off before. Are you saying it's fully reproducible?
>>
>> I don't recall if we did any specific changes around this for 9.0, did we?
>
> Yes, it is reproducible. I tested 8.3 and 8.4, and found 8.3 and newer
> versions failed to recover.
>  * 8.2.17 => OK
>  * 8.3.11, 8.4.4, 9.0b4 => FAILED
>
> Same error messages were logged on failed cases:
>  FATAL:  pre-existing shared memory block is still in use
>  HINT:  Check if there are any old server processes still running,
> and terminate them.

Interesting.  It certainly doesn't happen for everybody, or we
would've heard a lot more about this. We have seen a couple of reports
of it, IIRC, but nothing easily reproducible.

Could you try increasing either the Sleep() call or the loop counter
in PGSharedMemoryCreate (win32_shmem.c) to some very high value, and
then when it's trying to restart check:
1) Is there more than one postgres.exe running (there should be only
the postmaster)?
2) With Process Explorer, see if the postmaster has an open handle to the
shared memory segment (thus is basically conflicting with itself)
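
The experiment above amounts to widening a bounded retry loop: try to
create the segment, and if an old one with the same name still exists,
wait and try again a fixed number of times before giving up. The
following is a minimal Python sketch of that pattern only. It is not the
code in win32_shmem.c; try_create, segment_freed_after, and the defaults
of 10 attempts with a 1-second pause are assumptions for illustration.

```python
import time
from typing import Callable


def create_segment_with_retry(
    try_create: Callable[[], bool],
    attempts: int = 10,            # assumed default, stands in for the loop counter
    delay_seconds: float = 1.0,    # assumed default, stands in for the Sleep() call
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the 1-based attempt on which try_create() succeeded.

    try_create is a hypothetical stand-in for the platform call that
    creates the named segment. It returns False while an old segment
    with the same name still exists.
    """
    for attempt in range(1, attempts + 1):
        if try_create():
            return attempt
        if attempt < attempts:
            # Give the old segment's last holder time to go away.
            sleep(delay_seconds)
    raise RuntimeError("pre-existing shared memory block is still in use")


def segment_freed_after(n_failures: int) -> Callable[[], bool]:
    """Simulate an old segment that disappears after n_failures attempts."""
    remaining = [n_failures]

    def try_create() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return try_create
```

Raising attempts or delay_seconds to a very large value keeps the loop
waiting long enough to inspect the running processes and open handles by
hand, which is the point of the two checks listed above.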


> Changes for the issue might be introduced between 8.2 and 8.3,
> or in bugfixes only applied to 8.3 or newer versions.

Yes, the shared memory stuff was basically rewritten for 8.3.

We also have
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=0ad6b8dd7cee13cde693571f20b10b364e52dd23,
which has been backpatched to 8.2 but is not in any yet released
version. Could you try the current tip of the 8.2 branch?


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/