Thread: BUG #5628: 9.0beta4 failed automatic crash recovery
The following bug has been logged online:

Bug reference:      5628
Logged by:          Itagaki Takahiro
Email address:      itagaki.takahiro@gmail.com
PostgreSQL version: 9.0b4 (32bit)
Operating system:   Windows 7 (64bit)
Description:        9.0beta4 failed automatic crash recovery
Details:

9.0beta4 seems to fail automatic crash recovery after some backend processes crashed, although 8.2 recovers successfully. This is a rare error case, but some of the shared memory logic might have been broken between versions.

As a test, I crashed a backend manually with "pg_ctl kill":

  pg_ctl kill QUIT <backend-pid>

The 9.0 server then went down with the following log output:

----
WARNING: terminating connection because of crash of another server process
...
LOG: all server processes terminated; reinitializing
FATAL: pre-existing shared memory block is still in use
HINT: Check if there are any old server processes still running, and terminate them.
----

But 8.2 recovers as expected:

----
WARNING: terminating connection because of crash of another server process
...
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at <timestamp>
----
"Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
> 9.0beta4 seems to fail automatic crash recovery after
> some backend processes crashed,

Works for me, and always has worked for me (and I crash backend
processes regularly ;-)). Maybe something Windows-specific?

			regards, tom lane
On Tue, Aug 24, 2010 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
>> 9.0beta4 seems to fail automatic crash recovery after
>> some backend processes crashed,
>
> Works for me, and always has worked for me (and I crash backend
> processes regularly ;-)).

Me too!

> Maybe something Windows-specific?

Sure. I didn't see any problems on a Linux machine.
There might be issues with detaching/reattaching shared memory on Windows.

--
Itagaki Takahiro
On Tue, Aug 24, 2010 at 2:59 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> On Tue, Aug 24, 2010 at 9:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Itagaki Takahiro" <itagaki.takahiro@gmail.com> writes:
>>> 9.0beta4 seems to fail automatic crash recovery after
>>> some backend processes crashed,
>>
>> Works for me, and always has worked for me (and I crash backend
>> processes regularly ;-)).
>
> Me too!
>
>> Maybe something Windows-specific?
>
> Sure. I didn't see any problems on a Linux machine.
> There might be issues with detaching/reattaching shared memory on Windows.

We've seen this on and off before. Are you saying it's fully reproducible?

I don't recall if we made any specific changes around this for 9.0, did we?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
On Tue, Aug 24, 2010 at 5:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> There might be issues with detaching/reattaching shared memory on Windows.
> We've seen this on and off before. Are you saying it's fully reproducible?
>
> I don't recall if we made any specific changes around this for 9.0, did we?

Yes, it is reproducible. I also tested 8.3 and 8.4, and found that 8.3 and
newer versions fail to recover:

 * 8.2.17 => OK
 * 8.3.11, 8.4.4, 9.0b4 => FAILED

The same error messages were logged in all failed cases:

 FATAL: pre-existing shared memory block is still in use
 HINT: Check if there are any old server processes still running, and terminate them.

The change responsible might have been introduced between 8.2 and 8.3,
or in bug fixes applied only to 8.3 and newer versions.

--
Itagaki Takahiro
On Tue, Aug 24, 2010 at 11:01 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> On Tue, Aug 24, 2010 at 5:25 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> There might be issues with detaching/reattaching shared memory on Windows.
>> We've seen this on and off before. Are you saying it's fully reproducible?
>>
>> I don't recall if we made any specific changes around this for 9.0, did we?
>
> Yes, it is reproducible. I also tested 8.3 and 8.4, and found that 8.3 and
> newer versions fail to recover:
>  * 8.2.17 => OK
>  * 8.3.11, 8.4.4, 9.0b4 => FAILED
>
> The same error messages were logged in all failed cases:
>  FATAL: pre-existing shared memory block is still in use
>  HINT: Check if there are any old server processes still running,
> and terminate them.

Interesting. It certainly doesn't happen for everybody, or we would have
heard a lot more about it. We have seen a couple of reports of it, IIRC,
but nothing easily reproducible.

Could you try increasing either the Sleep() call or the loop counter in
PGSharedMemoryCreate (win32_shmem.c) to some very high value, and then,
while it's trying to restart, check:

1) Is there more than one postgres.exe running? (There should be only the
   postmaster.)
2) Using Process Explorer, does the postmaster have an open handle to the
   shared memory segment (and is thus basically conflicting with itself)?

> The change responsible might have been introduced between 8.2 and 8.3,
> or in bug fixes applied only to 8.3 and newer versions.

Yes, the shared memory code was basically rewritten for 8.3. We also have
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=0ad6b8dd7cee13cde693571f20b10b364e52dd23,
which has been backpatched to 8.2 but is not yet in any released version.
Could you try the current tip of the 8.2 branch?

--
 Magnus Hagander