Re: Mac OS X: system shutdown prevents checkpoint - Mailing list pgsql-hackers
From | Christopher Kings-Lynne |
---|---|
Subject | Re: Mac OS X: system shutdown prevents checkpoint |
Date | |
Msg-id | GNELIHDDFBOCMGBFGEFOAEFICCAA.chriskl@familyhealth.com.au |
In response to | Mac OS X: system shutdown prevents checkpoint (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I showed this to my friend who's a FreeBSD committer (Adrian Chadd) and he's actually setting up a MacOS/X box at the moment and will look into it - assuming you don't discover the problem first...

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Tuesday, 30 April 2002 1:26 PM
> To: pgsql-hackers@postgresql.org
> Cc: Francois Suter
> Subject: [HACKERS] Mac OS X: system shutdown prevents checkpoint
>
>
> I've been looking into Francois Suter's recent reports of Postgres not
> shutting down cleanly on Mac OS X 10.1. I find that it's quite
> reproducible. If you tell the system to shut down in the normal
> fashion (eg, pick "Shut Down" from the Apple menu), the postmaster
> does not terminate, leading to WAL recovery upon restart --- or
> even worse, failure to restart if the postmaster PID recorded in the
> lockfile happens to get assigned to some other daemon.
>
> Observe the normal trace of postmaster shutdown (running with -d4,
> logging of timestamps and PIDs enabled):
>
>     2002-04-30 00:08:30 [315] DEBUG: pmdie 15
>     2002-04-30 00:08:30 [315] DEBUG: smart shutdown request
>     2002-04-30 00:08:30 [331] DEBUG: shutting down
>     2002-04-30 00:08:32 [331] DEBUG: database system is shut down
>     2002-04-30 00:08:32 [331] DEBUG: proc_exit(0)
>     2002-04-30 00:08:32 [331] DEBUG: shmem_exit(0)
>     2002-04-30 00:08:32 [331] DEBUG: exit(0)
>     2002-04-30 00:08:32 [315] DEBUG: reaping dead processes
>     2002-04-30 00:08:32 [315] DEBUG: proc_exit(0)
>     2002-04-30 00:08:32 [315] DEBUG: shmem_exit(0)
>     2002-04-30 00:08:32 [315] DEBUG: exit(0)
>
> The postmaster (here PID 315) forks a subprocess to flush shared buffers
> and checkpoint the WAL log. When the subprocess exits, the postmaster
> removes its lockfile and shuts down. The subprocess takes a minimum of
> 2 seconds because there's a sleep(2) in the checkpoint fsync code.
>
> Now here's what I see in the case of shutting down the OS X system:
>
>     2002-04-30 00:25:35 [376] DEBUG: pmdie 15
>     2002-04-30 00:25:35 [376] DEBUG: smart shutdown request
>
> ... and nothing more. Actual system shutdown (power down) occurred at
> approximately 00:26:06 by my watch, over thirty seconds later than the
> postmaster received SIGTERM. So there was plenty of time to do the
> checkpoint subprocess. (Indeed, I believe that thirty seconds is the
> grace period Darwin's init process allows SIGTERM'd processes before
> giving up and hard-killing them. So the system was actually sitting and
> waiting for the postmaster.)
>
> What we appear to have here is that the kernel is not allowing the
> postmaster to fork a checkpoint subprocess. But there's no indication
> that the postmaster got a fork() error return, either. Seems like it's
> just hung.
>
> Does this ring a bell with anyone? Is it an OSX bug, or a "feature";
> and if the latter, how can we work around it?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
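The normal shutdown sequence described in the quoted message (SIGTERM arrives, the postmaster forks a subprocess to checkpoint, reaps it, then removes its lockfile and exits) can be illustrated with a minimal POSIX C sketch. This is not the actual postmaster code: `smart_shutdown`, `run_checkpoint`, `install_sigterm_handler` and the lockfile argument are hypothetical names, and the checkpoint itself is stubbed out with the `sleep(2)` the message mentions. The sketch also shows why a hang inside `fork()` would leave the lockfile behind: `unlink()` only runs after the child has been reaped.

```c
/* Minimal sketch, NOT PostgreSQL source: a "smart shutdown" in which the
 * parent forks a checkpoint child, waits for it, then removes its lockfile.
 * All function names here are hypothetical. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t shutdown_requested = 0;

/* The handler only records the request (cf. "pmdie 15" in the log). */
static void handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

static int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Stand-in for flushing buffers and checkpointing the WAL; the sleep(2)
 * mirrors the minimum delay mentioned in the message. */
static void run_checkpoint(void)
{
    sleep(2);
}

/* Returns 0 if the checkpoint child exited cleanly and the lockfile was
 * removed, -1 otherwise. */
static int smart_shutdown(const char *lockfile)
{
    assert(lockfile != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        /* An explicit fork() failure would be reported here. */
        fprintf(stderr, "fork failed: %s\n", strerror(errno));
        return -1;
    }
    if (pid == 0) {
        /* Child: perform the checkpoint, then exit without flushing stdio. */
        run_checkpoint();
        _exit(0);
    }

    /* Parent: wait for ("reap") the checkpoint child, retrying on EINTR. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Only after a clean checkpoint is the lockfile removed. */
    if (unlink(lockfile) != 0)
        return -1;
    return 0;
}
```

In a real server the main loop would wait for `shutdown_requested` (for example with `pause()`) before calling `smart_shutdown()`. With this structure an explicit `fork()` failure produces a message on stderr, whereas a `fork()` that never returns produces no further output at all, which matches the truncated trace above.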