Re: Mac OS X: system shutdown prevents checkpoint - Mailing list pgsql-hackers

From Christopher Kings-Lynne
Subject Re: Mac OS X: system shutdown prevents checkpoint
Msg-id GNELIHDDFBOCMGBFGEFOAEFICCAA.chriskl@familyhealth.com.au
In response to Mac OS X: system shutdown prevents checkpoint  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I showed this to a friend of mine, Adrian Chadd, who is a FreeBSD committer.
He happens to be setting up a Mac OS X box at the moment and will look into
it, assuming you don't discover the problem first...

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Tuesday, 30 April 2002 1:26 PM
> To: pgsql-hackers@postgresql.org
> Cc: Francois Suter
> Subject: [HACKERS] Mac OS X: system shutdown prevents checkpoint
>
>
> I've been looking into Francois Suter's recent reports of Postgres not
> shutting down cleanly on Mac OS X 10.1.  I find that it's quite
> reproducible.  If you tell the system to shut down in the normal
> fashion (e.g., pick "Shut Down" from the Apple menu), the postmaster
> does not terminate, leading to WAL recovery upon restart --- or
> even worse, failure to restart if the postmaster PID recorded in the
> lockfile happens to get assigned to some other daemon.
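
To illustrate the restart failure described above, here is a minimal Python
sketch (not PostgreSQL's actual code) of a startup check that treats a
lockfile as live whenever any process holds the PID recorded in it. The
function names and the one-PID-on-the-first-line file layout are assumptions
made for illustration only.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if some process currently holds this PID."""
    if pid <= 0:
        # kill() gives pid <= 0 a special meaning (process groups), so refuse it.
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


def lockfile_blocks_startup(lockfile_path: str) -> bool:
    """Hypothetical startup check: refuse to start if the recorded PID is in use."""
    try:
        with open(lockfile_path) as f:
            recorded_pid = int(f.readline().strip())
    except (FileNotFoundError, ValueError):
        return False     # no lockfile, or nothing usable in it
    # The check cannot tell *which* program owns the PID, only that one does.
    return pid_is_alive(recorded_pid)
```

A check of this kind cannot distinguish the old postmaster from an unrelated
daemon that was handed the same PID after reboot, which is why a lockfile
left behind by an unclean shutdown can block startup.
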
>
> Observe the normal trace of postmaster shutdown (running with -d4,
> logging of timestamps and PIDs enabled):
>
> 2002-04-30 00:08:30 [315]    DEBUG:  pmdie 15
> 2002-04-30 00:08:30 [315]    DEBUG:  smart shutdown request
> 2002-04-30 00:08:30 [331]    DEBUG:  shutting down
> 2002-04-30 00:08:32 [331]    DEBUG:  database system is shut down
> 2002-04-30 00:08:32 [331]    DEBUG:  proc_exit(0)
> 2002-04-30 00:08:32 [331]    DEBUG:  shmem_exit(0)
> 2002-04-30 00:08:32 [331]    DEBUG:  exit(0)
> 2002-04-30 00:08:32 [315]    DEBUG:  reaping dead processes
> 2002-04-30 00:08:32 [315]    DEBUG:  proc_exit(0)
> 2002-04-30 00:08:32 [315]    DEBUG:  shmem_exit(0)
> 2002-04-30 00:08:32 [315]    DEBUG:  exit(0)
>
> The postmaster (here PID 315) forks a subprocess to flush shared buffers
> and checkpoint the WAL.  When the subprocess exits, the postmaster
> removes its lockfile and shuts down.  The subprocess takes a minimum of
> 2 seconds because there's a sleep(2) in the checkpoint fsync code.
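
The sequence just described (fork a subprocess, wait for it to exit, then
remove the lockfile) can be sketched roughly as follows. This is a Python
illustration under stated assumptions, not the postmaster's real logic:
shutdown_with_checkpoint and the checkpoint callable are hypothetical names,
and the real code is C inside the postmaster.

```python
import os
import sys


def shutdown_with_checkpoint(lockfile_path, checkpoint):
    """Fork a child to run checkpoint(), reap it, then remove the lockfile.

    Returns the child's exit code, or None if fork() itself failed.
    """
    try:
        child = os.fork()
    except OSError as exc:
        # A failed fork() is reported, so it would be visible in the log.
        print(f"fork failed: {exc}", file=sys.stderr)
        return None

    if child == 0:
        # Child process: do the checkpoint work, then exit without
        # returning into the parent's code path.
        status = 0
        try:
            checkpoint()
        except Exception:
            status = 1
        os._exit(status)

    # Parent process: block until the child has exited.
    _, wait_status = os.waitpid(child, 0)
    exit_code = os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1

    if exit_code == 0:
        # Only a clean checkpoint justifies removing the lockfile.
        os.unlink(lockfile_path)
    return exit_code
```

Note that in this sketch a fork() failure leaves a message behind, whereas a
parent that never gets past the fork() call, or never sees the child exit,
would log nothing further.
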
>
> Now here's what I see in the case of shutting down the OS X system:
>
> 2002-04-30 00:25:35 [376]    DEBUG:  pmdie 15
> 2002-04-30 00:25:35 [376]    DEBUG:  smart shutdown request
>
> ... and nothing more.  Actual system shutdown (power down) occurred at
> approximately 00:26:06 by my watch, over thirty seconds after the
> postmaster received SIGTERM.  So there was plenty of time to run the
> checkpoint subprocess.  (Indeed, I believe that thirty seconds is the
> grace period Darwin's init process allows SIGTERM'd processes before
> giving up and hard-killing them.  So the system was actually sitting and
> waiting for the postmaster.)
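
The grace-period behaviour suggested above (send SIGTERM, wait up to a fixed
time, then hard-kill) might look like the sketch below. It is an assumption
about the general pattern, not Darwin's actual init code; terminate_with_grace
is a hypothetical name, and the sketch only works when the target is a child
of the caller, because it relies on waitpid().

```python
import os
import signal
import time


def terminate_with_grace(pid, grace_seconds, poll_interval=0.1):
    """Ask a child process to exit; hard-kill it if the grace period runs out.

    Returns "exited" if the child went away within the grace period,
    or "killed" if SIGKILL was needed.
    """
    os.kill(pid, signal.SIGTERM)                # polite shutdown request
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        reaped, _ = os.waitpid(pid, os.WNOHANG)  # non-blocking check
        if reaped == pid:
            return "exited"
        time.sleep(poll_interval)
    os.kill(pid, signal.SIGKILL)                # cannot be caught or ignored
    os.waitpid(pid, 0)                          # reap the killed child
    return "killed"
```

With a thirty-second grace period, a process that is stuck after receiving
SIGTERM would hold up the caller for the full thirty seconds, which matches
the timing observed here.
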
>
> What we appear to have here is that the kernel is not allowing the
> postmaster to fork a checkpoint subprocess.  But there's no indication
> that the postmaster got a fork() error return, either.  Seems like it's
> just hung.
>
> Does this ring a bell with anyone?  Is it an OS X bug, or a "feature";
> and if the latter, how can we work around it?
>
>             regards, tom lane
>


