Re: emergency outage requiring database restart - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: emergency outage requiring database restart
Date
Msg-id 20161019145653.GE5087@momjian.us
In response to Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
> > Yeah.  Believe me -- I know the drill.  Most or all of the damage seemed
> > to be to the system catalogs, with at least two critical tables dropped
> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
> > pointing at the wrong thing.  A couple more data points:
> >
> > *) This database is OLTP, doing ~20 tps on average (but very bursty)
> > *) Another database on the same cluster was not impacted.  However,
> > it's more OLAP-style and may not have been written to during the
> > outage
> >
> > Now, the infrastructure hosting this system runs maybe 100ish
> > postgres clusters and maybe 1000ish SQL Server instances with
> > approximately zero unexplained data corruption issues in the 5 years
> > I've been here.  Having said that, this definitely smells and feels
> > like something on the infrastructure side.  I'll follow up if I have
> > any useful info.
> 
> After a thorough investigation I now have credible evidence that the
> damage did not originate from the database itself.
> Specifically, this database is mounted on the same volume as the
> operating system (I know, I know) and something non-database-driven
> sucked up disk space very rapidly and exhausted the volume -- fast
> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help

However, disk space exhaustion should not lead to corruption unless the
underlying layers lied in some way.
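
To illustrate the point, here is a minimal sketch (not PostgreSQL's actual code) of a write path that checks the results of both the write and the fsync. The helper names durable_write and try_write are hypothetical. Assuming the operating system and storage report failures honestly, a full volume surfaces as an ENOSPC error that the caller can handle, rather than as silently damaged data:

```python
import errno
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data and flush it to stable storage, raising OSError on failure."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop
            # until the whole buffer has been handed to the kernel.
            written = os.write(fd, view)
            view = view[written:]
        # fsync asks the kernel to push the data to the device.  An error
        # here must not be ignored: the data may not be on disk.
        os.fsync(fd)
    finally:
        os.close(fd)


def try_write(path: str, data: bytes) -> bool:
    """Return False if the volume is full; re-raise any other I/O error."""
    try:
        durable_write(path, data)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return False
        raise
```

If every layer behaves this way, running out of space stops the writer with an error.  Corruption becomes possible only when some layer reports success for data that never reached stable storage.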

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


