Re: emergency outage requiring database restart - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: emergency outage requiring database restart
Msg-id CAHyXU0yUTcvLs5Hk+_p2iFNz0EXf3otbLDUA56BnEj0tN7zztA@mail.gmail.com
In response to Re: emergency outage requiring database restart  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: emergency outage requiring database restart  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Mon, Oct 24, 2016 at 9:18 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Merlin Moncure wrote:
>> On Mon, Oct 24, 2016 at 6:01 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> > Corruption struck again.
>> > This time got another case of view busted -- attempting to create
>> > gives missing 'type' error.
>>
>> Call it a hunch -- I think the problem is in pl/sh.
>
> I've heard that before.

Well, yeah, previously I had an issue where the database crashed
during a heavy concurrent pl/sh based load. However, the problems
went away when I refactored the code. Anyway, I looked at the code
and couldn't see anything obviously wrong, so who knows? All I know is
my production database is exploding continuously and I'm looking for
answers. The only other extension in heavy use on this server is
postgres_fdw.

The other database on the cluster is fine, which kind of suggests we
are not facing clog or WAL type problems.

After last night, I rebuilt the cluster, turning on checksums, turning
on synchronous commit (it was off), and adding a standby replica. This
should help narrow the problem down should it recur: if storage is
bad (note, the other database on the same machine is doing 10x the write
activity and is fine) or something is scribbling on shared memory (my
guess here), then checksums should be popped, right?
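
As a toy illustration of the storage half of that reasoning, here is a minimal Python sketch of how a per-page checksum flags a page that changed after it was written. Everything in it is an assumption made for illustration: `write_page`, `verify_page`, and `flip_bit` are hypothetical helpers, and `zlib.crc32` is a stand-in, not PostgreSQL's actual page checksum algorithm. It does not model the shared memory case at all.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size, in bytes


def write_page(payload):
    """Pad payload to a full page; return (page, checksum) as 'stored'."""
    page = payload.ljust(PAGE_SIZE, b"\x00")
    # Stand-in checksum (CRC32); the real server uses a different algorithm.
    return page, zlib.crc32(page)


def verify_page(page, stored_checksum):
    """Recompute the checksum on read and compare with the stored value."""
    return zlib.crc32(page) == stored_checksum


def flip_bit(page, byte_offset, bit):
    """Simulate bad storage by flipping one bit in an already written page."""
    damaged = bytearray(page)
    damaged[byte_offset] ^= 1 << bit
    return bytes(damaged)


page, checksum = write_page(b"some tuple data")
damaged = flip_bit(page, 100, 3)

print(verify_page(page, checksum))     # intact page verifies
print(verify_page(damaged, checksum))  # damaged page fails verification
```

The point of the sketch is only that damage introduced after the checksum was computed shows up on the next read of that page.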

merlin


