Re: emergency outage requiring database restart - Mailing list pgsql-hackers

From Andres Freund
Subject Re: emergency outage requiring database restart
Msg-id 147977C4-6107-47C6-9628-475EC6263E2C@anarazel.de
In response to Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Re: emergency outage requiring database restart  (Oskari Saarenmaa <os@ohmu.fi>)
List pgsql-hackers

On October 26, 2016 8:57:22 PM GMT+03:00, Merlin Moncure <mmoncure@gmail.com> wrote:
>On Wed, Oct 26, 2016 at 12:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>> Confirmation of problem re-occurrence will come in a few days. I'm
>>>> much more likely to believe 6+sigma occurrence (storage, freak bug,
>>>> etc) should it prove the problem goes away post rebuild.
>>>
>>> ok, no major reported outage yet, but just got:
>>>
>>> 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR:  invalid page in block 12 of relation base/203883/1259
>
>*) I've now strongly correlated this routine with the damage.
>
>[root@rcdylsdbmpf001 ~]# cat /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep -i pushmarketsample | head -5
>2016-10-26 11:26:27 CDT [postgres@castaging]: LOG:  execute <unnamed>: SELECT PushMarketSample($1::TEXT) AS published
>2016-10-26 11:26:40 CDT [postgres@castaging]: LOG:  execute <unnamed>: SELECT PushMarketSample($1::TEXT) AS published
>PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement
>PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement
>2016-10-26 11:26:42 CDT [postgres@castaging]: STATEMENT:  SELECT PushMarketSample($1::TEXT) AS published
>
>*) First invocation was 11:26:27 CDT
>
>*) Second invocation was 11:26:40 and gave checksum error (as noted
>earlier 11:26:42)
>
>*) Routine attached (if interested)
>
>My next step is to set up test environment and jam this routine
>aggressively to see what happens.

Any chance that plsh or the script it executes does anything with the file descriptors it inherits? That'd certainly be one way to get into odd corruption issues.

We really should use O_CLOEXEC for the majority of our file handles, so that programs a backend exec()s don't inherit them.
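For readers unfamiliar with the flag, here is a minimal sketch of the POSIX close-on-exec mechanism the suggestion refers to. It is not PostgreSQL's actual file-handling code, and the helper names open_private, set_cloexec and has_cloexec are hypothetical. It shows how a C program can open a descriptor so that nothing it later exec()s (such as a shell script launched via plsh) inherits it, and how to check or set the flag on a descriptor that is already open:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: open a file so the descriptor is closed
 * automatically in any program started via exec(). */
int open_private(const char *path, int flags, mode_t mode)
{
    assert(path != NULL);
    /* O_CLOEXEC sets FD_CLOEXEC atomically as part of open(), so there is
     * no window in which a concurrent fork()+exec() could inherit the fd. */
    return open(path, flags | O_CLOEXEC, mode);
}

/* Hypothetical helper: mark an already-open descriptor close-on-exec.
 * This is a non-atomic fallback; returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: report whether a descriptor will be closed
 * across exec(). Returns false if fd is not a valid descriptor. */
bool has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    return fdflags != -1 && (fdflags & FD_CLOEXEC) != 0;
}
```

Passing O_CLOEXEC at open() time is generally preferable to calling fcntl() afterwards: in a process where another thread may fork() and exec() in between, the two-step approach leaves a short window in which the descriptor can leak into the child.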

Andres


