Re: [HACKERS] emergency outage requiring database restart - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: [HACKERS] emergency outage requiring database restart
Date
Msg-id CAHyXU0w43BkN9R3FTSatRwGR0Qjjuy8m909yGtKtnUncrxq98Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] emergency outage requiring database restart  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
On Tue, Jan 3, 2017 at 1:05 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
>> Regardless, it seems like you might be on to something, and I'm
>> inclined to patch your change, test it, and roll it out to production.
>> If it helps or at least narrows the problem down, we ought to give it
>> consideration for inclusion (unless someone else can think of a good
>> reason not to do that, heh!).
>
> Any results yet?

Not yet unfortunately.  I compiled the server with the change, but was
not able get $libdir working so that I could just do a binary swap
over my pgdg compiled package.  If anyone has some pointers on how to
do that, I'd appreciated it.

Still getting checksum failures.   Over the last 30 days, I see the
following.  Since enabling checksums FWICT none of the damage is
permanent and rolls back with the transaction.   So creepy!

[root@rcdylsdbmpf001 pg_log]# cat *.log | grep "page verification failed"
2016-12-05 10:17:48 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 61797 but expected 61798
2016-12-05 11:15:31 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 37750 but expected 37749
2016-12-05 11:15:58 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 44483 but expected 44482
2016-12-05 11:16:33 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 58926 but expected 58925
2016-12-05 11:17:08 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 38527 but expected 38528
2016-12-05 11:18:34 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 61932 but expected 61933
2016-12-05 11:18:55 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 23757 but expected 23758
2016-12-05 12:13:48 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 44192 but expected 44225 at character 417
2016-12-08 14:18:37 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 36083 but expected 36082
2016-12-08 15:52:31 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 63414 but expected 63415 at
character 1096
2016-12-09 09:12:21 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 25781 but expected 25780
2016-12-09 09:13:20 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 63043 but expected 63044 at
character 4230
2016-12-12 08:57:45 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 31775 but expected 31771
2016-12-13 09:47:11 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 40802 but expected 40806
2016-12-15 12:49:04 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 11625 but expected 11592 at character 417
2016-12-15 12:51:08 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018
2016-12-15 12:52:36 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018 at character 417
2016-12-16 12:16:31 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 23580 but expected 23576
2016-12-20 13:59:33 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 45273 but expected 45285
2016-12-20 14:00:22 CST [postgres@castaging]: WARNING:  page
verification failed, calculated checksum 10524 but expected 10525

note second database 'mpf2'.  This is a new development.  Example of
query that is jacking things is this:
2016-12-15 12:51:08 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018
2016-12-15 12:51:08 CST [rms@mpf2]: CONTEXT:  SQL statement "     COPY (       SELECT 'DELETE FROM tblNAptCommonSample
WHEREReportPeriod = 201612;'       UNION ALL       SELECT format(   'INSERT INTO tblNAptCommonSample('
'ReportPeriod,Period, AdjustmentType, PlanType, MSA, MSASubMkt, '     'Sample, Occupancy, OccupancyChange,
AverageRent,
AverageRentChange, RentSF, '     'RentSFChange)'   'VALUES('     '%s, %s, ''%s'', ''%s'', ''%s'', %s,'     '%s, %s, %s,
%s,%s, %s,'     '%s)',     ReportPeriod, Period, AdjustmentType, PlanType, MSA, MSASubMkt,     c(Sample), c(Occupancy),
c(OccupancyChange),c(AverageRent),
 
c(AverageRentChange), c(RentSF),     c(RentSFChange))   FROM tblNAptCommonSample     WHERE Period = 201612     AND MSA
!='5610'   UNION ALL     SELECT 'go'     ) TO '/tmp/tblnaptcommonsample.sql';   "   PL/pgSQL function
writempf1history(integer)line 75 at EXECUTE
 


or this:
2016-12-15 12:52:36 CST [rms@mpf2]: WARNING:  page verification
failed, calculated checksum 51017 but expected 51018 at character 417
2016-12-15 12:52:36 CST [rms@mpf2]: QUERY:     COPY (       SELECT 'DELETE FROM tbltwrexistingunits WHERE ReportPeriod
=201612;'       UNION ALL       SELECT format(   'INSERT INTO tbltwrexistingunits('     'ReportPeriod, market,
submarketnum,yr_qtr, cmpltns, deletions, '     'existing, unadjexisting)'   'VALUES('     '%s, ''%s'', %s, ''%s'', %s,
%s,'    '%s, %s)',     ReportPeriod, market, submarket, yr_qtr, c(cmpltns), c(deletions),     c(existing),
c(unadjexisting))  FROM tbltwrexistingunits     WHERE ReportPeriod = 201612       AND market != '5610'   UNION ALL
SELECT'go'     ) TO '/tmp/tbltwrexistingunits.sql';
 

2016-12-15 12:52:36 CST [rms@mpf2]: CONTEXT:  PL/pgSQL function
writempf1history(integer) line 109 at EXECUTE

This is another (much simpler) routine that:
1. writes out data to scratch file with COPY
2. uses pl/sh to pipe to sqsh

merlin



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] increasing the default WAL segment size
Next
From: Peter Eisentraut
Date:
Subject: Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP