Re: Back-branch update releases coming in a couple weeks - Mailing list pgsql-hackers

From MauMau
Subject Re: Back-branch update releases coming in a couple weeks
Date
Msg-id 7AE503F0CB83442082C20ECE2B0A6E4B@maumau
In response to Re: Back-branch update releases coming in a couple weeks  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Back-branch update releases coming in a couple weeks
List pgsql-hackers
From: "Fujii Masao" <masao.fujii@gmail.com>
> On Thu, Jan 24, 2013 at 7:42 AM, MauMau <maumau307@gmail.com> wrote:
>> I searched the PostgreSQL mailing lists for "WAL contains references to
>> invalid pages" and found 19 messages.  Some people encountered similar
>> problems.  There were some discussions of those problems (Tom and
>> Simon Riggs commented), but they did not reach a solution.
>>
>> I also found a discussion that might relate to this problem.  Does this
>> fix the problem?
>>
>> [BUG] lag of minRecoveryPont in archive recovery
>> http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyotaro@lab.ntt.co.jp
>
> Yes. Could you check whether you can reproduce the problem on the
> latest REL9_2_STABLE?

I tried to reproduce the problem by running "pg_ctl stop -mi" against the 
primary more than ten times on REL9_2_STABLE, but it did not appear.  
However, on PostgreSQL 9.1.6 I encountered the crash only once in dozens of 
failovers, possibly more than a hundred.  So ten or so clean runs are not 
enough for me to be sure the problem is fixed in REL9_2_STABLE.

I'm wondering if the fix discussed in the above thread solves my problem.  I 
found the following differences between Horiguchi-san's case and my case:

(1)
Horiguchi-san says the bug outputs the message:

WARNING:  page 0 of relation base/16384/16385 does not exist

On the other hand, I got the message:

WARNING:  page 506747 of relation base/482272/482304 was uninitialized


(2)
Horiguchi-san reproduced the problem by shutting the standby down immediately 
and restarting it.  However, I saw the problem during failover.


(3)
Horiguchi-san did not use any index, but in my case the WARNING message 
refers to an index.
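
To make difference (1) easy to check in a large server log, here is a minimal sketch that extracts the page number, the relation path and the variant ("does not exist" or "was uninitialized") from such WARNING lines.  The regular expression is an assumption based only on the two messages quoted above, and the function name parse_invalid_page is hypothetical; other path layouts (for example, relations in other tablespaces) are not handled.

```python
import re

# Pattern for the two WARNING variants quoted above.  The path layout
# base/<database OID>/<relfilenode> is assumed from those two samples only.
INVALID_PAGE = re.compile(
    r"WARNING:\s+page (?P<page>\d+) of relation "
    r"base/(?P<db>\d+)/(?P<rel>\d+) "
    r"(?P<kind>does not exist|was uninitialized)"
)


def parse_invalid_page(line):
    """Return (page, database OID, relfilenode, kind), or None if no match."""
    m = INVALID_PAGE.search(line)
    if m is None:
        return None
    return int(m["page"]), int(m["db"]), int(m["rel"]), m["kind"]


if __name__ == "__main__":
    samples = [
        "WARNING:  page 0 of relation base/16384/16385 does not exist",
        "WARNING:  page 506747 of relation base/482272/482304 was uninitialized",
    ]
    for sample in samples:
        print(parse_invalid_page(sample))
```

The relfilenode in the result is the number that would then be looked up in the system catalogs to see whether it belongs to a table or an index, as in difference (3).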


But there is one similarity.  Horiguchi-san says the problem occurs after 
DELETE+VACUUM.  In my case, I shut the primary down while the application 
was doing INSERT/UPDATE.  As the messages below show, autovacuum was 
running just before the immediate shutdown:

...
LOG:  automatic vacuum of table "GOLD.scm1.tbl1": index scans: 0
	pages: 0 removed, 36743 remain
	tuples: 0 removed, 73764 remain
	system usage: CPU 0.09s/0.11u sec elapsed 0.66 sec
LOG:  automatic analyze of table "GOLD.scm1.tbl1" system usage: CPU 0.00s/0.14u sec elapsed 0.32 sec
LOG:  automatic vacuum of table "GOLD.scm1.tbl2": index scans: 0
	pages: 0 removed, 12101 remain
	tuples: 40657 removed, 44142 remain
	system usage: CPU 0.06s/0.06u sec elapsed 0.30 sec
LOG:  automatic analyze of table "GOLD.scm1.tbl2" system usage: CPU 0.00s/0.06u sec elapsed 0.14 sec
LOG:  received immediate shutdown request
...


Could you tell me the details of the problem that was discussed and is fixed 
in the upcoming minor release?  I would like to know the phenomenon and its 
conditions, and whether it applies to my case.

Regards
MauMau




