Re: Fast promotion failure - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Fast promotion failure
Msg-id 51909998.3010202@vmware.com
In response to Re: Fast promotion failure  (Amit Kapila <amit.kapila@huawei.com>)
Responses Re: Fast promotion failure  (Amit Kapila <amit.kapila@huawei.com>)
List pgsql-hackers
On 13.05.2013 06:07, Amit Kapila wrote:
> On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
>> Heikki said in the first message in this thread that he suspected
>> the cause of the failure he had seen to be wrong TLI on which
>> checkpointer runs. Nevertheless, the patch you suggested for me
>> looks fixing it. Moreover (one of?) the failure from the same
>> cause looks fixed with the patch.
>
> There were 2 problems:
> 1. There was some issue in walsender logic due to which after promotion in
> some cases it hits assertion or error
> 2. During fast promotion, checkpoint gets created with wrong TLI
>
> He has provided 2 different patches
> fix-standby-promotion-assert-fail-2.patch and
> fast-promotion-quick-fix.patch.
> Among 2, he has already committed fix-standby-promotion-assert-fail-2.patch
> (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f4975c99e52984f7ee81b47d137b5b4751)

That's correct.

>> Is the point of this discussion that the patch may leave out some
>> glitch about timing of timeline-related changing and Heikki saw an
>> egress of that?
>
> AFAIU, the committed patch has some gap in overall scenario which is the
> fast promotion issue.

Right, the fast promotion issue is still there.

Just to get us all on the same page again: Does anyone see a problem
with a fresh git checkout with fast-promotion-quick-fix.patch
(http://www.postgresql.org/message-id/51894942.4080500@vmware.com)
applied? If you do, please speak up. As far as I know, the
already-committed patch, together with fast-promotion-quick-fix.patch,
should fix all known issues (*).

I haven't committed a fix for the issue I reported in this thread,
because I'm not 100% sure what the right fix for it would be.
fast-promotion-quick-fix.patch seems to do the trick, but at least the
comments need to be updated, and I'm not sure if there are some related
corner cases that it doesn't handle. Simon?

(*) Well, almost. This one is still pending: 
http://www.postgresql.org/message-id/CAB7nPqRhuCuuD012GCB_tAAFrixx2WioN_zfXQcvLuRab8DN2g@mail.gmail.com

- Heikki


