Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept
Msg-id CAPpHfdu9cDv_2Yw87=5U25P+1k8Mv=K_o78tTcT70km55okK5g@mail.gmail.com
In response to Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Wed, Nov 1, 2017 at 5:55 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
>>
>> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
>> > <i.kartyshov@postgrespro.ru> wrote:
>> >> Hello. I made some bugfixes and rewrote the patch.
>> >
>> > I don't think it's a good idea to deliberately leave the state of the
>> > standby different from the state of the  master on the theory that it
>> > won't matter.  I feel like that's something that's likely to come back
>> > to bite us.
>>
>> I agree with Robert. What happens if we intentionally don't apply the
>> truncation WAL and then switch over? If we insert a tuple on the new
>> master server into a block that has been truncated on the old master,
>> will the WAL apply on the new standby fail? I guess there are such
>> corner cases causing failures of WAL replay after switch-over.
>
>
> Yes, that looks dangerous.  One approach to cope with that could be teaching the heap
> redo function to handle such situations.  But I expect this approach
> to be criticized for bad design.  And I understand the fairness of this
> criticism.
>
> However, from the user's point of view, the current behavior of
> hot_standby_feedback is just broken, because it both increases bloat and
> doesn't guarantee that a read-only query on standby won't be cancelled
> because of vacuum.  Therefore, we should be looking for a solution: if one
> approach isn't good enough, then we should look for another approach.
>
> I can propose the following alternative approach: teach read-only queries on hot
> standby to tolerate concurrent relation truncation.  Then, when a
> non-existent heap page is accessed on hot standby, we know that it was
> deleted by concurrent truncation and can assume it to be empty.  Any
> thoughts?
>
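
As a rough illustration of the proposal quoted above, here is a toy model in Python. It is not PostgreSQL code: `ToyRelation`, `read_page_tolerant`, and `scan` are hypothetical names invented for this sketch, and a relation is modelled as a plain list of pages. The only point it shows is that a scan which fixed its block range before a truncation can treat a missing page as empty instead of failing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRelation:
    """Toy stand-in for a heap relation: a list of pages, each a list of tuples."""
    pages: list = field(default_factory=list)

    def truncate(self, new_nblocks: int) -> None:
        # Models replay of a truncation: trailing pages disappear.
        del self.pages[new_nblocks:]


def read_page_tolerant(rel: ToyRelation, blkno: int) -> list:
    """Return the tuples on page blkno, or an empty list if the page is gone."""
    if blkno >= len(rel.pages):
        # Assumption of this sketch: a missing page can only have been
        # removed by a concurrent truncation, so it is treated as empty.
        return []
    return rel.pages[blkno]


def scan(rel: ToyRelation, nblocks_at_start: int):
    """Scan the block range fixed at scan start, tolerating truncation."""
    for blkno in range(nblocks_at_start):
        yield from read_page_tolerant(rel, blkno)
```

In this model, a scan that started when the relation had four blocks keeps running after the last two (empty) blocks are truncated and simply sees no tuples there.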

Do you also mean that applying the WAL for AccessExclusiveLock is always
skipped on standby servers to allow scans to access the relation?

Definitely not every AccessExclusiveLock WAL record should be skipped, only those emitted during heap truncation.  There are other cases when AccessExclusiveLock WAL records are emitted, for instance, during DDL operations.  But I'd like to focus on AccessExclusiveLock WAL records caused by VACUUM for now.  It's kind of understandable for users that DDL might cancel a read-only query on standby.  So, if you're running a long report query, then you should hold off on your DDL.  But VACUUM is a different story.  It runs automatically when you do normal DML queries.

AccessExclusiveLock WAL records from VACUUM could either not be emitted at all, or be somehow distinguished and skipped on standby.  I haven't thought through that level of detail yet.
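
To make the second option a bit more concrete, here is a toy sketch in Python of "distinguish and skip". Everything in it is hypothetical: `LockOrigin`, `LockRecord`, and `should_replay_lock` are invented for illustration, and the `origin` field is an assumption of the sketch, not something this message claims real WAL lock records carry.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LockOrigin(Enum):
    VACUUM_TRUNCATE = auto()  # lock taken by VACUUM before truncating the heap
    OTHER = auto()            # DDL and every other source


@dataclass(frozen=True)
class LockRecord:
    """Toy AccessExclusiveLock record; 'origin' is a hypothetical marker."""
    relation: str
    origin: LockOrigin


def should_replay_lock(rec: LockRecord) -> bool:
    """Standby-side filter: skip only locks that come from heap truncation."""
    return rec.origin is not LockOrigin.VACUUM_TRUNCATE


def locks_taken_on_standby(records):
    """Relations the toy standby would actually lock while replaying."""
    return [r.relation for r in records if should_replay_lock(r)]
```

With such a filter, a DDL-originated lock would still be replayed (and could still cancel a query on the standby), while a lock taken for VACUUM truncation would be ignored.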

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
