Re: IPC/MultixactCreation on the Standby server - Mailing list pgsql-hackers

From Ivan Bykov
Subject Re: IPC/MultixactCreation on the Standby server
Date
Msg-id 176399474623.1015.6638892143305450074.pgcf@coridan.postgresql.org
Whole thread Raw
In response to Re: IPC/MultixactCreation on the Standby server  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: IPC/MultixactCreation on the Standby server
List pgsql-hackers
Hi, Andrey!

> Thanks for your review!

Review still in progress, sorry for the delay. I didn't have enough time to fully understand the changes you suggest,
butit seems there is 
 
only a small gap in my understanding of what the patch does. Here is my explanation of the problem.

The main problem
=============

The main problem is that we may lose session context before writing the offset to SLRU (but we may write 
a WAL record). It seems that the writer process got stuck in the XLogInsert procedure or even failed
between GetNewMultiXactId 
and RecordNewMultiXact call. In this case, readers will wait to receive a conditional variable signal (from new
multixacts)
 
but could not find a valid offset for the "failed" (it may be in WAL) multixid.

I illustrate this using the next diagram.

Writer                                        Reader             
--------------------------------------------------------------------------------
 MultiXactIdCreateFromMembers
 -> GetNewMultiXactId (101)
                                              GetMultiXactIdMembers(100)
                                              -> LWLockAcquire(MultiXactOffset)
                                              -> read offset 100
                                              -> read offset 101
                                              -> LWLockRelease(MultiXactOffset)
                                                 offset 101 == 0
                                              -> ConditionVariableSleep()
+--------------------------------------------------------------------------------------+
|-> XLogInsert                                                                         |
+--------------------------------------------------------------------------------------+
 -> RecordNewMultiXact
  -> LWLockAcquire(MultiXactOffset)
  -> write offset 101
  -> LWLockRelease(MultiXactOffset)
  -> ConditionVariableBroadcast(nextoff_cv);  
                                              -> retry:
                                              -> LWLockAcquire(MultiXactOffset)
                                              -> read offset 100
                                              -> read offset 101
                                              -> LWLockRelease(MultiXactOffset)
                                                 offset 101 != 0
                                              -> length = offset 101 - read offset 100
  -> LWLockAcquire(MultiXactMember)
  -> write members 101
  -> LWLockRelease(MultiXactOffset)

--------------------------------------------------------------------------------------

As far as I can see, your proposal seems to address exactly that problem.
The main difference from the former solution is writing to MultiXactOffset SLRU all required 
information for the reader atomically on multiact insertion. 
Before this change, we actually extended the multixact insertion time window to the next multixact 
insertion time, and it seems a risky design.

I illustrate the new solution using the next diagram.

Writer                                        Reader             
--------------------------------------------------------------------------------
 MultiXactIdCreateFromMembers
 -> GetNewMultiXactId (100)
 -> XLogInsert 
 -> RecordNewMultiXact
  -> LWLockAcquire(MultiXactOffset)
  -> write offset 100 
  -> write offset 101 *****************************************************************
  -> LWLockRelease(MultiXactOffset)
                                              GetMultiXactIdMembers(100)
                                              -> LWLockAcquire(MultiXactOffset)
                                              -> read offset 100
                                              -> read offset 101
                                              -> LWLockRelease(MultiXactOffset)
                                              Assert(offset 101 == 0)
                                              -> ConditionVariableSleep()
                                              -> length = offset 101 - read offset 100
--------------------------------------------------------------------------------------

So if I understand the core idea of your solution right, I think that the code in the last patch 
(v10-0001-Avoid-multixact-edge-case-2-by-writing-the-next-.patch) is correct and does what it should.

pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: get rid of Pointer type, mostly
Next
From: Peter Eisentraut
Date:
Subject: Re: refactor AlterDomainAddConstraint (alter domain add constraint)