Re: [HACKERS] make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

From Anna Akenteva
Subject Re: [HACKERS] make async slave to wait for lsn to be replayed
Date
Msg-id f4100c14985b1b7983aefd00cb6b491c@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] make async slave to wait for lsn to be replayed  (Kartyshov Ivan <i.kartyshov@postgrespro.ru>)
Responses Re: [HACKERS] make async slave to wait for lsn to be replayed  (Kartyshov Ivan <i.kartyshov@postgrespro.ru>)
List pgsql-hackers
On 2020-03-21 14:16, Kartyshov Ivan wrote:
> As it was discussed earlier, I added wait for statement into
> begin/start statement.
Thanks! To address the discussion: I like the idea of having WAIT as a 
part of BEGIN statement rather than a separate command, as Thomas Munro 
suggested. That way, the syntax itself enforces that WAIT FOR LSN will 
actually take effect, even for single-snapshot transactions. It seems 
more convenient for the user, who won't have to remember the details 
about how WAIT interacts with isolation levels.


> BEGIN [ WORK | TRANSACTION ] [ transaction_mode[, ...] ] wait_for_event
Not sure about this, but could we add "WAIT FOR .." as another 
transaction_mode rather than a separate thing? That way, user won't have 
to worry about the order. As of now, one should remember to always put 
WAIT FOR as the Last parameter in the BEGIN statement.


>       and event is:
>           LSN value [options]
>           TIMESTAMP value
I would maybe remove WAIT FOR TIMESTAMP. As Robert Haas has pointed out, 
it seems a lot like pg_sleep_until(). Besides, it doesn't necessarily 
need to be connected to transaction start, which makes it different from 
WAIT FOR LSN - so I wouldn't mix them together.


I had another look at the code:


===
In WaitShmemSize() you might want to use functions that check for 
overflow - add_size()/mul_size(). They're used in similar situations, 
for example in BTreeShmemSize().


===
This is how WaitUtility() is called - note that time_val will always be 
 > 0:
+    if (time_val <= 0)
+        time_val = 1;
+...
+    res = WaitUtility(lsn, (int)(time_val * 1000), dest);

(time_val * 1000) is passed to WaitUtility() as the delay argument. And 
inside WaitUtility() we have this:

+if (delay > 0)
+    latch_events = WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH;
+else
+    latch_events = WL_LATCH_SET | WL_POSTMASTER_DEATH;

Since we always pass a delay value greater than 0, we'll never get to 
the "else" clause here and we'll never be ready to wait for LSN forever. 
Perhaps due to that, the current test outputs this after a simple WAIT 
FOR LSN command:
psql:<stdin>:1: NOTICE:  LSN is not reached.


===
Speaking of tests,

When I tried to test BEGIN TRANSACTION WAIT FOR LSN, I got a segfault:
LOG:  statement: BEGIN TRANSACTION WAIT FOR LSN '0/3002808'
LOG:  server process (PID 10385) was terminated by signal 11: 
Segmentation fault
DETAIL:  Failed process was running: COMMIT

Could you add some more tests to the patch when this is fixed? With WAIT 
as part of BEGIN statement + with things such as WAIT FOR ALL ... / WAIT 
FOR ANY ... / WAIT FOR LSN ... UNTIL TIMESTAMP ...


===
In WaitSetLatch() we should probably check backend for NULL before 
calling SetLatch(&backend->procLatch)

We might also need to check wait statement for NULL in these two cases:
+   case T_TransactionStmt:
+   {...
+       result = transformWaitForStmt(pstate, (WaitStmt *) stmt->wait);

case TRANS_STMT_START:
{...
+   WaitStmt   *waitstmt = (WaitStmt *) stmt->wait;
+   res = WaitMain(waitstmt, dest);


===
After we added the "wait" attribute to TransactionStmt struct, do we 
perhaps need to add something to _copyTransactionStmt() / 
_equalTransactionStmt()?

--
Anna Akenteva
Postgres Professional:
The Russian Postgres Company
http://www.postgrespro.com



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Allow continuations in "pg_hba.conf" files
Next
From: Jeff Davis
Date:
Subject: Re: AllocSetEstimateChunkSpace()