make async slave to wait for lsn to be replayed - Mailing list pgsql-hackers

From Ivan Kartyshov
Subject make async slave to wait for lsn to be replayed
Date
Msg-id 0240c26c-9f84-30ea-fca9-93ab2df5f305@postgrespro.ru
Responses Re: make async slave to wait for lsn to be replayed  (Craig Ringer <craig@2ndquadrant.com>)
Re: make async slave to wait for lsn to be replayed  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
Hi hackers,

A few days ago I finished my work on the WAITLSN utility statement, so
I'd like to share it.


Introduction
============

Our clients who run 9.5 with asynchronous master-slave replication
asked for a wait mechanism on the slave side. It should prevent the
slave from handling a query that needs data (an LSN) that has been
received and flushed but not yet replayed.


Problem description
===================

The implementation:
Must wait by sleeping, in the manner of pg_sleep(), so as not to load the system.
Must avoid race conditions when different backends wait for different LSNs.
Must not take a database snapshot, to avoid trouble with a sudden minXID change.
Must have an optional timeout parameter in case LSN traffic stalls.
Must return on postmaster death or on interrupts.


Implementation
==============

To avoid trouble with snapshots, WAITLSN was implemented as a utility
statement, which lets it bypass the snapshot-taking mechanism.
We tried several variants, and the most effective approach was to use
Latches. For interprocess interaction, all Latches are stored in shared
memory, and to cope with race conditions each Latch is protected by a
Spinlock.
The timeout is an optional parameter, given in milliseconds.
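To make the waiting scheme concrete, here is a minimal standalone model of it, not the patch's code. It swaps in a POSIX mutex and condition variable for the spinlock-protected Latches in shared memory, and a boolean flag for interrupts and postmaster death. All names (`wait_for_lsn`, `advance_replay`, `interrupt_waiters`, `simulate_replay`, `shm`) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime, nanosleep */

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t XLogRecPtr;      /* 64-bit WAL position, as in PostgreSQL */

typedef enum { WAIT_REACHED, WAIT_TIMEOUT, WAIT_INTERRUPTED } WaitResult;

/* Hypothetical model of the shared-memory state. The mutex and condition
 * variable stand in for the patch's spinlock-protected Latches. */
static struct {
    pthread_mutex_t lock;
    pthread_cond_t  wakeup;
    XLogRecPtr      replay_lsn;   /* last replayed position */
    bool            interrupted;  /* models an interrupt or postmaster death */
} shm = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false };

/* Replay side: publish the new position and wake every waiter
 * (roughly what setting each waiter's Latch achieves). */
static void advance_replay(XLogRecPtr lsn)
{
    pthread_mutex_lock(&shm.lock);
    if (lsn > shm.replay_lsn)
        shm.replay_lsn = lsn;
    pthread_cond_broadcast(&shm.wakeup);
    pthread_mutex_unlock(&shm.lock);
}

/* Interrupt / postmaster-death side: every waiter must return. */
static void interrupt_waiters(void)
{
    pthread_mutex_lock(&shm.lock);
    shm.interrupted = true;
    pthread_cond_broadcast(&shm.wakeup);
    pthread_mutex_unlock(&shm.lock);
}

/* Backend side: sleep until target is replayed. timeout_ms < 0 means
 * wait without a timeout. No snapshot and no busy loop are involved. */
static WaitResult wait_for_lsn(XLogRecPtr target, long timeout_ms)
{
    assert(target != 0);          /* 0 is InvalidXLogRecPtr in PostgreSQL */

    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default condvar clock */
    if (timeout_ms >= 0) {
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
    }

    WaitResult result = WAIT_REACHED;
    pthread_mutex_lock(&shm.lock);
    /* Re-check after every wakeup: it may be spurious or meant for a
     * waiter with a different target LSN. */
    while (shm.replay_lsn < target) {
        if (shm.interrupted) {
            result = WAIT_INTERRUPTED;
            break;
        }
        int rc = (timeout_ms < 0)
            ? pthread_cond_wait(&shm.wakeup, &shm.lock)
            : pthread_cond_timedwait(&shm.wakeup, &shm.lock, &deadline);
        if (rc == ETIMEDOUT && shm.replay_lsn < target) {
            result = WAIT_TIMEOUT;
            break;
        }
    }
    pthread_mutex_unlock(&shm.lock);
    return result;
}

/* Hypothetical startup process: replays up to 0/303EC60 in three steps. */
static void *simulate_replay(void *arg)
{
    (void) arg;
    const struct timespec pause = { 0, 10 * 1000000L };   /* 10 ms */
    const XLogRecPtr steps[] = { 0x3000000, 0x301F630, 0x303EC60 };
    for (int i = 0; i < 3; i++) {
        nanosleep(&pause, NULL);
        advance_replay(steps[i]);
    }
    return NULL;
}
```

In this sketch the broadcast wakes every waiter and each one re-checks its own target, which is a simple way to keep backends waiting for different LSNs from racing. A per-backend latch, as described above, has the advantage of not waking backends whose LSN is still ahead of replay.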


What works
==========

It works well even with large timeout or wait-period values, but of
course there may be things I've overlooked.

How to use it
=============

WAITLSN 'LSN' [, timeout in ms];

-- Wait until LSN 0/303EC60 is replayed or 10 seconds pass.
WAITLSN '0/303EC60', 10000;

-- Or the same without a timeout.
WAITLSN '0/303EC60';

Note: WAITLSN also returns on postmaster death or an interrupt if
either occurs before the LSN is replayed or the timeout expires.
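WAITLSN takes the LSN as a string literal in PostgreSQL's usual text form: two hexadecimal numbers separated by a slash, the upper and lower 32 bits of a 64-bit WAL position. So `0/303EC60` denotes position 0x303EC60, i.e. 50588768. A minimal sketch of that conversion, assuming exactly this format; `parse_lsn` is a hypothetical helper, and it is simpler than PostgreSQL's own pg_lsn input routine (for example, sscanf also tolerates leading whitespace).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;   /* 64-bit WAL position */

/* Hypothetical helper: parse "hi/lo" (both halves hexadecimal) into a
 * 64-bit LSN. Simplified; not PostgreSQL's pg_lsn input function. */
static bool parse_lsn(const char *text, XLogRecPtr *out)
{
    assert(text != NULL && out != NULL);

    unsigned int hi, lo;
    int consumed = 0;
    /* %n records how many characters were matched, so trailing junk is rejected */
    if (sscanf(text, "%X/%X%n", &hi, &lo, &consumed) != 2 || text[consumed] != '\0')
        return false;

    *out = ((XLogRecPtr) hi << 32) | lo;   /* high half is the upper 32 bits */
    return true;
}
```

Once parsed, the waiting backend only needs an integer comparison against the replayed position.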

Testing the implementation
==========================

The implementation was tested with the testgres and unittest Python modules.

How to test this implementation:
Start the master server.
Create table test and insert the tuple 1.
Set up asynchronous replication to a slave (wal_level = hot_standby on
9.5, wal_level = replica on 9.6 or higher).
Slave:  START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
        SELECT * FROM test;
Master: delete the tuple, run VACUUM, and get the new LSN.
Slave:  WAITLSN 'newLSN', 60000;
        WAITLSN finishes with FALSE and the notice "LSN doesn`t reached".
Slave:  COMMIT;
        WAITLSN 'newLSN', 60000;
        WAITLSN finishes successfully (no NOTICE message).

As expected, WAITLSN waits for the LSN and returns early on postmaster
death, an interrupt, or timeout.

Your feedback is welcome!


---
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


