RE: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Time delayed LR (WAS Re: logical replication restrictions)
Msg-id TYAPR01MB5866D871F60DDFD8FAA2CDE4F5BD9@TYAPR01MB5866.jpnprd01.prod.outlook.com
In response to Re: Time delayed LR (WAS Re: logical replication restrictions)  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
Hi hackers,

I have made a rough prototype, based on the v30 patch, that serializes changes
to a permanent file and applies them after the delay has elapsed. I think the
2PC and restore mechanisms need more analysis, but I can share the code for
discussion. What do you think?

## Interfaces

Unchanged from previous versions. The subscription parameter "min_apply_delay"
enables time-delayed logical replication.

## Advantages

Two big problems are solved.

* The apply worker can respond to the walsender's keepalive messages while
  delaying application. This is because the process does not sleep.
* The publisher can recycle WAL even if a transaction related to that WAL has
  not been applied yet. This is because the apply worker flushes all the
  changes to a file and reports that the WAL has been flushed.

## Disadvantages

The code becomes more complex.

## Basic design

The basic idea is quite simple: create a new file when the apply worker receives
a BEGIN message, write the received changes to it, and flush them when the
COMMIT message arrives. The commit time of each delayed transaction is checked
on every iteration of the main loop, and the transaction is applied once the
elapsed time exceeds min_apply_delay.
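The bookkeeping described above can be modelled roughly as follows. This is only
an illustrative sketch in Python, not code from the patch: the class and method
names are hypothetical, an in-memory list stands in for the per-transaction
file, and it assumes transactions become applicable in commit order and that
"exceeds" means elapsed time >= min_apply_delay.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DelayedTxn:
    xid: int
    changes: List[str] = field(default_factory=list)  # stand-in for the file
    commit_time: Optional[float] = None  # set once COMMIT arrives


class DelayedApplyModel:
    """Hypothetical model of the apply worker's delayed-transaction state."""

    def __init__(self, min_apply_delay: float):
        self.min_apply_delay = min_apply_delay  # seconds
        self.in_progress: Dict[int, DelayedTxn] = {}  # still receiving changes
        self.waiting: List[DelayedTxn] = []  # committed, delay not yet elapsed
        self.applied: List[int] = []  # xids "applied" so far

    def on_begin(self, xid: int) -> None:
        # BEGIN: start collecting changes for a new transaction.
        self.in_progress[xid] = DelayedTxn(xid)

    def on_change(self, xid: int, change: str) -> None:
        # Changes are only recorded here, never applied immediately.
        self.in_progress[xid].changes.append(change)

    def on_commit(self, xid: int, commit_time: float) -> None:
        # COMMIT: remember the commit time and queue the transaction.
        txn = self.in_progress.pop(xid)
        txn.commit_time = commit_time
        self.waiting.append(txn)

    def main_loop_iteration(self, now: float) -> None:
        # Called on every loop iteration; never sleeps, so the worker stays
        # free to answer keepalives between iterations.
        while (self.waiting
               and now - self.waiting[0].commit_time >= self.min_apply_delay):
            txn = self.waiting.pop(0)
            self.applied.append(txn.xid)  # a real worker would replay changes
```

Because the check is a cheap comparison per iteration, the loop can keep
handling incoming messages while transactions wait for their delay to pass.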

Files are handled with APIs that use plain kernel FDs. This approach is similar
to the physical walreceiver process. Unlike the walreceiver, the worker does not
flush after every message - it flushes only at the end of the transaction.
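As a rough illustration of the flush policy, here is a sketch in Python that
uses raw file descriptors via the os module. The file naming and the
length-prefixed record layout are assumptions made for this example only; they
are not the format used by the patch.

```python
import os
import struct
import tempfile


def _write_all(fd: int, data: bytes) -> None:
    # os.write may write fewer bytes than requested, so loop until done.
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def open_txn_file(directory: str, xid: int):
    # Hypothetical naming scheme: one file per transaction.
    path = os.path.join(directory, f"{xid}.changes")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    return path, fd


def write_change(fd: int, payload: bytes) -> None:
    # Append one length-prefixed record. Deliberately no fsync here.
    _write_all(fd, struct.pack("<I", len(payload)) + payload)


def finish_txn(fd: int) -> None:
    # A single fsync at the end of the transaction, then close.
    os.fsync(fd)
    os.close(fd)


def read_changes(path: str):
    # Read back every record, e.g. when the delay has elapsed.
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break
            (length,) = struct.unpack("<I", header)
            records.append(f.read(length))
    return records
```

Deferring the fsync to the end of the transaction keeps the per-message cost
low, at the price of having nothing durable until the final message is written.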

### For 2PC

The delay starts when COMMIT PREPARED arrives. To avoid holding locks for a
long time, the prepared transaction is only written into a file and is not
applied.

When BEGIN PREPARE is received, the worker creates a file and starts writing
changes, the same as for a normal transaction. When the PREPARE message arrives,
the worker just writes the message into the file, flushes it, and closes it.
This means that no transaction is prepared on the subscriber. When COMMIT
PREPARED is received, the worker opens the file again and writes the message.
After that, the transaction is treated the same as a normally committed one.
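The 2PC sequence above can be sketched like this. It is a simplified Python
model with hypothetical names (TwoPhaseSpool, a ".spool" suffix, one text line
per message); the real prototype writes protocol messages, not text lines.

```python
import os
import tempfile


class TwoPhaseSpool:
    """Hypothetical model: prepared transactions are spooled, never prepared."""

    def __init__(self, directory: str):
        self.directory = directory
        self.open_files = {}   # gid -> file object still receiving changes
        self.delay_queue = []  # (commit_time, gid) pairs waiting for the delay

    def _path(self, gid: str) -> str:
        return os.path.join(self.directory, gid + ".spool")

    @staticmethod
    def _flush_and_close(f) -> None:
        f.flush()
        os.fsync(f.fileno())
        f.close()

    def begin_prepare(self, gid: str) -> None:
        # Same as a normal BEGIN: create the file and start writing.
        self.open_files[gid] = open(self._path(gid), "w")

    def change(self, gid: str, change: str) -> None:
        self.open_files[gid].write(change + "\n")

    def prepare(self, gid: str) -> None:
        # Write the PREPARE message, flush, close. Nothing is prepared on the
        # subscriber, so no locks are held while waiting.
        f = self.open_files.pop(gid)
        f.write("PREPARE\n")
        self._flush_and_close(f)

    def commit_prepared(self, gid: str, commit_time: float) -> None:
        # Reopen the file, append the message, and start the delay from here.
        f = open(self._path(gid), "a")
        f.write(f"COMMIT PREPARED {commit_time}\n")
        self._flush_and_close(f)
        self.delay_queue.append((commit_time, gid))
```

Note that in this model the delay queue stays empty until commit_prepared() is
called, which mirrors the point that the delay is measured from COMMIT PREPARED.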

### For streamed transaction

Nothing special is done while a streamed transaction is being received. When it
is committed or prepared, all of its changes are read and written into a
permanent file. apply_spooled_changes() is used to read and write the changes,
which means the basic workflow is unchanged.

### Restore from files

To check the time elapsed since the commit, the commit_time of every delayed
transaction must be kept in memory. Normally it can be stored when the worker
handles the COMMIT message, but restarts need special treatment.

When an apply worker receives a COMMIT/PREPARE/COMMIT PREPARED message, it
writes the message, flushes the file, and caches the commit_time. When the
worker restarts, it opens the files, checks the final message (by seeking back
some bytes from the end of the file), and then caches the commit_time written
there.
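A minimal sketch of the restart path, in Python, might look like the following.
The fixed-size trailer layout (one byte for the message kind plus a 64-bit
commit time in microseconds), the constants, and the function names are all
assumptions for illustration. It also assumes that only files ending in COMMIT
or COMMIT PREPARED are put back into the delay cache, since a transaction that
is only prepared has not started its delay yet.

```python
import os
import struct
import tempfile

# Hypothetical final-message layout: kind (1 byte) + commit time (int64, us).
TRAILER = struct.Struct("<Bq")
KIND_COMMIT, KIND_PREPARE, KIND_COMMIT_PREPARED = 1, 2, 3


def append_final_message(path: str, kind: int, commit_time_us: int) -> None:
    # Write the final message and make it durable before caching the time.
    with open(path, "ab") as f:
        f.write(TRAILER.pack(kind, commit_time_us))
        f.flush()
        os.fsync(f.fileno())


def read_final_message(path: str):
    # Seek a fixed number of bytes back from the end of the file.
    if os.path.getsize(path) < TRAILER.size:
        return None  # file was never finished
    with open(path, "rb") as f:
        f.seek(-TRAILER.size, os.SEEK_END)
        return TRAILER.unpack(f.read(TRAILER.size))


def restore_commit_times(directory: str) -> dict:
    # Rebuild the in-memory commit_time cache after a restart.
    cache = {}
    for name in sorted(os.listdir(directory)):
        final = read_final_message(os.path.join(directory, name))
        if final is None:
            continue
        kind, commit_time_us = final
        if kind in (KIND_COMMIT, KIND_COMMIT_PREPARED):
            cache[name] = commit_time_us
    return cache
```

A fixed-size trailer makes the restart cost independent of the transaction
size, because the worker never has to scan the changes to find the commit time.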

## Restrictions

* The combination with ALTER SUBSCRIPTION .. SKIP LSN is not considered.

Thanks to Osumi-san for helping with the implementation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

