Re: logical decoding and replication of sequences - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: logical decoding and replication of sequences |
Date | |
Msg-id | 3d6df331-5532-6848-eb45-344b265e0238@enterprisedb.com Whole thread Raw |
In response to | Re: logical decoding and replication of sequences (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: logical decoding and replication of sequences
|
List | pgsql-hackers |
Hi, Here's a an updated version of this patch - rebased to current master, and fixing some of the issues raised in Peter's review. Mainly, this adds a TAP test to src/test/subscription, focusing on testing the various situations with transactional and non-transactional behavior (with subtransactions and various ROLLBACK versions). This new TAP test however uncovered an issue with wait_for_catchup(), because that uses pg_current_wal_lsn() to wait for replication of all the changes. But that does not work when the sequence gets advanced in a transaction that is then aborted, e.g. like this: BEGIN; SELECT nextval('s') FROM generate_series(1,100); ROLLBACK; The root cause is that pg_current_wal_lsn() uses the LogwrtResult.Write, which is updated by XLogFlush() - but only in RecordTransactionCommit. Which makes sense, because only the committed stuff is "visible". But the non-transactional behavior changes this, because now some of the changes from aborted transactions may need to be replicated. Which means the wait_for_catchup() ends up not waiting for the sequence change. One option would be adding XLogFlush() to RecordTransactionAbort(), but my guess is we don't do that intentionally (even though aborts should be fairly rare in most workloads). I'm not entirely sure changing this (replicating changes from aborted xacts) is a good idea. Maybe it'd be better to replicate this "lazily" - instead of replicating the advances right away, we might remember which sequences were advanced in the transaction, and then replicate current state for those sequences at commit time. The idea is that if an increment is "invisible" we probably don't need to replicate it, it's fine to replicate the next "visible" increment. So for example given BEGIN; SELECT nextval('s'); ROLLBACK; BEGIN; SELECT nextval('s'); COMMIT; we don't need to replicate the first change, but we need to replicate the second one. The trouble is we don't decode individual sequence advances, just those that update the sequence tuple (so every 32 values or so). So we'd need to remeber the first increment, in a way that is (a) persistent across restarts and (b) shared by all backends. The other challenge seems to be ordering of the changes - at the moment we have no issues with this, because increments on the same sequence are replicated immediately, in the WAL order. But postponing them to commit time would affect this order. I've also briefly looked at the code duplication - there's a couple functions in the patch that I shamelessly copy-pasted and tweaked to handle sequences instead of tables: publicationcmds.c ----------------- 1) OpenTableList/CloseTableList - > OpenSequenceList/CloseSequenceList Trivial differences, trivial to get rid of - the only difference is pretty much just table_open vs. relation open. 2) AlterPublicationTables -> AlterPublicationSequences This is a bit more complicated, because the tables also handle inheritance (which I think does not apply to sequences). Other than that, it's calling the functions from (1). subscriptioncmds.c ------------------ 1) fetch_table_list, fetch_sequence_list Minimal differences, essentially just the catalog name. 2) AlterSubscription_refresh A lot of duplication, but the code handling tables and sequences is almost exactly the same and can be reused fairly easily (moved to a separate function, called for tables and then sequences). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: