Hi,
Here's a rebased version of the sequence decoding patch.
0001 is a fix for the pre-existing issue in logicalmsg_decode,
attempting to build a snapshot before getting into a consistent state.
AFAICS this only affects assert-enabled builds and is otherwise
harmless, because we are not actually using the snapshot (apply gets a
valid snapshot from the transaction).
This is mostly the fix I shared in November, except that I kept the call
in decode.c (per comment from Andres). I haven't added any argument to
SnapBuildProcessChange because we may need to backpatch this (and it
didn't seem much simpler, IMHO).
0002 is a rebased version of the original approach, committed as
0da92dc530 (and then reverted in 2c7ea57e56). This includes the same fix
as 0001 (for the sequence messages), the primary reason for the revert.
The rebase was not quite straightforward, due to extensive changes in
how publications deal with tables/schemas, and so on. So this adopts
them, but other than that it behaves just like the original patch.
So this abandons the approach with COMMIT-time logging for sequences
accessed/modified by the transaction, proposed in response to the
revert. It seemed like a good (and simpler) alternative, but there were
far too many issues - higher overhead, ordering of records for
concurrent transactions, making it reliable, etc.
I think the main remaining question is what's the goal of this patch, or
rather what "guarantees" we expect from it - what we expect to see on
the replica after incrementing a sequence on the primary.
Robert described [1] a model and argued the standby should not "invent"
new states. It's a long / detailed explanation, I'm not going to try to
shorten in here because that'd inevitably omit various details. So
better read it whole ...
Anyway, I don't think this approach (essentially treating most sequence
increments as non-transactional) breaks any consistency guarantees or
introduces any "new" states that would not be observable on the primary.
In a way, this treats non-transactional sequence increments as separate
transactions, and applies them directly. If you read the sequence in
between two commits, you might see any "intermediate" state of the
sequence - that's the nature of non-transactional changes.
We could "postpone" applying the decoded changes until the first next
commit, which might improve performance if a transaction is long enough
to cover many sequence increments. But that's more a performance
optimization than a matter of correctness, IMHO.
One caveat is that because of how WAL works for sequences, we're
actually decoding changes "ahead" so if you read the sequence on the
subscriber it'll actually seem to be slightly ahead (up to ~32 values).
This could be eliminated by setting SEQ_LOG_VALS to 0, which however
increases the sequence costs, of course.
This however brings me to the original question what's the purpose of
this patch - and that's essentially keeping sequences up to date to make
them usable after a failover. We can't generate values from the sequence
on the subscriber, because it'd just get overwritten. And from this
point of view, it's also fine that the sequence is slightly ahead,
because that's what happens after crash recovery anyway. And we're not
guaranteeing the sequences to be gap-less.
regards
[1]
https://www.postgresql.org/message-id/CA%2BTgmoaYG7672OgdwpGm5cOwy8_ftbs%3D3u-YMvR9fiJwQUzgrQ%40mail.gmail.com
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company