Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: logical decoding and replication of sequences, take 2 |
Date | |
Msg-id | 8bf1c518-b886-fe1b-5c42-09f9c663146d@enterprisedb.com Whole thread Raw |
In response to | Re: logical decoding and replication of sequences, take 2 (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: logical decoding and replication of sequences, take 2
Re: logical decoding and replication of sequences, take 2 |
List | pgsql-hackers |
Hi, I noticed on cfbot the patch no longer applies, so here's a rebased version. Most of the breakage was due to the column filtering reworks, grammar changes etc. A lot of bitrot, but mostly mechanical stuff. I haven't looked into the optimizations / improvements I discussed in my previous post (logging only LSN of the last WAL-logged increment), because while fixing "make check-world" I ran into a more serious issue that I think needs to be discussed first. And I suspect it might also affect the feasibility of the LSN optimization. So, what's the issue - the current solution is based on WAL-logging state of all sequences incremented by the transaction at COMMIT. To do that, we read the state from disk, and write that into WAL. However, these WAL messages are not necessarily correlated to COMMIT records, so stuff like this might happen: 1. transaction T1 increments sequence S 2. transaction T2 increments sequence S 3. both T1 and T2 start to COMMIT 4. T1 reads state of S from disk, writes it into WAL 5. transaction T3 increments sequence S 6. T2 reads state of S from disk, writes it into WAL 7. T2 write COMMIT into WAL 8. T1 write COMMIT into WAL Because the apply order is determined by ordering of COMMIT records, this means we'd apply the increments logged by T2, and then by T1. But that undoes the increment by T3, and the sequence would go backwards. The previous patch version addressed that by acquiring lock on the sequence, holding it until transaction end. This effectively ensures the order of sequence messages and COMMIT matches. But that's problematic for a number of reasons: 1) throughput reduction, because the COMMIT records need to serialize 2) deadlock risk, if we happen to lock sequences in different order (in different transactions) 3) problem for prepared transactions - the sequences are locked and logged in PrepareTransaction, because we may not have seqhashtab beyond that point. This is a much worse variant of (1). Note: I also wonder what happens if someone does DISCARD SEQUENCES. I guess we'll forget the sequences, which is bad - so we'd have to invent a separate cache that does not have this issue. I realized (3) because one of the test_decoding TAP tests got stuck exactly because of a sequence locked by a prepared transaction. This patch simply releases the lock after writing the WAL message, but that just makes it vulnerable to the reordering. And this would have been true even with the LSN optimization. However, I was thinking that maybe we could use the LSN of the WAL message (XLOG_LOGICAL_SEQUENCE) to deal with the ordering issue, because *this* is the sensible sequence increment ordering. In the example above, we'd first apply the WAL message from T2 (because that commits first). And then we'd get to apply T1, but the WAL message has an older LSN, so we'd skip it. But this requires us remembering LSN of the already applied WAL sequence messages, which could be tricky - we'd need to persist it in some way because of restarts, etc. We can't do this while decoding but on the apply side, I think, because of streaming, aborts. The other option might be to make these messages non-transactional, in which case we'd separate the ordering from COMMIT ordering, evading the reordering problem. That'd mean we'd ignore rollbacks (which seems fine), we could probably optimize this by checking if the state actually changed, etc. But we'd also need to deal with transactions created in the (still uncommitted) transaction. But I'm also worried it might lead to the same issue with non-transactional behaviors that forced revert in v15. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: