Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: logical decoding and replication of sequences, take 2 |
Date | |
Msg-id | 418ff5ac-1664-9038-8ac6-0dd308b5df07@enterprisedb.com Whole thread Raw |
In response to | Re: logical decoding and replication of sequences, take 2 (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: logical decoding and replication of sequences, take 2
|
List | pgsql-hackers |
On 11/29/23 14:42, Amit Kapila wrote: > On Wed, Nov 29, 2023 at 2:59 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> I have been hacking on improving the improvements outlined in my >> preceding e-mail, but I have some bad news - I ran into an issue that I >> don't know how to solve :-( >> >> Consider this transaction: >> >> BEGIN; >> ALTER SEQUENCE s RESTART 1000; >> >> SAVEPOINT s1; >> ALTER SEQUENCE s RESTART 2000; >> ROLLBACK TO s1; >> >> INSERT INTO seq_test SELECT nextval('s') FROM generate_series(1,40); >> COMMIT; >> >> If you try this with the approach relying on rd_newRelfilelocatorSubid >> and rd_createSubid, it fails like this on the subscriber: >> >> ERROR: could not map filenode "base/5/16394" to relation OID >> >> This happens because ReorderBufferQueueSequence tries to do this in the >> non-transactional branch: >> >> reloid = RelidByRelfilenumber(rlocator.spcOid, rlocator.relNumber); >> >> and the relfilenode is the one created by the first ALTER. But this is >> obviously wrong - the changes should have been treated as transactional, >> because they are tied to the first ALTER. So how did we get there? >> >> Well, the whole problem is that in case of abort, AtEOSubXact_cleanup >> resets the two fields to InvalidSubTransactionId. Which means the >> rollback in the above transaction also forgets about the first ALTER. >> Now that I look at the RelationData comments, it actually describes >> exactly this situation: >> >> * >> * rd_newRelfilelocatorSubid is the ID of the highest subtransaction >> * the most-recent relfilenumber change has survived into or zero if >> * not changed in the current transaction (or we have forgotten >> * changing it). This field is accurate when non-zero, but it can be >> * zero when a relation has multiple new relfilenumbers within a >> * single transaction, with one of them occurring in a subsequently >> * aborted subtransaction, e.g. >> * BEGIN; >> * TRUNCATE t; >> * SAVEPOINT save; >> * TRUNCATE t; >> * ROLLBACK TO save; >> * -- rd_newRelfilelocatorSubid is now forgotten >> * >> >> The root of this problem is that we'd need some sort of "history" for >> the field, so that when a subxact aborts, we can restore the previous >> value. But we obviously don't have that, and I doubt we want to add that >> to relcache - for example, it'd either need to impose some limit on the >> history (and thus a failure when we reach the limit), or it'd need to >> handle histories of arbitrary length. >> > > Yeah, I think that would be really tricky and we may not want to go there. > >> At this point I don't see a solution for this, which means the best way >> forward with the sequence decoding patch seems to be the original >> approach, on the decoding side. >> > > One thing that worries me about that approach is that it can suck with > the workload that has a lot of DDLs that create XLOG_SMGR_CREATE > records. We have previously fixed some such workloads in logical > decoding where decoding a transaction containing truncation of a table > with a lot of partitions (1000 or more) used to take a very long time. > Don't we face performance issues in such scenarios? > I don't think we do, really. We will have to decode the SMGR records and add the relfilenodes to the hash table(s), but I think that affects the lookup performance too much. What I think might be a problem is if we have many top-level transactions, especially if those transactions do something that creates a relfilenode. Because then we'll have to do a hash_search for each of them, and that might be measurable even if each lookup is O(1). And we do the lookup for every sequence change ... > How do we see this work w.r.t to some sort of global sequences? There > is some recent discussion where I have raised a similar point [1]. > > [1] - https://www.postgresql.org/message-id/CAA4eK1JF%3D4_Eoq7FFjHSe98-_ooJ5QWd0s2_pj8gR%2B_dvwKxvA%40mail.gmail.com > I think those are very different things, even though called "sequences". AFAIK solutions like snowflakeID or UUIDs don't require replication of any shared state (that's kinda the whole point), so I don't see why would it need some special support in logical decoding. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: