Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: logical decoding and replication of sequences, take 2
Date
Msg-id 418ff5ac-1664-9038-8ac6-0dd308b5df07@enterprisedb.com
Whole thread Raw
In response to Re: logical decoding and replication of sequences, take 2  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: logical decoding and replication of sequences, take 2
List pgsql-hackers
On 11/29/23 14:42, Amit Kapila wrote:
> On Wed, Nov 29, 2023 at 2:59 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> I have been hacking on improving the improvements outlined in my
>> preceding e-mail, but I have some bad news - I ran into an issue that I
>> don't know how to solve :-(
>>
>> Consider this transaction:
>>
>>   BEGIN;
>>   ALTER SEQUENCE s RESTART 1000;
>>
>>   SAVEPOINT s1;
>>   ALTER SEQUENCE s RESTART 2000;
>>   ROLLBACK TO s1;
>>
>>   INSERT INTO seq_test SELECT nextval('s') FROM generate_series(1,40);
>>   COMMIT;
>>
>> If you try this with the approach relying on rd_newRelfilelocatorSubid
>> and rd_createSubid, it fails like this on the subscriber:
>>
>>   ERROR:  could not map filenode "base/5/16394" to relation OID
>>
>> This happens because ReorderBufferQueueSequence tries to do this in the
>> non-transactional branch:
>>
>>   reloid = RelidByRelfilenumber(rlocator.spcOid, rlocator.relNumber);
>>
>> and the relfilenode is the one created by the first ALTER. But this is
>> obviously wrong - the changes should have been treated as transactional,
>> because they are tied to the first ALTER. So how did we get there?
>>
>> Well, the whole problem is that in case of abort, AtEOSubXact_cleanup
>> resets the two fields to InvalidSubTransactionId. Which means the
>> rollback in the above transaction also forgets about the first ALTER.
>> Now that I look at the RelationData comments, it actually describes
>> exactly this situation:
>>
>>   *
>>   * rd_newRelfilelocatorSubid is the ID of the highest subtransaction
>>   * the most-recent relfilenumber change has survived into or zero if
>>   * not changed in the current transaction (or we have forgotten
>>   * changing it).  This field is accurate when non-zero, but it can be
>>   * zero when a relation has multiple new relfilenumbers within a
>>   * single transaction, with one of them occurring in a subsequently
>>   * aborted subtransaction, e.g.
>>   *    BEGIN;
>>   *    TRUNCATE t;
>>   *    SAVEPOINT save;
>>   *    TRUNCATE t;
>>   *    ROLLBACK TO save;
>>   *    -- rd_newRelfilelocatorSubid is now forgotten
>>   *
>>
>> The root of this problem is that we'd need some sort of "history" for
>> the field, so that when a subxact aborts, we can restore the previous
>> value. But we obviously don't have that, and I doubt we want to add that
>> to relcache - for example, it'd either need to impose some limit on the
>> history (and thus a failure when we reach the limit), or it'd need to
>> handle histories of arbitrary length.
>>
> 
> Yeah, I think that would be really tricky and we may not want to go there.
> 
>> At this point I don't see a solution for this, which means the best way
>> forward with the sequence decoding patch seems to be the original
>> approach, on the decoding side.
>>
> 
> One thing that worries me about that approach is that it can suck with
> the workload that has a lot of DDLs that create XLOG_SMGR_CREATE
> records. We have previously fixed some such workloads in logical
> decoding where decoding a transaction containing truncation of a table
> with a lot of partitions (1000 or more) used to take a very long time.
> Don't we face performance issues in such scenarios?
> 

I don't think we do, really. We will have to decode the SMGR records and
add the relfilenodes to the hash table(s), but I think that affects the
lookup performance too much. What I think might be a problem is if we
have many top-level transactions, especially if those transactions do
something that creates a relfilenode. Because then we'll have to do a
hash_search for each of them, and that might be measurable even if each
lookup is O(1). And we do the lookup for every sequence change ...

> How do we see this work w.r.t to some sort of global sequences? There
> is some recent discussion where I have raised a similar point [1].
> 
> [1] - https://www.postgresql.org/message-id/CAA4eK1JF%3D4_Eoq7FFjHSe98-_ooJ5QWd0s2_pj8gR%2B_dvwKxvA%40mail.gmail.com
> 

I think those are very different things, even though called "sequences".
AFAIK solutions like snowflakeID or UUIDs don't require replication of
any shared state (that's kinda the whole point), so I don't see why
would it need some special support in logical decoding.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: Table AM Interface Enhancements
Next
From: Matthias van de Meent
Date:
Subject: Re: Parallel CREATE INDEX for BRIN indexes