Re: [PATCH] Fix stale relation close in sequence synchronization - Mailing list pgsql-hackers

From vignesh C
Subject Re: [PATCH] Fix stale relation close in sequence synchronization
Date
Msg-id CALDaNm2VSvHVJYTji65qGZCsTHKcDZKNaHcNhaBZywC4yAA=3g@mail.gmail.com
Whole thread
In response to Re: [PATCH] Fix stale relation close in sequence synchronization  (Ayush Tiwari <ayushtiwari.slg01@gmail.com>)
Responses Re: [PATCH] Fix stale relation close in sequence synchronization
List pgsql-hackers
On Tue, 28 Apr 2026 at 18:04, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
>
> Hi,
>
>
> On Tue, 28 Apr 2026 at 17:44, Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>>
>> Dear Ayush,
>>
>> > I found a crash in the logical replication sequence synchronization worker
>> > when the publisher returns NULL sequence data for a sequence after at least
>> > one sequence in the same sync batch has already been processed.
>>
>> Good catch. I confirmed the HEAD can crash with your added test.
>
>
> Thanks for checking and confirming the crash.
>
>>
>>
>> > The attached patch clears the output Relation pointer at the start of
>> > get_and_validate_seq_info() and clears the local pointer in copy_sequences()
>> > after closing it. That keeps early returns from reusing a relation from a
>> > previous row.
>>
>> To confirm; can't we declare the sequence_rel in the inner-loop? My first
>> impression was the bug caused by the wrong lifetime. Are there any other
>> thoughts around here?
>
>
> I agree that declaring sequence_rel in the tuple-processing loop is
> cleaner. The relation belongs only to the current publisher result row,
> so limiting the variable's lifetime makes the intended ownership clearer
> and prevents any value from carrying over to the next row.
>
> I have kept the initialization of the output argument in
> get_and_validate_seq_info(), so every return path leaves it in a defined
> state. In v2, the caller-side pointer is declared inside the inner loop,
> and the explicit reset after table_close() is no longer needed.
>
> Attached is v2 with that change.

Thanks for finding and reporting the issue. I was able to reproduce
the issue with the steps you provided.
Few comments:
1) Here "SELECT nextval('regress_no_select');" and "REVOKE ALL ON
SEQUENCE regress_no_select FROM PUBLIC;" is not required for this test
case:
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ CREATE ROLE regress_seq_repl LOGIN REPLICATION;
+ CREATE SEQUENCE regress_no_select;
+ SELECT nextval('regress_no_select');
+ GRANT USAGE ON SCHEMA public TO regress_seq_repl;
+ GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO regress_seq_repl;
+ REVOKE ALL ON SEQUENCE regress_no_select FROM PUBLIC;
+ REVOKE ALL ON SEQUENCE regress_no_select FROM regress_seq_repl;
+));

2) Since the comment about “dropped concurrently” has been removed,
could you merge that context into the new wording:
  /*
- * last_value can be NULL if the sequence was dropped concurrently (see
- * pg_get_sequence_data()).
+ * The sequence data can be NULL if it is not accessible on the publisher
+ * (see pg_get_sequence_data()).
  */

Something like:
/*
* The sequence data can be NULL due to insufficient privileges or if
* the sequence was dropped concurrently (see pg_get_sequence_data()).
*/

3) Can we change this:
##########
# A NULL sequence data row from the publisher must not make the subscriber
# close the previously synchronized sequence relation again.
##########

To something like:
##########
# Ensure that insufficient privileges on the publisher for a sequence
# do not disrupt the subscriber. The subscriber should log a warning
# and continue retrying.
##########

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Ayush Tiwari
Date:
Subject: Re: [PATCH] Fix stale relation close in sequence synchronization
Next
From: Tom Lane
Date:
Subject: Re: [BUG?] macOS (Intel) build warnings: "ranlib: file … has no symbols" for aarch64 objects