On Tue, Jun 11, 2024 at 7:12 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Sorry for not responding to this thread earlier (two conferences in two
> weeks), but isn't the pushed fix addressing a symptom instead of the
> actual root cause?
>
> Why should it be OK for the subscriber to confirm a flush LSN and then
> later take that back and report a lower LSN? Seems somewhat against my
> understanding of what "flush LSN" means.
>
The reason is that the subscriber doesn't persistently store/advance
the LSN for which it doesn't have to do anything like DDLs. But still,
the subscriber has to acknowledge such LSNs for synchronous
replication. We have comments/code at various places to deal with this
[1][2]. Now, after the restart, the subscriber won't know of such LSNs
so it will use its origin LSN which is the LSN of the last applied
transaction (and it can be before the LSN that last time the
subscriber had acknowledged). I had once thought to persist such LSNs
on subscriber by advancing the origin but that could be overhead in
certain workloads where logical decoding doesn't yield anything
meaningful for subscribers. So it needs more thought.
> The commit message explains this happens when the subscriber does not
> need to do anything for - but then why shouldn't it just report the
> prior LSN, in such cases?
>
It is required for synchronous replication.
> I haven't looked into the details, but my concern is this removes an
> useful assert, protecting us against certain type of bugs. And now we'll
> just happily ignore them. Is that a good idea?
>
The assert was added in this release. I was also having the same
understanding as yours which is why I added it. However, the case
presented by Vignesh has revealed that I was wrong.
--
With Regards,
Amit Kapila.