Thread: Taking into account syncrep position in flush_lsn reported by apply worker
Hey. Currently synchronous_commit is by default disabled for logical apply worker on the ground that reported flush_lsn includes only locally flushed data so slot (publisher) preserves everything higher than this, and so in case of subscriber restart no data is lost. However, imagine that subscriber is made highly available by standby to which synchronous replication is enabled. Then reported flush_lsn is ignorant of this synchronous replication progress, and in case of failover data loss may occur if subscriber managed to ack flush_lsn ahead of syncrep. Moreover, it is almost silent due to this else if (start_lsn < slot->data.confirmed_flush) { /* * It might seem like we should error out in this case, but it's * pretty common for a client to acknowledge a LSN it doesn't have to * do anything for, and thus didn't store persistently, because the * xlog records didn't result in anything relevant for logical * decoding. Clients have to be able to do that to support synchronous * replication. * * Starting at a different LSN than requested might not catch certain * kinds of client errors; so the client may wish to check that * confirmed_flush_lsn matches its expectations. */ elog(LOG, "%X/%X has been already streamed, forwarding to %X/%X", LSN_FORMAT_ARGS(start_lsn), LSN_FORMAT_ARGS(slot->data.confirmed_flush)); start_lsn = slot->data.confirmed_flush; } in logical.c Attached draft patch fixes this by taking into account syncrep progress in worker reporting. -- cheers, arseny
Attachment
Re: Taking into account syncrep position in flush_lsn reported by apply worker
From
Arseny Sher
Date:
Sorry for the poor formatting of the message above, this should be better: Hey. Currently synchronous_commit is disabled for logical apply worker on the ground that reported flush_lsn includes only locally flushed data so slot (publisher) preserves everything higher than this, and so in case of subscriber restart no data is lost. However, imagine that subscriber is made highly available by standby to which synchronous replication is enabled. Then reported flush_lsn is ignorant of this synchronous replication progress, and in case of failover data loss may occur if subscriber managed to ack flush_lsn ahead of syncrep. Moreover, it is silent due to this else if (start_lsn < slot->data.confirmed_flush) { /* * It might seem like we should error out in this case, but it's * pretty common for a client to acknowledge a LSN it doesn't have to * do anything for, and thus didn't store persistently, because the * xlog records didn't result in anything relevant for logical * decoding. Clients have to be able to do that to support synchronous * replication. * * Starting at a different LSN than requested might not catch certain * kinds of client errors; so the client may wish to check that * confirmed_flush_lsn matches its expectations. */ elog(LOG, "%X/%X has been already streamed, forwarding to %X/%X", LSN_FORMAT_ARGS(start_lsn), LSN_FORMAT_ARGS(slot->data.confirmed_flush)); start_lsn = slot->data.confirmed_flush; } in logical.c Attached draft patch fixes this by taking into account syncrep progress in worker reporting.
Attachment
Re: Taking into account syncrep position in flush_lsn reported by apply worker
From
Amit Kapila
Date:
On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher <ars@neon.tech> wrote: > > Sorry for the poor formatting of the message above, this should be better: > > Hey. Currently synchronous_commit is disabled for logical apply worker > on the ground that reported flush_lsn includes only locally flushed data > so slot (publisher) preserves everything higher than this, and so in > case of subscriber restart no data is lost. However, imagine that > subscriber is made highly available by standby to which synchronous > replication is enabled. Then reported flush_lsn is ignorant of this > synchronous replication progress, and in case of failover data loss may > occur if subscriber managed to ack flush_lsn ahead of syncrep. > Won't the same can be achieved by enabling the synchronous_commit parameter for a subscription? -- With Regards, Amit Kapila.