On Fri, Jan 7, 2022 at 12:54 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2022-01-05 at 23:59 -0800, SATYANARAYANA NARLAPURAM wrote:
> > I would like to propose a GUC send_Wal_after_quorum_committed which
> > when set to ON, walsenders corresponds to async standbys and logical
> > replication workers wait until the LSN is quorum committed on the
> > primary before sending it to the standby. This not only simplifies
> > the post failover steps but avoids unnecessary downtime for the async
> > replicas. Thoughts?
>
> Do we need a GUC? Or should we just always require that sync rep is
> satisfied before sending to async replicas?
>
> It feels like the sync quorum should always be ahead of the async
> replicas. Unless I'm missing a use case, or there is some kind of
> performance gotcha.

IMO, having a GUC is a reasonable choice because some users might be
okay with their async replicas being ahead of the sync ones, or they
may have already dealt with this problem in their HA solutions, or
they may not want their async replicas to fall behind the primary
(most of the time).
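
To make the GUC's effect concrete, here is a rough Python sketch (not
PostgreSQL code; the function and parameter names are made up for
illustration) of how a walsender serving an async standby could cap
what it sends. wait_for_quorum stands in for the proposed GUC.

```python
def async_send_limit(primary_flush_lsn: int,
                     quorum_committed_lsn: int,
                     wait_for_quorum: bool) -> int:
    """Return the highest LSN that may be shipped to an async standby.

    Hypothetical helper: LSNs are modelled as plain integers, and
    wait_for_quorum plays the role of the proposed GUC.
    """
    if wait_for_quorum:
        # Never ship WAL that the sync quorum has not yet confirmed.
        return min(primary_flush_lsn, quorum_committed_lsn)
    # Current behaviour: ship everything flushed on the primary.
    return primary_flush_lsn
```

With the setting off, the async standby can get ahead of the sync
quorum; with it on, it can lag the primary by however far the quorum
lags.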

If there are long-running txns on the primary and the async standbys
have to wait for quorum commit from the sync standbys, won't they fall
too far behind the primary? This isn't a problem if we take the view
that async replicas are prone to falling behind the primary anyway.
But if the primary runs long-running txns continuously, the async
replicas would fall further and further behind. Is there a way to send
the WAL records to both sync and async replicas together, but have the
async replicas hold off applying them until the primary tells the
standbys that quorum commit has been obtained? If the primary never
obtains quorum commit, the async replicas can skip applying those WAL
records and discard them.

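
The idea above could look roughly like this on the standby side. This
is only a toy Python model under my own assumptions: the class, its
methods, and the notion of the primary sending a separate
quorum-commit notification are hypothetical, not an existing
PostgreSQL mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncStandby:
    """Toy async standby that buffers WAL until quorum commit is known."""
    applied_lsn: int = 0
    # Received but not yet applied (end_lsn, record) pairs, in LSN order.
    pending: list = field(default_factory=list)

    def receive(self, end_lsn: int, record: str) -> None:
        # WAL arrives at the same time as on the sync standbys,
        # but is only buffered here, not applied.
        self.pending.append((end_lsn, record))

    def on_quorum_commit(self, quorum_lsn: int) -> list:
        # The primary reports that everything up to quorum_lsn is quorum
        # committed; apply that prefix and keep the rest buffered.
        ready = [(lsn, rec) for lsn, rec in self.pending if lsn <= quorum_lsn]
        self.pending = [(lsn, rec) for lsn, rec in self.pending if lsn > quorum_lsn]
        if ready:
            self.applied_lsn = ready[-1][0]
        return [rec for _, rec in ready]

    def discard_unconfirmed(self) -> int:
        # Quorum commit was never obtained (e.g. after a failover):
        # drop the buffered records without applying them.
        dropped = len(self.pending)
        self.pending.clear()
        return dropped
```

For example, after receiving records ending at LSNs 100, 200 and 300,
a quorum-commit notification for 200 applies the first two and leaves
the third buffered until it is either confirmed or discarded.
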
Regards,
Bharath Rupireddy.