Re: Fix a bug in DecodeAbort() and improve input data check on subscriber. - Mailing list pgsql-hackers

From vignesh C
Subject Re: Fix a bug in DecodeAbort() and improve input data check on subscriber.
Date
Msg-id CALDaNm1wX4JmbyonLkL0KEE=ixvA-zyB90f-KtY=Xm4PAXC10A@mail.gmail.com
Whole thread Raw
In response to Fix a bug in DecodeAbort() and improve input data check on subscriber.  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Tue, Dec 7, 2021 at 6:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi all,
>
> While updating the patch I recently posted[1] to make pg_waldump
> report replication origin ID, LSN, and timestamp, I found a bug that
> replication origin timestamp is not set in ROLLBACK PREPARED case.
> Commit 8bdb1332eb5 (CC'ed Amit) added an argument to
> ReorderBufferFinishPrepared() but in ROLLBACK PREPARED case, the
> caller specified it at the wrong position:
>
> @@ -730,6 +730,7 @@ DecodeCommit(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf,
>     if (two_phase)
>     {
>         ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr,
> buf->endptr,
> +
> SnapBuildInitialConsistentPoint(ctx->snapshot_builder),
>                                     commit_time, origin_id, origin_lsn,
>                                     parsed->twophase_gid, true);
>     }
> @@ -868,6 +869,7 @@ DecodeAbort(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf,
>     {
>         ReorderBufferFinishPrepared(ctx->reorder, xid, buf->origptr,
> buf->endptr,
>                                     abort_time, origin_id, origin_lsn,
> +                                   InvalidXLogRecPtr,
>                                     parsed->twophase_gid, false);
>     }
>
> This affects the value of rollback_data.rollback_time on the
> subscriber, resulting in setting the wrong timestamp to both the
> replication origin timestamp and the error callback argument on the
> subscriber. I've attached the patch to fix it.
>
> Besides, I think we can improve checking input data on subscribers.
> This bug was not detected by compilers but it could have been detected
> if we checked the input data. Looking at logicalrep_read_xxx functions
> in proto.c, there are some inconsistencies; we check the value of
> prepare_data->xid in logicalrep_read_prepare_common() but we don't in
> both logicalrep_read_commit_prepared() and
> logicalrep_read_rollback_prepared(), and we don't check anything in
> stream_start/stream_stop. Also, IIUC that since timestamps are always
> set in prepare/commit prepared/rollback prepared cases we can check
> them too. I've attached a PoC patch that introduces macros for
> checking input data and adds some new checks. Since it could be
> frequently called, I used unlikely() but probably we can also consider
> replacing elog(ERROR) with assertions.

The first patch is required as suggested and fixes the problem.
Few comments on the second patch:
1) Should we validate prepare_time and xid in
logicalrep_read_begin_prepare. Is this not checked intentionally?
2) Similarly should we validate committime in logicalrep_read_stream_commit?

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: "kuroda.hayato@fujitsu.com"
Date:
Subject: RE: Allow escape in application_name
Next
From: Ronan Dunklau
Date:
Subject: Re: Non-superuser subscription owners