RE: Conflict detection and logging in logical replication - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Conflict detection and logging in logical replication
Date
Msg-id OS0PR01MB57166AF7C2D7B06E3CF23AD694B72@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Conflict detection and logging in logical replication  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses RE: Conflict detection and logging in logical replication
List pgsql-hackers
On Monday, July 29, 2024 6:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> 
> On Mon, Jul 29, 2024 at 11:44 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> 
> I was going through v7-0001, and I have some initial comments.

Thanks for the comments !

> 
> @@ -536,11 +542,9 @@ ExecCheckIndexConstraints(ResultRelInfo
> *resultRelInfo, TupleTableSlot *slot,
>   ExprContext *econtext;
>   Datum values[INDEX_MAX_KEYS];
>   bool isnull[INDEX_MAX_KEYS];
> - ItemPointerData invalidItemPtr;
>   bool checkedIndex = false;
> 
>   ItemPointerSetInvalid(conflictTid);
> - ItemPointerSetInvalid(&invalidItemPtr);
> 
>   /*
>   * Get information from the result relation info structure.
> @@ -629,7 +633,7 @@ ExecCheckIndexConstraints(ResultRelInfo
> *resultRelInfo, TupleTableSlot *slot,
> 
>   satisfiesConstraint =
>   check_exclusion_or_unique_constraint(heapRelation, indexRelation,
> - indexInfo, &invalidItemPtr,
> + indexInfo, &slot->tts_tid,
>   values, isnull, estate, false,
>   CEOUC_WAIT, true,
>   conflictTid);
> 
> What is the purpose of this change?  I mean
> 'check_exclusion_or_unique_constraint' says that 'tupleid'
> should be invalidItemPtr if the tuple is not yet inserted and
> ExecCheckIndexConstraints is called by ExecInsert before inserting the tuple.
> So what is this change?

Because the function ExecCheckIndexConstraints() is now invoked after inserting
a tuple (in the patch). So, we need to ignore the newly inserted tuple when
checking conflict in check_exclusion_or_unique_constraint().

> Would this change ExecInsert's behavior as well?

Thanks for pointing it out, I will check and reply.

> 
> ----
> ----
> 
> +ReCheckConflictIndexes(ResultRelInfo *resultRelInfo, EState *estate,
> +    ConflictType type, List *recheckIndexes,
> +    TupleTableSlot *slot)
> +{
> + /* Re-check all the unique indexes for potential conflicts */
> +foreach_oid(uniqueidx, resultRelInfo->ri_onConflictArbiterIndexes)
> + {
> + TupleTableSlot *conflictslot;
> +
> + if (list_member_oid(recheckIndexes, uniqueidx) &&
> + FindConflictTuple(resultRelInfo, estate, uniqueidx, slot,
> + &conflictslot)) { RepOriginId origin; TimestampTz committs;
> + TransactionId xmin;
> +
> + GetTupleCommitTs(conflictslot, &xmin, &origin, &committs);
> +ReportApplyConflict(ERROR, type, resultRelInfo->ri_RelationDesc,
> +uniqueidx,  xmin, origin, committs, conflictslot);  }  } }
>  and
> 
> + conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
> +
>   if (resultRelInfo->ri_NumIndices > 0)
>   recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
> -    slot, estate, false, false,
> -    NULL, NIL, false);
> +    slot, estate, false,
> +    conflictindexes ? true : false,
> +    &conflict,
> +    conflictindexes, false);
> +
> + /*
> + * Rechecks the conflict indexes to fetch the conflicting local tuple
> + * and reports the conflict. We perform this check here, instead of
> + * perform an additional index scan before the actual insertion and
> + * reporting the conflict if any conflicting tuples are found. This is
> + * to avoid the overhead of executing the extra scan for each INSERT
> + * operation, even when no conflict arises, which could introduce
> + * significant overhead to replication, particularly in cases where
> + * conflicts are rare.
> + */
> + if (conflict)
> + ReCheckConflictIndexes(resultRelInfo, estate, CT_INSERT_EXISTS,
> +    recheckIndexes, slot);
> 
> 
>  This logic is confusing, first, you are calling
> ExecInsertIndexTuples() with no duplicate error for the indexes
> present in 'ri_onConflictArbiterIndexes' which means
>  the indexes returned by the function must be a subset of
> 'ri_onConflictArbiterIndexes' and later in ReCheckConflictIndexes()
> you are again processing all the
>  indexes of 'ri_onConflictArbiterIndexes' and checking if any of these
> is a subset of the indexes that is returned by
> ExecInsertIndexTuples().

I think that's not always true. The indexes returned by the function *may not*
be a subset of 'ri_onConflictArbiterIndexes'. Based on the comments atop of the
ExecInsertIndexTuples, it returns a list of index OIDs for any unique or
exclusion constraints that are deferred, and in addition to that, it will
include the indexes in 'arbiterIndexes' if noDupErr == true.

> 
>  Why are we doing that, I think we can directly use the
> 'recheckIndexes' which is returned by ExecInsertIndexTuples(), and
> those indexes are guaranteed to be a subset of
>  ri_onConflictArbiterIndexes. No?

Based on above, we need to filter the deferred indexes or exclusion constraints
in the 'ri_onConflictArbiterIndexes'.

Best Regards,
Hou zj


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER
Next
From: Tomas Vondra
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations