Re: Conflict Detection and Resolution - Mailing list pgsql-hackers

From Nisha Moond
Subject Re: Conflict Detection and Resolution
Date
Msg-id CABdArM42yLTQpMDuVFXFFsy8G9b=YouJnte5eZ8MMtq4YbwZGQ@mail.gmail.com
In response to Re: Conflict Detection and Resolution  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on the patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION <subname>
> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER
>     (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> 1b)
> ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
>     (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3,...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription-based resolvers, do we need a RESET command to reset
> resolvers to their defaults? Either of the below, or both?
>
> 2a) reset all at once:
>  ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS;
>
> 2b) reset one at a time:
>  ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER FOR 'conflict_type';
>
> The issue I see here is that, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>
> ~~
>
> 3)  Regarding update_exists:
>
> 3a)
> Currently the update_exists resolver patch is kept separate, because
> its resolution may require deleting multiple rows. It would be good to
> discuss whether we want to target this in the first draft. Please see
> the example:
>
> create table tab (a int primary key, b int unique, c int unique);
>
> Pub: insert into tab values (1,1,1);
> Sub:
> insert into tab values (2,20,30);
> insert into tab values (3,40,50);
> insert into tab values (4,60,70);
>
> Pub: update tab set a=2,b=40,c=70 where a=1;
>
> The above 'update' on pub will result in 'update_exists' on sub. If
> the resolution is in favour of 'apply', the update will conflict with
> all three local rows of the subscriber, because each of the three
> columns has a unique constraint. Thus, in order to resolve the
> conflict, it will have to delete these 3 rows on sub:
>
> 2,20,30
> 3,40,50
> 4,60,70
> and then update 1,1,1 to 2,40,70.
>
> We just need opinions on whether to target this in the initial draft.
>
> 3b)
> If we plan to implement this, we need to work on an optimal design
> where we can find all the conflicting rows at once and delete them.
> Currently the implementation uses recursion: find one conflicting
> row, delete it, then find the next, and so on, i.e. we call
> apply_handle_update_internal() recursively. On initial code review, I
> feel it is doable to scan all indexes at once, get the
> conflicting-tuple-ids in one go, and get rid of the recursion. It can
> be attempted once we decide on 3a.
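
To make the "one go" idea in 3b concrete, here is a toy Python model of
the 3a example. It is only a sketch under stated assumptions, not the
apply-worker code: rows are plain tuples, each unique index is a
dictionary, and the helper names (build_unique_indexes,
find_conflicting_rows, apply_update_resolving_conflicts) are
hypothetical. It probes every unique index once, collects all
conflicting rows, deletes them together, and then applies the update.

```python
# Toy model only: rows are tuples (a, b, c) and every column is unique,
# as in "create table tab (a int primary key, b int unique, c int unique)".
UNIQUE_COLUMNS = (0, 1, 2)  # tuple positions of a, b, c


def build_unique_indexes(rows):
    """Map each unique column position to a {value: row} lookup."""
    return {col: {row[col]: row for row in rows} for col in UNIQUE_COLUMNS}


def find_conflicting_rows(rows, old_row, new_row):
    """Probe each unique index once; return every row that clashes with new_row."""
    indexes = build_unique_indexes(rows)
    conflicts = []
    for col in UNIQUE_COLUMNS:
        hit = indexes[col].get(new_row[col])
        # The row being updated cannot conflict with itself.
        if hit is not None and hit != old_row and hit not in conflicts:
            conflicts.append(hit)
    return conflicts


def apply_update_resolving_conflicts(rows, old_row, new_row):
    """Delete all conflicting rows in one step, then replace old_row with new_row."""
    conflicts = find_conflicting_rows(rows, old_row, new_row)
    kept = [r for r in rows if r != old_row and r not in conflicts]
    return kept + [new_row], conflicts


# Subscriber state from the example, after (1,1,1) was replicated.
subscriber_rows = [(1, 1, 1), (2, 20, 30), (3, 40, 50), (4, 60, 70)]
result, deleted = apply_update_resolving_conflicts(
    subscriber_rows, old_row=(1, 1, 1), new_row=(2, 40, 70)
)
print("deleted:", deleted)
print("remaining:", result)
```

In this model the three local rows are found without any recursion,
which is the property 3b is asking for. Locking, visibility, and
concurrent inserts are deliberately ignored here.
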
>
> ~~
>
> 4)
> Now for insert_exists and update_exists, we are doing a pre-scan of
> all unique indexes to find conflicts. There is also a post-scan to
> check whether a conflicting row was inserted in the meantime. This
> needs to be reviewed for optimization. We need to avoid the pre-scan
> wherever possible. I think the only case for which it can be avoided
> is 'ERROR'. For the cases where the resolver is in favor of
> remote-apply, we need to check for the conflict beforehand to avoid
> rollback of already inserted data. And where the resolver is in favor
> of skipping the change, we should also know about the conflict
> beforehand to avoid heap-insertion and rollback. Thoughts?
>
+1 to the idea of optimization, but it seems that when the resolver is
set to ERROR, skipping the pre-scan only helps the case where no
conflict exists.

If a conflict is found, the apply worker already errors out during the
pre-scan and no post-scan occurs, so there is nothing left to optimize.

If no conflict is present, however, we currently do both the pre-scan
and the post-scan. Skipping the pre-scan in this scenario could be a
worthwhile optimization, even if it only benefits the no-conflict
case.
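
The reasoning above can be tabulated with a small Python model. This is
only an illustration: index_scan_passes is a hypothetical helper, it
counts scan passes rather than real cost, and it assumes that when the
pre-scan is skipped a conflict would instead surface at the post-scan
stage.

```python
def index_scan_passes(conflict_exists, skip_prescan):
    """Count unique-index scan passes for one change under the ERROR resolver."""
    passes = 0
    if not skip_prescan:
        passes += 1  # pre-scan of all unique indexes
        if conflict_exists:
            # The apply worker errors out here, so no post-scan happens.
            return passes
    # Post-scan; assumed to be where a conflict is reported if the
    # pre-scan was skipped.
    passes += 1
    return passes


for conflict_exists in (False, True):
    current = index_scan_passes(conflict_exists, skip_prescan=False)
    proposed = index_scan_passes(conflict_exists, skip_prescan=True)
    print(f"conflict={conflict_exists}: current={current}, proposed={proposed}")
```

Under these assumptions the conflict case stays at one pass either way,
while the no-conflict case drops from two passes to one.
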

--
Thanks,
Nisha


