Thread: Conflict Detection and Resolution

Conflict Detection and Resolution

From
shveta malik
Date:
Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
In case the node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes.  CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configurations.
CDR is not applicable to the initial table sync; if conflicting data
already exists locally in the table, the table sync worker will fail.
Please find the details on CDR in apply worker for INSERT, UPDATE and
DELETE operations:

INSERT
================
To resolve an INSERT conflict on the subscriber, it is important to find
the conflicting row (if any) before we attempt an insertion. The index
search preference will be:
First check for replica identity (RI) index.
  - if not found, check for the primary key (PK) index.
    - if not found, then check for unique indexes (individual ones or
added by unique constraints)
         - if unique index also not found, skip CDR

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason is that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.

In case of replica identity ‘nothing’ and in absence of any suitable
index (as defined above), CDR will be skipped for INSERT.
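The index search preference above can be sketched as follows. This is an
illustrative Python sketch only, not PostgreSQL code; the table
representation and the field names (ri_index, pk_index, unique_indexes)
are hypothetical stand-ins.

```python
def choose_conflict_index(table):
    """Pick the index used to look up a conflicting row, or None to skip CDR."""
    # 1. A replica identity (RI) index takes precedence.
    if table.get("ri_index"):
        return table["ri_index"]
    # 2. Fall back to the primary key (PK) index.
    if table.get("pk_index"):
        return table["pk_index"]
    # 3. Otherwise any unique index (standalone or from a unique constraint).
    if table.get("unique_indexes"):
        return table["unique_indexes"][0]
    # 4. No usable index: skip CDR for this INSERT, even under
    #    REPLICA IDENTITY FULL, since such tables may hold duplicate rows.
    return None
```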

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins:    The change with later commit timestamp wins.
b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
c) apply:   Always apply the remote change.
d) skip:    Remote change is skipped.
e) error:   Error out on conflict. Replication is stopped, manual
action is needed.

The change will be converted to an 'UPDATE' and applied if the decision
is in favor of applying the remote change.

Commit timestamp info must be available on the subscriber when
latest_timestamp_wins or earliest_timestamp_wins is chosen as the
resolution method.  Thus ‘track_commit_timestamp’ must be enabled on the
subscriber; in its absence, configuring the said timestamp-based
resolution methods will result in an error.

Note: If the user has chosen latest_timestamp_wins or
earliest_timestamp_wins, and the remote and local timestamps are the
same, then it will go by system identifier: the change from the node
with the higher system identifier will win. This ensures that the same
change is picked on all the nodes.
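The resolution methods and the system-identifier tie-break described
above can be sketched roughly as below. This is a hypothetical
illustration (commit_ts and sysid are stand-in field names), not the
actual implementation.

```python
def resolve_insert_exists(local, remote, method):
    """Return 'remote' if the remote change wins, else 'local'."""
    if method == "apply":
        return "remote"
    if method == "skip":
        return "local"
    if method == "error":
        raise RuntimeError("insert_exists conflict: manual action needed")
    # latest_timestamp_wins / earliest_timestamp_wins
    if remote["commit_ts"] != local["commit_ts"]:
        remote_is_later = remote["commit_ts"] > local["commit_ts"]
        want_later = (method == "latest_timestamp_wins")
        return "remote" if remote_is_later == want_later else "local"
    # Tie on commit timestamp: the change from the node with the higher
    # system identifier wins, so every node picks the same change.
    return "remote" if remote["sysid"] > local["sysid"] else "local"
```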

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info, which can be obtained from
the commit timestamp generated for the incoming txn at the source node,
is used to detect conflicts. To compare the remote origin with the local
one, we must have origin information for local txns as well, which can
be obtained from the commit timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is that the ‘origin’ information cannot be
obtained once the row is frozen and the commit-timestamp info is removed
by vacuum. For a frozen row, conflicts cannot be raised, and thus the
incoming changes will be applied in all cases.

Conflict Types:
----------------
a) update_differ: The origin of the incoming update's key row differs
from that of the local row, i.e., the row has already been updated
locally or by a different node.
b) update_missing: A row with the same key value as the incoming update
does not exist. The remote is trying to update a row which does not
exist locally.
c) update_deleted: A row with the same key value as the incoming update
does not exist because it has already been deleted. This conflict type
is generated only if the deleted row is still detectable, i.e., it has
not been removed by VACUUM yet. If the row has already been removed by
VACUUM, this conflict cannot be detected; it will be detected as
update_missing instead and will follow the default or configured
resolver of update_missing.
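The classification of these three conflict types might be sketched as
follows. Again a hypothetical Python illustration; the lookup helpers
(find_local_row, local_origin_of, deleted_row_detectable) are stand-ins
for the actual tuple and commit-timestamp machinery.

```python
def classify_update_conflict(key, remote_origin, local_origin_of,
                             find_local_row, deleted_row_detectable):
    """Classify an incoming UPDATE against the local table state."""
    row = find_local_row(key)
    if row is not None:
        if local_origin_of(row) != remote_origin:
            return "update_differ"
        return None  # same origin: no conflict, apply normally
    if deleted_row_detectable(key):
        # The deleted row is still visible (not yet removed by VACUUM).
        return "update_deleted"
    # Row truly absent, or its deletion has already been vacuumed away.
    return "update_missing"
```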

Conflict Resolutions:
----------------
a)   latest_timestamp_wins:    The change with later commit timestamp
wins. Can be used for ‘update_differ’.
b)   earliest_timestamp_wins:   The change with earlier commit
timestamp wins. Can be used for ‘update_differ’.
c)   apply:   The remote change is always applied.  Can be used for
‘update_differ’.
d)   apply_or_skip: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then the change is skipped. Can be used for
‘update_missing’ or ‘update_deleted’.
e)   apply_or_error: Remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then error is raised. Can be used for
‘update_missing’ or ‘update_deleted’.
f) skip: Remote change is skipped and local one is retained. Can be
used for any conflict type.
g)   error: Error out on conflict. Replication is stopped, manual
action is needed.  Can be used for any conflict type.
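The apply_or_skip and apply_or_error behaviour for update_missing or
update_deleted can be sketched as below. This is an illustrative sketch
only; build_full_row is a hypothetical helper (real code works on the
published tuple data), and the dict representation is a stand-in.

```python
def resolve_update_missing(remote_change, resolver):
    """Handle update_missing/update_deleted with the resolvers above."""
    if resolver == "skip":
        return ("SKIP", None)
    if resolver == "error":
        raise RuntimeError("conflict: manual action needed")
    # apply_or_skip / apply_or_error: try converting the UPDATE to INSERT.
    full_row = build_full_row(remote_change)
    if full_row is not None:
        return ("INSERT", full_row)
    if resolver == "apply_or_skip":
        return ("SKIP", None)  # row incomplete: skip the change
    # apply_or_error: row incomplete, so raise instead of skipping
    raise RuntimeError("cannot construct the full row: manual action needed")


def build_full_row(change):
    # Hypothetical helper: the row is complete only if the publisher
    # provided a value for every column.
    row = change.get("new_row", {})
    if row and all(v is not None for v in row.values()):
        return dict(row)
    return None
```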

To support UPDATE CDR, the presence of either a replica identity index
or a primary key is required on the target node.  UPDATE CDR will not be
supported in the absence of a replica identity index or primary key even
though REPLICA IDENTITY FULL is set.  Please refer to "UPDATE" in the
"Noteworthy Scenarios" section in [1] for further details.

DELETE
================
Conflict Type:
----------------
delete_missing: An incoming delete is trying to delete a row on a
target node which does not exist.

Conflict Resolutions:
----------------
a)   error : Error out on conflict. Replication is stopped, manual
action is needed.
b)   skip  : The remote change is skipped.

Configuring Conflict Resolution:
------------------------------------------------
There are two parts when it comes to configuring CDR:

a) Enabling/Disabling conflict detection.
b) Configuring conflict resolvers for different conflict types.

Users can sometimes create multiple subscriptions on the same node,
subscribing to different tables to improve replication performance by
starting multiple apply workers. If the tables in one subscription are
less likely to cause conflicts, the user may want conflict detection
disabled for that subscription to avoid detection latency, while
enabling it for other subscriptions.  This generates a requirement to
make ‘conflict detection’ configurable per subscription, while the
conflict resolver configuration can remain global. All subscriptions
which opt for ‘conflict detection’ will follow the global conflict
resolver configuration.

To implement the above, subscription commands will be changed to accept
one more parameter, 'conflict_resolution=on/off'; the default will be
OFF.

To configure global resolvers, a new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>

-------------------------

Apart from the above three main operations and resolver configuration,
there are more conflict types like primary-key updates, multiple
unique constraints etc and some special scenarios to be considered.
Complete design details can be found in [1].

[1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution

thanks
Shveta



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:
On 5/23/24 08:36, shveta malik wrote:
> Hello hackers,
> 
> Please find the proposal for Conflict Detection and Resolution (CDR)
> for Logical replication.
> <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> below details.>
> 
> Introduction
> ================
> In case the node is subscribed to multiple providers, or when local
> writes happen on a subscriber, conflicts can arise for the incoming
> changes.  CDR is the mechanism to automatically detect and resolve
> these conflicts depending on the application and configurations.
> CDR is not applicable for the initial table sync. If locally, there
> exists conflicting data on the table, the table sync worker will fail.
> Please find the details on CDR in apply worker for INSERT, UPDATE and
> DELETE operations:
> 

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Also, what kind of consistency do you expect from this? Because none of
these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

> INSERT
> ================
> To resolve INSERT conflict on subscriber, it is important to find out
> the conflicting row (if any) before we attempt an insertion. The
> indexes or search preference for the same will be:
> First check for replica identity (RI) index.
>   - if not found, check for the primary key (PK) index.
>     - if not found, then check for unique indexes (individual ones or
> added by unique constraints)
>          - if unique index also not found, skip CDR
> 
> Note: if no RI index, PK, or unique index is found but
> REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> The reason being that even though a row can be identified with
> REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> rows. Hence, we should not go for conflict detection in such a case.
> 

It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
allowed to have duplicate values? It just means the upstream is sending
the whole original row, there can still be a PK/UNIQUE index on both the
publisher and subscriber.

> In case of replica identity ‘nothing’ and in absence of any suitable
> index (as defined above), CDR will be skipped for INSERT.
> 
> Conflict Type:
> ----------------
> insert_exists: A conflict is detected when the table has the same
> value for a key column as the new value in the incoming row.
> 
> Conflict Resolution
> ----------------
> a) latest_timestamp_wins:    The change with later commit timestamp wins.
> b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> c) apply:   Always apply the remote change.
> d) skip:    Remote change is skipped.
> e) error:   Error out on conflict. Replication is stopped, manual
> action is needed.
> 

Why not have some support for user-defined conflict resolution
methods, allowing to do more complex stuff (e.g. merging the rows in
some way, perhaps even with datatype-specific behavior)?

> The change will be converted to 'UPDATE' and applied if the decision
> is in favor of applying remote change.
> 
> It is important to have commit timestamp info available on subscriber
> when latest_timestamp_wins or earliest_timestamp_wins method is chosen
> as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
> on subscriber, in absence of which, configuring the said
> timestamp-based resolution methods will result in error.
> 
> Note: If the user has chosen the latest or earliest_timestamp_wins,
> and the remote and local timestamps are the same, then it will go by
> system identifier. The change with a higher system identifier will
> win. This will ensure that the same change is picked on all the nodes.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

> 
>  UPDATE
> ================
> 
> Conflict Detection Method:
> --------------------------------
> Origin conflict detection: The ‘origin’ info is used to detect
> conflict which can be obtained from commit-timestamp generated for
> incoming txn at the source node. To compare remote’s origin with the
> local’s origin, we must have origin information for local txns as well
> which can be obtained from commit-timestamp after enabling
> ‘track_commit_timestamp’ locally.
> The one drawback here is the ‘origin’ information cannot be obtained
> once the row is frozen and the commit-timestamp info is removed by
> vacuum. For a frozen row, conflicts cannot be raised, and thus the
> incoming changes will be applied in all the cases.
> 
> Conflict Types:
> ----------------
> a) update_differ: The origin of an incoming update's key row differs
> from the local row i.e.; the row has already been updated locally or
> by different nodes.
> b) update_missing: The row with the same value as that incoming
> update's key does not exist. Remote is trying to update a row which
> does not exist locally.
> c) update_deleted: The row with the same value as that incoming
> update's key does not exist. The row is already deleted. This conflict
> type is generated only if the deleted row is still detectable i.e., it
> is not removed by VACUUM yet. If the row is removed by VACUUM already,
> it cannot detect this conflict. It will detect it as update_missing
> and will follow the default or configured resolver of update_missing
> itself.
> 

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also
that even if we happen to find the row the associated TOAST data may
have already been removed. So why would this matter?


>  Conflict Resolutions:
> ----------------
> a)   latest_timestamp_wins:    The change with later commit timestamp
> wins. Can be used for ‘update_differ’.
> b)   earliest_timestamp_wins:   The change with earlier commit
> timestamp wins. Can be used for ‘update_differ’.
> c)   apply:   The remote change is always applied.  Can be used for
> ‘update_differ’.
> d)   apply_or_skip: Remote change is converted to INSERT and is
> applied. If the complete row cannot be constructed from the info
> provided by the publisher, then the change is skipped. Can be used for
> ‘update_missing’ or ‘update_deleted’.
> e)   apply_or_error: Remote change is converted to INSERT and is
> applied. If the complete row cannot be constructed from the info
> provided by the publisher, then error is raised. Can be used for
> ‘update_missing’ or ‘update_deleted’.
> f) skip: Remote change is skipped and local one is retained. Can be
> used for any conflict type.
> g)   error: Error out on conflict. Replication is stopped, manual
> action is needed.  Can be used for any conflict type.
> 
>  To support UPDATE CDR, the presence of either replica identity Index
> or primary key is required on target node.  Update CDR will not be
> supported in absence of replica identity index or primary key even
> though REPLICA IDENTITY FULL is set.  Please refer to "UPDATE" in
> "Noteworthy Scenarios" section in [1] for further details.
> 
> DELETE
> ================
> Conflict Type:
> ----------------
> delete_missing: An incoming delete is trying to delete a row on a
> target node which does not exist.
> 
> Conflict Resolutions:
> ----------------
> a)   error : Error out on conflict. Replication is stopped, manual
> action is needed.
> b)   skip  : The remote change is skipped.
> 
> Configuring Conflict Resolution:
> ------------------------------------------------
> There are two parts when it comes to configuring CDR:
> 
> a) Enabling/Disabling conflict detection.
> b) Configuring conflict resolvers for different conflict types.
> 
>  Users can sometimes create multiple subscriptions on the same node,
> subscribing to different tables to improve replication performance by
> starting multiple apply workers. If the tables in one subscription are
> less likely to cause conflict, then it is possible that user may want
> conflict detection disabled for that subscription to avoid detection
> latency while enabling it for other subscriptions.  This generates a
> requirement to make ‘conflict detection’ configurable per
> subscription. While the conflict resolver configuration can remain
> global. All the subscriptions which opt for ‘conflict detection’ will
> follow global conflict resolver configuration.
> 
> To implement the above, subscription commands will be changed to have
> one more parameter 'conflict_resolution=on/off', default will be OFF.
> 
> To configure global resolvers, new DDL command will be introduced:
> 
> CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> 

I very much doubt we want a single global conflict resolver, or even one
resolver per subscription. It seems like a very table-specific thing.

Also, doesn't all this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps may
go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly it
interleaves)?


> -------------------------
> 
> Apart from the above three main operations and resolver configuration,
> there are more conflict types like primary-key updates, multiple
> unique constraints etc and some special scenarios to be considered.
> Complete design details can be found in [1].
> 
> [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution
> 

Hmmm, not sure it's good to have a "complete" design on wiki, and only
some subset posted to the mailing list. I haven't compared what the
differences are, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/23/24 08:36, shveta malik wrote:
> > Hello hackers,
> >
> > Please find the proposal for Conflict Detection and Resolution (CDR)
> > for Logical replication.
> > <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> > below details.>
> >
> > Introduction
> > ================
> > In case the node is subscribed to multiple providers, or when local
> > writes happen on a subscriber, conflicts can arise for the incoming
> > changes.  CDR is the mechanism to automatically detect and resolve
> > these conflicts depending on the application and configurations.
> > CDR is not applicable for the initial table sync. If locally, there
> > exists conflicting data on the table, the table sync worker will fail.
> > Please find the details on CDR in apply worker for INSERT, UPDATE and
> > DELETE operations:
> >
>
> Which architecture are you aiming for? Here you talk about multiple
> providers, but the wiki page mentions active-active. I'm not sure how
> much this matters, but it might.

Currently, we are working on the multiple-providers case, but ideally it
should work for active-active as well. If, during further discussion and
the implementation phase, we find cases that will not work in a
straightforward way for active-active, then our primary focus will
remain on first implementing it for the multiple-providers architecture.

>
> Also, what kind of consistency do you expect from this? Because none of
> these simple conflict resolution methods can give you the regular
> consistency models we're used to, AFAICS.

Could you please explain a bit more about this?

>
> > INSERT
> > ================
> > To resolve INSERT conflict on subscriber, it is important to find out
> > the conflicting row (if any) before we attempt an insertion. The
> > indexes or search preference for the same will be:
> > First check for replica identity (RI) index.
> >   - if not found, check for the primary key (PK) index.
> >     - if not found, then check for unique indexes (individual ones or
> > added by unique constraints)
> >          - if unique index also not found, skip CDR
> >
> > Note: if no RI index, PK, or unique index is found but
> > REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> > The reason being that even though a row can be identified with
> > REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> > rows. Hence, we should not go for conflict detection in such a case.
> >
>
> It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
> allowed to have duplicate values? It just means the upstream is sending
> the whole original row, there can still be a PK/UNIQUE index on both the
> publisher and subscriber.

Yes, right. Sorry for the confusion. I meant the same, i.e., in the
absence of an RI index, PK, or unique index, tables can have duplicates.
So even in the presence of replica identity (FULL in this case) but in
the absence of a unique/primary index, CDR will be skipped for INSERT.

>
> > In case of replica identity ‘nothing’ and in absence of any suitable
> > index (as defined above), CDR will be skipped for INSERT.
> >
> > Conflict Type:
> > ----------------
> > insert_exists: A conflict is detected when the table has the same
> > value for a key column as the new value in the incoming row.
> >
> > Conflict Resolution
> > ----------------
> > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> > c) apply:   Always apply the remote change.
> > d) skip:    Remote change is skipped.
> > e) error:   Error out on conflict. Replication is stopped, manual
> > action is needed.
> >
>
> Why not have some support for user-defined conflict resolution
> methods, allowing to do more complex stuff (e.g. merging the rows in
> some way, perhaps even with datatype-specific behavior)?

Initially, for the sake of simplicity, we are targeting support for
built-in resolvers. But we have a plan to work on user-defined
resolvers as well. We shall propose that separately.

>
> > The change will be converted to 'UPDATE' and applied if the decision
> > is in favor of applying remote change.
> >
> > It is important to have commit timestamp info available on subscriber
> > when latest_timestamp_wins or earliest_timestamp_wins method is chosen
> > as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
> > on subscriber, in absence of which, configuring the said
> > timestamp-based resolution methods will result in error.
> >
> > Note: If the user has chosen the latest or earliest_timestamp_wins,
> > and the remote and local timestamps are the same, then it will go by
> > system identifier. The change with a higher system identifier will
> > win. This will ensure that the same change is picked on all the nodes.
>
> How is this going to deal with the fact that commit LSN and timestamps
> may not correlate perfectly? That is, commits may happen with LSN1 <
> LSN2 but with T1 > T2.

Are you pointing to the issue where a session/txn takes its
'xactStopTimestamp' earlier but is delayed in inserting its record into
the XLOG, while another session/txn which takes its timestamp slightly
later manages to insert its record into the XLOG sooner than the first
session, making LSNs and timestamps out of sync? Going by this scenario,
the commit timestamp may not reflect the actual commit order, and thus
timestamp-based resolvers may make wrong decisions. Or do you mean
something else?

If this is the problem you are referring to, then I think it needs a
fix on the publisher side. Let me think more about it. Kindly let me
know if you have ideas on how to tackle it.

> >
> >  UPDATE
> > ================
> >
> > Conflict Detection Method:
> > --------------------------------
> > Origin conflict detection: The ‘origin’ info is used to detect
> > conflict which can be obtained from commit-timestamp generated for
> > incoming txn at the source node. To compare remote’s origin with the
> > local’s origin, we must have origin information for local txns as well
> > which can be obtained from commit-timestamp after enabling
> > ‘track_commit_timestamp’ locally.
> > The one drawback here is the ‘origin’ information cannot be obtained
> > once the row is frozen and the commit-timestamp info is removed by
> > vacuum. For a frozen row, conflicts cannot be raised, and thus the
> > incoming changes will be applied in all the cases.
> >
> > Conflict Types:
> > ----------------
> > a) update_differ: The origin of an incoming update's key row differs
> > from the local row i.e.; the row has already been updated locally or
> > by different nodes.
> > b) update_missing: The row with the same value as that incoming
> > update's key does not exist. Remote is trying to update a row which
> > does not exist locally.
> > c) update_deleted: The row with the same value as that incoming
> > update's key does not exist. The row is already deleted. This conflict
> > type is generated only if the deleted row is still detectable i.e., it
> > is not removed by VACUUM yet. If the row is removed by VACUUM already,
> > it cannot detect this conflict. It will detect it as update_missing
> > and will follow the default or configured resolver of update_missing
> > itself.
> >
>
> I don't understand why update_missing and update_deleted should be
> different, especially considering it's not detected reliably. And also
> that even if we happen to find the row the associated TOAST data may
> have already been removed. So why would this matter?

Here, we are trying to tackle the case where the row was 'recently'
deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
may want to opt for a different resolution in such a case, as opposed to
the one where the corresponding row was never present in the first
place. The case where the row was deleted long ago may not fall into
this category, as there is a higher chance that it has been removed by
vacuum, and it can be considered equivalent to the update_missing case.

Regarding the "TOAST column" for deleted-row cases, we may need to dig
more. Thanks for bringing up this case. Let me analyze it further.

>
> >  Conflict Resolutions:
> > ----------------
> > a)   latest_timestamp_wins:    The change with later commit timestamp
> > wins. Can be used for ‘update_differ’.
> > b)   earliest_timestamp_wins:   The change with earlier commit
> > timestamp wins. Can be used for ‘update_differ’.
> > c)   apply:   The remote change is always applied.  Can be used for
> > ‘update_differ’.
> > d)   apply_or_skip: Remote change is converted to INSERT and is
> > applied. If the complete row cannot be constructed from the info
> > provided by the publisher, then the change is skipped. Can be used for
> > ‘update_missing’ or ‘update_deleted’.
> > e)   apply_or_error: Remote change is converted to INSERT and is
> > applied. If the complete row cannot be constructed from the info
> > provided by the publisher, then error is raised. Can be used for
> > ‘update_missing’ or ‘update_deleted’.
> > f) skip: Remote change is skipped and local one is retained. Can be
> > used for any conflict type.
> > g)   error: Error out on conflict. Replication is stopped, manual
> > action is needed.  Can be used for any conflict type.
> >
> >  To support UPDATE CDR, the presence of either replica identity Index
> > or primary key is required on target node.  Update CDR will not be
> > supported in absence of replica identity index or primary key even
> > though REPLICA IDENTITY FULL is set.  Please refer to "UPDATE" in
> > "Noteworthy Scenarios" section in [1] for further details.
> >
> > DELETE
> > ================
> > Conflict Type:
> > ----------------
> > delete_missing: An incoming delete is trying to delete a row on a
> > target node which does not exist.
> >
> > Conflict Resolutions:
> > ----------------
> > a)   error : Error out on conflict. Replication is stopped, manual
> > action is needed.
> > b)   skip  : The remote change is skipped.
> >
> > Configuring Conflict Resolution:
> > ------------------------------------------------
> > There are two parts when it comes to configuring CDR:
> >
> > a) Enabling/Disabling conflict detection.
> > b) Configuring conflict resolvers for different conflict types.
> >
> >  Users can sometimes create multiple subscriptions on the same node,
> > subscribing to different tables to improve replication performance by
> > starting multiple apply workers. If the tables in one subscription are
> > less likely to cause conflict, then it is possible that user may want
> > conflict detection disabled for that subscription to avoid detection
> > latency while enabling it for other subscriptions.  This generates a
> > requirement to make ‘conflict detection’ configurable per
> > subscription. While the conflict resolver configuration can remain
> > global. All the subscriptions which opt for ‘conflict detection’ will
> > follow global conflict resolver configuration.
> >
> > To implement the above, subscription commands will be changed to have
> > one more parameter 'conflict_resolution=on/off', default will be OFF.
> >
> > To configure global resolvers, new DDL command will be introduced:
> >
> > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> >
>
> I very much doubt we want a single global conflict resolver, or even one
> resolver per subscription. It seems like a very table-specific thing.

We thought about this as well. We feel that even if we go for table- or
subscription-based resolver configuration, there may be use cases where
the user is not interested in configuring resolvers for each table and
thus may want global ones. So we should provide a way for users to do
global configuration, and that is why we started with the global one. I
have noted your point here and would also like to know the opinion of
others. We are open to discussion: we can either opt for one of these
two options (global or table) or opt for both the global and the
table/subscription-based one.

>
> Also, doesn't all this whole design ignore the concurrency between
> publishers? Isn't this problematic considering the commit timestamps may
> go backwards (for a given publisher), which means the conflict
> resolution is not deterministic (as it depends on how exactly it
> interleaves)?
>
>
> > -------------------------
> >
> > Apart from the above three main operations and resolver configuration,
> > there are more conflict types like primary-key updates, multiple
> > unique constraints etc and some special scenarios to be considered.
> > Complete design details can be found in [1].
> >
> > [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution
> >
>
> Hmmm, not sure it's good to have a "complete" design on wiki, and only
> some subset posted to the mailing list. I haven't compared what the
> differences are, though.

It would have been difficult to mention all the details in the email
(including examples and corner scenarios), and thus we thought it would
be better to document everything on the wiki page for the time being. We
can keep discussing the design and all the scenarios on a need basis
(before the implementation phase of that part), and thus eventually
everything will come to hackers via email. With our first patch, we plan
to provide everything in a README as well.

thanks
Shveta



Re: Conflict Detection and Resolution

From
Nisha Moond
Date:
On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > On 5/23/24 08:36, shveta malik wrote:
> > > Hello hackers,
> > >
> > > Please find the proposal for Conflict Detection and Resolution (CDR)
> > > for Logical replication.
> > > <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> > > below details.>
> > >
> > > Introduction
> > > ================
> > > In case the node is subscribed to multiple providers, or when local
> > > writes happen on a subscriber, conflicts can arise for the incoming
> > > changes.  CDR is the mechanism to automatically detect and resolve
> > > these conflicts depending on the application and configurations.
> > > CDR is not applicable for the initial table sync. If locally, there
> > > exists conflicting data on the table, the table sync worker will fail.
> > > Please find the details on CDR in apply worker for INSERT, UPDATE and
> > > DELETE operations:
> > >
> >
> > Which architecture are you aiming for? Here you talk about multiple
> > providers, but the wiki page mentions active-active. I'm not sure how
> > much this matters, but it might.
>
> Currently, we are working on the multiple-providers case, but
> ideally it should work for active-active as well. During further
> discussion and the implementation phase, if we find that there are
> cases which will not work in a straightforward way for active-active,
> then our primary focus will remain to first implement it for the
> multiple-providers architecture.
>
> >
> > Also, what kind of consistency you expect from this? Because none of
> > these simple conflict resolution methods can give you the regular
> > consistency models we're used to, AFAICS.
>
> Can you please explain a little bit more on this.
>
> >
> > > INSERT
> > > ================
> > > To resolve INSERT conflict on subscriber, it is important to find out
> > > the conflicting row (if any) before we attempt an insertion. The
> > > indexes or search preference for the same will be:
> > > First check for replica identity (RI) index.
> > >   - if not found, check for the primary key (PK) index.
> > >     - if not found, then check for unique indexes (individual ones or
> > > added by unique constraints)
> > >          - if unique index also not found, skip CDR
> > >
> > > Note: if no RI index, PK, or unique index is found but
> > > REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> > > The reason being that even though a row can be identified with
> > > REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> > > rows. Hence, we should not go for conflict detection in such a case.
> > >
> >
> > It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
> > allowed to have duplicate values? It just means the upstream is sending
> > the whole original row, there can still be a PK/UNIQUE index on both the
> > publisher and subscriber.
>
> Yes, right. Sorry for the confusion. I meant the same, i.e. in the
> absence of an 'RI index, PK, or unique index', tables can have
> duplicates. So even in the presence of a replica identity (FULL in
> this case) but in the absence of a unique/primary index, CDR will be
> skipped for INSERT.
>
> >
> > > In case of replica identity ‘nothing’ and in absence of any suitable
> > > index (as defined above), CDR will be skipped for INSERT.
> > >
> > > Conflict Type:
> > > ----------------
> > > insert_exists: A conflict is detected when the table has the same
> > > value for a key column as the new value in the incoming row.
> > >
> > > Conflict Resolution
> > > ----------------
> > > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> > > c) apply:   Always apply the remote change.
> > > d) skip:    Remote change is skipped.
> > > e) error:   Error out on conflict. Replication is stopped, manual
> > > action is needed.
> > >
> >
> > Why not to have some support for user-defined conflict resolution
> > methods, allowing to do more complex stuff (e.g. merging the rows in
> > some way, perhaps even with datatype-specific behavior)?
>
> Initially, for the sake of simplicity, we are targeting to support
> built-in resolvers. But we have a plan to work on user-defined
> resolvers as well. We shall propose that separately.
>
> >
> > > The change will be converted to 'UPDATE' and applied if the decision
> > > is in favor of applying remote change.
> > >
> > > It is important to have commit timestamp info available on subscriber
> > > when latest_timestamp_wins or earliest_timestamp_wins method is chosen
> > > as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
> > > on subscriber, in absence of which, configuring the said
> > > timestamp-based resolution methods will result in error.
> > >
> > > Note: If the user has chosen the latest or earliest_timestamp_wins,
> > > and the remote and local timestamps are the same, then it will go by
> > > system identifier. The change with a higher system identifier will
> > > win. This will ensure that the same change is picked on all the nodes.
> >
> > How is this going to deal with the fact that commit LSN and timestamps
> > may not correlate perfectly? That is, commits may happen with LSN1 <
> > LSN2 but with T1 > T2.
>
> Are you pointing to the issue where a session/txn takes its
> 'xactStopTimestamp' earlier but is delayed in inserting its commit
> record into the XLOG, while another session/txn that takes its
> timestamp slightly later manages to insert its record into the XLOG
> sooner, making LSNs and timestamps out of sync? Going by this
> scenario, the commit timestamp may not reflect the actual commit
> order, and thus timestamp-based resolvers may take wrong decisions.
> Or do you mean something else?
>
> If this is the problem you are referring to, then I think this needs a
> fix on the publisher side. Let me think more about it. Kindly let me
> know if you have ideas on how to tackle it.
>
> > >
> > >  UPDATE
> > > ================
> > >
> > > Conflict Detection Method:
> > > --------------------------------
> > > Origin conflict detection: The ‘origin’ info is used to detect
> > > conflict which can be obtained from commit-timestamp generated for
> > > incoming txn at the source node. To compare remote’s origin with the
> > > local’s origin, we must have origin information for local txns as well
> > > which can be obtained from commit-timestamp after enabling
> > > ‘track_commit_timestamp’ locally.
> > > The one drawback here is the ‘origin’ information cannot be obtained
> > > once the row is frozen and the commit-timestamp info is removed by
> > > vacuum. For a frozen row, conflicts cannot be raised, and thus the
> > > incoming changes will be applied in all the cases.
> > >
> > > Conflict Types:
> > > ----------------
> > > a) update_differ: The origin of an incoming update's key row differs
> > > from the local row i.e.; the row has already been updated locally or
> > > by different nodes.
> > > b) update_missing: The row with the same value as that incoming
> > > update's key does not exist. Remote is trying to update a row which
> > > does not exist locally.
> > > c) update_deleted: The row with the same value as that incoming
> > > update's key does not exist. The row is already deleted. This conflict
> > > type is generated only if the deleted row is still detectable i.e., it
> > > is not removed by VACUUM yet. If the row is removed by VACUUM already,
> > > it cannot detect this conflict. It will detect it as update_missing
> > > and will follow the default or configured resolver of update_missing
> > > itself.
> > >
> >
> > I don't understand the why should update_missing or update_deleted be
> > different, especially considering it's not detected reliably. And also
> > that even if we happen to find the row the associated TOAST data may
> > have already been removed. So why would this matter?
>
> Here, we are trying to tackle the case where the row was 'recently'
> deleted, i.e. a concurrent UPDATE and DELETE on pub and sub. The user
> may want to opt for a different resolution in such a case, as against
> the one where the corresponding row was not even present in the first
> place. The case where the row was deleted long back may not fall into
> this category, as there are higher chances that it has been removed
> by vacuum and can be considered equivalent to the update_missing
> case.
>
> Regarding the "TOAST column" for deleted-row cases, we may need to
> dig more. Thanks for bringing up this case. Let me analyze more here.
>
I tested a simple case with a table with one TOAST column and found
that when a tuple with a TOAST column is deleted, both the tuple and
corresponding pg_toast entries are marked as ‘deleted’ (dead) but not
removed immediately. The main tuple and respective pg_toast entry are
permanently deleted only during vacuum. First, the main table’s dead
tuples are vacuumed, followed by the secondary TOAST relation ones (if
available).
Please let us know if you have a specific scenario in mind where the
TOAST column data is deleted immediately upon ‘delete’ operation,
rather than during vacuum, which we are missing.

Thanks,
Nisha



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/23/24 08:36, shveta malik wrote:
> >
> > Conflict Resolution
> > ----------------
> > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> > c) apply:   Always apply the remote change.
> > d) skip:    Remote change is skipped.
> > e) error:   Error out on conflict. Replication is stopped, manual
> > action is needed.
> >
>
> Why not to have some support for user-defined conflict resolution
> methods, allowing to do more complex stuff (e.g. merging the rows in
> some way, perhaps even with datatype-specific behavior)?
>
> > The change will be converted to 'UPDATE' and applied if the decision
> > is in favor of applying remote change.
> >
> > It is important to have commit timestamp info available on subscriber
> > when latest_timestamp_wins or earliest_timestamp_wins method is chosen
> > as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
> > on subscriber, in absence of which, configuring the said
> > timestamp-based resolution methods will result in error.
> >
> > Note: If the user has chosen the latest or earliest_timestamp_wins,
> > and the remote and local timestamps are the same, then it will go by
> > system identifier. The change with a higher system identifier will
> > win. This will ensure that the same change is picked on all the nodes.
>
> How is this going to deal with the fact that commit LSN and timestamps
> may not correlate perfectly? That is, commits may happen with LSN1 <
> LSN2 but with T1 > T2.
>

One of the possible scenarios discussed at pgconf.dev with Tomas for
this was as follows:

Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
The logical replication is configured such that a subscription on SN3
has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1,
PN2 (p1, p2)

Now, on PN1, we have the following operations that update the same row:

T1
Update-1 on table t1 at LSN1 (1000) on time (200)

T2
Update-2 on table t1 at LSN2 (2000) on time (100)

Then in parallel, we have the following operation on node PN2 that
updates the same row as Update-1, and Update-2 on node PN1.

T3
Update-3 on table t1 at LSN(1500) on time (150)

In theory, we can end up with a different state on the subscriber
depending on the order in which the updates arrive at SN3, which
shouldn't happen. Say the order in which they reach SN3 is: Update-1,
Update-2, Update-3; then the final row we have is from Update-3,
considering we have configured last_update_wins as the conflict
resolution method. Now, consider the other order: Update-1, Update-3,
Update-2. In this case, the final row will be from Update-2 because
when we try to apply Update-3, it will generate a conflict and, as per
the resolution method (last_update_wins), we need to retain Update-1.

On further thinking, the operations on node PN1 as defined above seem
impossible because one of the updates needs to wait for the other to
write a commit record. So the commits may happen with LSN1 < LSN2 but
with T1 > T2, but they can't be on the same row due to row locks. So
the order of apply should still be consistent. Am I missing
something?
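The divergence being described can be sketched with a tiny simulation. Two rules are assumed here purely for illustration: changes from the same origin as the current row version apply unconditionally (they arrive in commit order from that publisher), and cross-origin changes are resolved with last_update_wins.

```python
# Toy model of the SN3 scenario above (not PostgreSQL code).
# Assumed rules: same-origin changes apply unconditionally;
# cross-origin changes keep the version with the later commit timestamp.

U1 = ("Update-1", "PN1", 200)  # (name, origin, commit timestamp)
U2 = ("Update-2", "PN1", 100)
U3 = ("Update-3", "PN2", 150)

def apply_changes(order):
    row = None  # current row version on SN3
    for name, origin, ts in order:
        if row is None or origin == row[1]:
            row = (name, origin, ts)   # same origin: just apply
        elif ts > row[2]:
            row = (name, origin, ts)   # remote change wins the conflict
        # else: conflict resolved in favor of the local version
    return row[0]

print(apply_changes([U1, U2, U3]))  # -> Update-3
print(apply_changes([U1, U3, U2]))  # -> Update-2: a different final state
```

Under these assumed rules, the two arrival orders leave different final rows, which is exactly the non-determinism being discussed.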

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > >
> > > Conflict Resolution
> > > ----------------
> > > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.

Can you share the use case of "earliest_timestamp_wins" resolution
method? It seems after the initial update on the local node, it will
never allow remote update to succeed which sounds a bit odd. Jan has
shared this and similar concerns about this resolution method, so I
have added him to the email as well.

> > >
> > > Conflict Types:
> > > ----------------
> > > a) update_differ: The origin of an incoming update's key row differs
> > > from the local row i.e.; the row has already been updated locally or
> > > by different nodes.
> > > b) update_missing: The row with the same value as that incoming
> > > update's key does not exist. Remote is trying to update a row which
> > > does not exist locally.
> > > c) update_deleted: The row with the same value as that incoming
> > > update's key does not exist. The row is already deleted. This conflict
> > > type is generated only if the deleted row is still detectable i.e., it
> > > is not removed by VACUUM yet. If the row is removed by VACUUM already,
> > > it cannot detect this conflict. It will detect it as update_missing
> > > and will follow the default or configured resolver of update_missing
> > > itself.
> > >
> >
> > I don't understand the why should update_missing or update_deleted be
> > different, especially considering it's not detected reliably. And also
> > that even if we happen to find the row the associated TOAST data may
> > have already been removed. So why would this matter?
>
> Here, we are trying to tackle the case where the row is 'recently'
> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> want to opt for a different resolution in such a case as against the
> one where the corresponding row was not even present in the first
> place. The case where the row was deleted long back may not fall into
> this category as there are higher chances that they have been removed
> by vacuum and can be considered equivalent to the update_missing
> case.
>

I think to make 'update_deleted' work, we need another scan with a
different snapshot type to find the recently deleted row. I don't know
if it is a good idea to scan the index twice with different snapshots,
so for the sake of simplicity, can we consider 'update_deleted' the
same as 'update_missing'? If we think it is an important case to
consider, then we can try to accomplish this once we finalize the
design/implementation of the other resolution methods.

> > >
> > > To implement the above, subscription commands will be changed to have
> > > one more parameter 'conflict_resolution=on/off', default will be OFF.
> > >
> > > To configure global resolvers, new DDL command will be introduced:
> > >
> > > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> > >
> >
> > I very much doubt we want a single global conflict resolver, or even one
> > resolver per subscription. It seems like a very table-specific thing.
>

+1 to making it a table-level configuration, but we probably need
something at the global level as well, so that if users don't define
anything at the table level, the global-level configuration will be
used by default.
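The fallback behavior being suggested can be sketched as a simple lookup; the resolver names below come from earlier in the thread, but the data structures are purely illustrative, not the proposed catalog representation:

```python
# Hypothetical resolver lookup: table-level settings override a
# global-level default for each conflict type. All structures here are
# illustrative only.

GLOBAL_RESOLVERS = {
    "insert_exists": "latest_timestamp_wins",
    "update_differ": "latest_timestamp_wins",
    "update_missing": "skip",
}

TABLE_RESOLVERS = {
    # (table, conflict_type) -> resolver
    ("public.orders", "update_missing"): "error",
}

def resolver_for(table, conflict_type):
    # A table-level configuration wins; otherwise fall back to global.
    return TABLE_RESOLVERS.get((table, conflict_type),
                               GLOBAL_RESOLVERS[conflict_type])

print(resolver_for("public.orders", "update_missing"))  # -> error
print(resolver_for("public.items", "update_missing"))   # -> skip
```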

>
> >
> > Also, doesn't all this whole design ignore the concurrency between
> > publishers? Isn't this problematic considering the commit timestamps may
> > go backwards (for a given publisher), which means the conflict
> > resolution is not deterministic (as it depends on how exactly it
> > interleaves)?
> >

I am not able to imagine the cases you are worried about. Can you
please be specific? Is it similar to the case I described in
yesterday's email [1]?

[1] - https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > >
> > > >
> > > > Conflict Resolution
> > > > ----------------
> > > > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > > > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
>
> Can you share the use case of "earliest_timestamp_wins" resolution
> method? It seems after the initial update on the local node, it will
> never allow remote update to succeed which sounds a bit odd. Jan has
> shared this and similar concerns about this resolution method, so I
> have added him to the email as well.

I do not have an exact scenario for this. But I feel that if two nodes
are concurrently inserting different data against a primary key, then
some users may prefer to retain the row which was inserted earlier. In
that respect it is no different from latest_timestamp_wins. It totally
depends upon what kind of application and requirements the user has,
based on which they may discard the later-arriving rows (especially in
the INSERT case).

> > > > Conflict Types:
> > > > ----------------
> > > > a) update_differ: The origin of an incoming update's key row differs
> > > > from the local row i.e.; the row has already been updated locally or
> > > > by different nodes.
> > > > b) update_missing: The row with the same value as that incoming
> > > > update's key does not exist. Remote is trying to update a row which
> > > > does not exist locally.
> > > > c) update_deleted: The row with the same value as that incoming
> > > > update's key does not exist. The row is already deleted. This conflict
> > > > type is generated only if the deleted row is still detectable i.e., it
> > > > is not removed by VACUUM yet. If the row is removed by VACUUM already,
> > > > it cannot detect this conflict. It will detect it as update_missing
> > > > and will follow the default or configured resolver of update_missing
> > > > itself.
> > > >
> > >
> > > I don't understand the why should update_missing or update_deleted be
> > > different, especially considering it's not detected reliably. And also
> > > that even if we happen to find the row the associated TOAST data may
> > > have already been removed. So why would this matter?
> >
> > Here, we are trying to tackle the case where the row is 'recently'
> > deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> > want to opt for a different resolution in such a case as against the
> > one where the corresponding row was not even present in the first
> > place. The case where the row was deleted long back may not fall into
> > this category as there are higher chances that they have been removed
> > by vacuum and can be considered equivalent to the update_missing
> > case.
> >
>
> I think to make 'update_deleted' work, we need another scan with a
> different snapshot type to find the recently deleted row. I don't know
> if it is a good idea to scan the index twice with different snapshots,
> so for the sake of simplicity, can we consider 'update_deleted' same
> as 'update_missing'? If we think it is an important case to consider
> then we can try to accomplish this once we finalize the
> design/implementation of other resolution methods.

I think it is important for scenarios when data is being updated and
deleted concurrently. But yes, I agree that implementation may have
some performance hit for this case. We can tackle this scenario at a
later stage.

> > > >
> > > > To implement the above, subscription commands will be changed to have
> > > > one more parameter 'conflict_resolution=on/off', default will be OFF.
> > > >
> > > > To configure global resolvers, new DDL command will be introduced:
> > > >
> > > > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> > > >
> > >
> > > I very much doubt we want a single global conflict resolver, or even one
> > > resolver per subscription. It seems like a very table-specific thing.
> >
>
> +1 to make it a table-level configuration but we probably need
> something at the global level as well such that by default if users
> don't define anything at table-level global-level configuration will
> be used.
>
> >
> > >
> > > Also, doesn't all this whole design ignore the concurrency between
> > > publishers? Isn't this problematic considering the commit timestamps may
> > > go backwards (for a given publisher), which means the conflict
> > > resolution is not deterministic (as it depends on how exactly it
> > > interleaves)?
> > >
>
> I am not able to imagine the cases you are worried about. Can you
> please be specific? Is it similar to the case I described in
> yesterday's email [1]?
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com
>

thanks
Shveta



RE: Conflict Detection and Resolution

From
"Zhijie Hou (Fujitsu)"
Date:
Hi,

This time at PGconf.dev[1], we had some discussions regarding this
project. The proposed approach is to split the work into two main
components. The first part focuses on conflict detection, which aims to
identify and report conflicts in logical replication. This feature will
enable users to monitor the unexpected conflicts that may occur. The
second part involves the actual conflict resolution. Here, we will
provide built-in resolutions for each conflict and allow the user to
choose which resolution will be used for which conflict (as described
in the initial email of this thread).

Of course, we are open to alternative ideas and suggestions, and the
strategy above can be changed based on ongoing discussions and feedback
received.

Here is the patch for the first part of the work, which adds a new
parameter detect_conflict for the CREATE and ALTER SUBSCRIPTION
commands. This new parameter decides whether the subscription performs
conflict detection. By default, conflict detection is off for a
subscription.

When conflict detection is enabled, additional logging is triggered in
the following conflict scenarios:

* Updating a row that was previously modified by another origin.
* The tuple to be updated is not found.
* The tuple to be deleted is not found.

While other conflict types exist in logical replication, such as an
incoming insert conflicting with an existing row due to a primary key
or unique index, these cases already result in constraint violation
errors. Therefore, additional conflict detection for these cases is
currently omitted to minimize potential overhead. However,
pre-detection of conflicts in these error cases is still essential to
support automatic conflict resolution in the future.
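The three logged scenarios can be sketched as a small classifier. The conflict names follow the terminology used earlier in the thread, but the function shape and the representation of the local tuple are assumptions for illustration, not the identifiers used in the actual patch:

```python
# Hypothetical sketch of the three logged scenarios above; not the
# actual patch code. local_tuple is None when no matching row is found.

def detect_conflict(op, local_tuple, remote_origin):
    """Return a conflict label for an incoming UPDATE/DELETE, or None."""
    if op == "update":
        if local_tuple is None:
            return "update_missing"       # tuple to update not found
        if local_tuple["origin"] != remote_origin:
            return "update_differ"        # modified by another origin
    elif op == "delete":
        if local_tuple is None:
            return "delete_missing"       # tuple to delete not found
    return None  # no conflict detected

print(detect_conflict("update", None, "node1"))                 # update_missing
print(detect_conflict("update", {"origin": "node2"}, "node1"))  # update_differ
print(detect_conflict("delete", None, "node1"))                 # delete_missing
```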

[1] https://2024.pgconf.dev/

Best Regards,
Hou zj


Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Wed, Jun 5, 2024 at 9:12 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > >
> > > > >
> > > > > Conflict Resolution
> > > > > ----------------
> > > > > a) latest_timestamp_wins:    The change with later commit timestamp wins.
> > > > > b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> >
> > Can you share the use case of "earliest_timestamp_wins" resolution
> > method? It seems after the initial update on the local node, it will
> > never allow remote update to succeed which sounds a bit odd. Jan has
> > shared this and similar concerns about this resolution method, so I
> > have added him to the email as well.
>
> I do not have the exact scenario for this.  But I feel, if 2 nodes are
> concurrently inserting different data against a primary key, then some
> users may have preferences that retain the row which was inserted
> earlier. It is no different from latest_timestamp_wins. It totally
> depends upon what kind of application and requirement the user may
> have, based on which, he may discard the later coming rows (specially
> for INSERT case).

I haven't read the complete design yet, but have we discussed how we
plan to deal with clock drift if we use timestamp-based conflict
resolution? For example, a user might insert something conflicting on
node1 first and then on node2. However, due to clock drift, the
timestamp from node2 might appear earlier. In this case, if we choose
"earliest timestamp wins," we would keep the changes from node2.

I haven't fully considered if this would cause any problems, but users
might detect this issue. For instance, a client machine might send a
change to node1 first and then, upon confirmation, send it to node2.
If the clocks on node1 and node2 are not synchronized, the changes
might appear in a different order. Does this seem like a potential
problem?
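The clock-drift concern can be made concrete with a toy example: a client writes to node1 first and then to node2, but node2's clock runs behind, so the chronologically later change carries the earlier commit timestamp (the skew values are invented for illustration):

```python
# Toy illustration of the clock-skew concern above (not PostgreSQL
# code). node2's clock runs 10 seconds behind node1's.

REAL_TIME_OF_WRITE = {"node1": 1000.0, "node2": 1005.0}  # wall-clock order
CLOCK_SKEW = {"node1": 0.0, "node2": -10.0}

changes = [
    (node, REAL_TIME_OF_WRITE[node] + CLOCK_SKEW[node])
    for node in ("node1", "node2")
]

# earliest_timestamp_wins keeps the change with the smallest timestamp.
winner = min(changes, key=lambda c: c[1])
print(winner[0])  # -> node2: the chronologically *later* change wins
```

Even though node1's change happened first in real time, its stamped timestamp (1000.0) is larger than node2's (995.0), so earliest_timestamp_wins keeps node2's change.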

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Can you share the use case of "earliest_timestamp_wins" resolution
> method? It seems after the initial update on the local node, it will
> never allow remote update to succeed which sounds a bit odd. Jan has
> shared this and similar concerns about this resolution method, so I
> have added him to the email as well.
>
I cannot think of a use case exactly in this context, but it's very
common to have such a use case while designing a distributed
application with multiple clients. For example, when we do git pushes
concurrently from multiple clients, it is expected that the earliest
commit wins.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Can you share the use case of "earliest_timestamp_wins" resolution
> > method? It seems after the initial update on the local node, it will
> > never allow remote update to succeed which sounds a bit odd. Jan has
> > shared this and similar concerns about this resolution method, so I
> > have added him to the email as well.
> >
> I can not think of a use case exactly in this context but it's very
> common to have such a use case while designing a distributed
> application with multiple clients.  For example, when we are doing git
> push concurrently from multiple clients it is expected that the
> earliest commit wins.
>

Okay, I think it mostly boils down to something like what Shveta
mentioned where Inserts for a primary key can use
"earliest_timestamp_wins" resolution method [1]. So, it seems useful
to support this method as well.

[1] - https://www.postgresql.org/message-id/CAJpy0uC4riK8e6hQt8jcU%2BnXYmRRjnbFEapYNbmxVYjENxTw2g%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Thu, Jun 6, 2024 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Can you share the use case of "earliest_timestamp_wins" resolution
> > > method? It seems after the initial update on the local node, it will
> > > never allow remote update to succeed which sounds a bit odd. Jan has
> > > shared this and similar concerns about this resolution method, so I
> > > have added him to the email as well.
> > >
> > I can not think of a use case exactly in this context but it's very
> > common to have such a use case while designing a distributed
> > application with multiple clients.  For example, when we are doing git
> > push concurrently from multiple clients it is expected that the
> > earliest commit wins.
> >
>
> Okay, I think it mostly boils down to something like what Shveta
> mentioned where Inserts for a primary key can use
> "earliest_timestamp_wins" resolution method [1]. So, it seems useful
> to support this method as well.

Correct, but we still need to think about how to make it work
correctly in the presence of clock skew, as I mentioned in one of my
previous emails.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Nisha Moond
Date:
On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Can you share the use case of "earliest_timestamp_wins" resolution
> > method? It seems after the initial update on the local node, it will
> > never allow remote update to succeed which sounds a bit odd. Jan has
> > shared this and similar concerns about this resolution method, so I
> > have added him to the email as well.
> >
> I can not think of a use case exactly in this context but it's very
> common to have such a use case while designing a distributed
> application with multiple clients.  For example, when we are doing git
> push concurrently from multiple clients it is expected that the
> earliest commit wins.
>

Here are more use cases of the "earliest_timestamp_wins" resolution method:
1) Applications where the record of the first occurrence of an event
is important. For example, in sensor-based applications like
earthquake detection systems, capturing the first seismic wave's time
is crucial.
2) Scheduling systems, like appointment booking, prioritize the
earliest request when handling concurrent ones.
3) Contexts where maintaining chronological order is important -
  a) Social media platforms display comments ensuring that the
earliest ones are visible first.
  b) Financial transaction processing systems rely on timestamps to
prioritize the processing of transactions, ensuring that the earliest
transaction is handled first.

--
Thanks,
Nisha



Re: Conflict Detection and Resolution

From
Ashutosh Bapat
Date:


On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>

Here are more use cases of the "earliest_timestamp_wins" resolution method:
1) Applications where the record of first occurrence of an event is
important. For example, sensor based applications like earthquake
detection systems, capturing the first seismic wave's time is crucial.
2) Scheduling systems, like appointment booking, prioritize the
earliest request when handling concurrent ones.
3) In contexts where maintaining chronological order is important -
  a) Social media platforms display comments ensuring that the
earliest ones are visible first.
  b) Finance transaction processing systems rely on timestamps to
prioritize the processing of transactions, ensuring that the earliest
transaction is handled first

Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated row committed/became visible) and not a DML on column which is of type timestamp. Some implementations use a hidden timestamp column but that's different from a user column which captures timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database which may be different from the value of the column itself.

If we use the transaction commit timestamp as basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3 on separate origins with timestamps t1, t2, and t3 respectively changed rows r1, r2 and r2, r3 and r1, r4 respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with version of T3 and r2 will end up with version of T1 after applying all the three transactions. Would that introduce an inconsistency between r1 and r2?
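The T1/T2/T3 example can be played out with a toy per-row simulation under latest_timestamp_wins (the concrete timestamp values are invented, only their ordering t2 < t1 < t3 matters):

```python
# Toy per-row simulation of the T1/T2/T3 example above (not PostgreSQL
# code). Each transaction carries one commit timestamp; conflicts are
# resolved row by row with latest_timestamp_wins.

t1, t2, t3 = 20, 10, 30  # t2 < t1 < t3
txns = {
    "T1": (t1, ["r1", "r2"]),
    "T2": (t2, ["r2", "r3"]),
    "T3": (t3, ["r1", "r4"]),
}

rows = {}  # row -> (winning txn, its commit timestamp)

for name in ["T2", "T3", "T1"]:  # apply order from the example
    ts, changed = txns[name]
    for r in changed:
        if r not in rows or ts > rows[r][1]:
            rows[r] = (name, ts)

# One transaction's rows end up resolved differently:
print(rows["r1"][0], rows["r2"][0])  # -> T3 T1
```

After all three transactions are applied, r1 carries T3's version while r2 carries T1's, so the two rows touched by T1 are resolved inconsistently with respect to each other, which is the anomaly being asked about.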

--
Best Wishes,
Ashutosh Bapat

Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:
On 5/27/24 07:48, shveta malik wrote:
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 5/23/24 08:36, shveta malik wrote:
>>> Hello hackers,
>>>
>>> Please find the proposal for Conflict Detection and Resolution (CDR)
>>> for Logical replication.
>>> <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
>>> below details.>
>>>
>>> Introduction
>>> ================
>>> In case the node is subscribed to multiple providers, or when local
>>> writes happen on a subscriber, conflicts can arise for the incoming
>>> changes.  CDR is the mechanism to automatically detect and resolve
>>> these conflicts depending on the application and configurations.
>>> CDR is not applicable for the initial table sync. If locally, there
>>> exists conflicting data on the table, the table sync worker will fail.
>>> Please find the details on CDR in apply worker for INSERT, UPDATE and
>>> DELETE operations:
>>>
>>
>> Which architecture are you aiming for? Here you talk about multiple
>> providers, but the wiki page mentions active-active. I'm not sure how
>> much this matters, but it might.
> 
> Currently, we are working for multi providers case but ideally it
> should work for active-active also. During further discussion and
> implementation phase, if we find that, there are cases which will not
> work in straight-forward way for active-active, then our primary focus
> will remain to first implement it for multiple providers architecture.
> 
>>
>> Also, what kind of consistency you expect from this? Because none of
>> these simple conflict resolution methods can give you the regular
>> consistency models we're used to, AFAICS.
> 
> Can you please explain a little bit more on this.
> 

I was referring to the well established consistency models / isolation
levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
guarantees the application developer can expect, what anomalies can
happen, etc.

I don't think any such isolation level can be implemented with a simple
conflict resolution methods like last-update-wins etc. For example,
consider an active-active where both nodes do

  UPDATE accounts SET balance=balance+1000 WHERE id=1

This will inevitably lead to a conflict, and while the last-update-wins
resolves this "consistently" on both nodes (e.g. ending with the same
result), it's essentially a lost update.
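A minimal sketch of that lost update (plain Python standing in for two nodes; the commit timestamps are made up):

```python
# Toy model: two nodes, full-row replication, last-update-wins resolution.
node_a = {"balance": 0}
node_b = {"balance": 0}

# each node runs "balance = balance + 1000" locally and records a commit ts
a_row = {"balance": node_a["balance"] + 1000, "ts": 100}
b_row = {"balance": node_b["balance"] + 1000, "ts": 101}

# replication ships full row images; last-update-wins keeps the later ts
winner = a_row if a_row["ts"] > b_row["ts"] else b_row
node_a["balance"] = node_b["balance"] = winner["balance"]

# both nodes agree, but the combined effect should have been 2000
print(node_a["balance"], node_b["balance"])  # 1000 1000 -- one update lost
```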

This is a very simplistic example of course, I recall there are various
more complex examples involving foreign keys, multi-table transactions,
constraints, etc. But in principle it's a manifestation of the same
inherent limitation of conflict detection and resolution etc.

Similarly, I believe this affects not just active-active, but also the
case where one node aggregates data from multiple publishers. Maybe not
to the same extent / it might be fine for that use case, but you said
the end goal is to use this for active-active. So I'm wondering what's
the plan, there.

If I'm writing an application for active-active using this conflict
handling, what assumptions can I make? Can I just do stuff as if on
a single node, or do I need to be super conscious about the zillion ways
things can misbehave in a distributed system?

My personal opinion is that the closer this will be to the regular
consistency levels, the better. If past experience taught me anything,
it's very hard to predict how distributed systems with eventual
consistency behave, and even harder to actually test the application in
such environment.

In any case, if there are any differences compared to the usual
behavior, it needs to be very clearly explained in the docs.

>>
>>> INSERT
>>> ================
>>> To resolve INSERT conflict on subscriber, it is important to find out
>>> the conflicting row (if any) before we attempt an insertion. The
>>> indexes or search preference for the same will be:
>>> First check for replica identity (RI) index.
>>>   - if not found, check for the primary key (PK) index.
>>>     - if not found, then check for unique indexes (individual ones or
>>> added by unique constraints)
>>>          - if unique index also not found, skip CDR
>>>
>>> Note: if no RI index, PK, or unique index is found but
>>> REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
>>> The reason being that even though a row can be identified with
>>> REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
>>> rows. Hence, we should not go for conflict detection in such a case.
>>>
>>
>> It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is
>> allowed to have duplicate values? It just means the upstream is sending
>> the whole original row, there can still be a PK/UNIQUE index on both the
>> publisher and subscriber.
> 
> Yes, right. Sorry for confusion. I meant the same i.e. in absence of
> 'RI index, PK, or unique index', tables can have duplicates. So even
> in presence of Replica-identity (FULL in this case) but in absence of
> unique/primary index, CDR will be skipped for INSERT.
> 
>>
>>> In case of replica identity ‘nothing’ and in absence of any suitable
>>> index (as defined above), CDR will be skipped for INSERT.
>>>
>>> Conflict Type:
>>> ----------------
>>> insert_exists: A conflict is detected when the table has the same
>>> value for a key column as the new value in the incoming row.
>>>
>>> Conflict Resolution
>>> ----------------
>>> a) latest_timestamp_wins:    The change with later commit timestamp wins.
>>> b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
>>> c) apply:   Always apply the remote change.
>>> d) skip:    Remote change is skipped.
>>> e) error:   Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>>
>>
>> Why not to have some support for user-defined conflict resolution
>> methods, allowing to do more complex stuff (e.g. merging the rows in
>> some way, perhaps even with datatype-specific behavior)?
> 
> Initially, for the sake of simplicity, we are targeting to support
> built-in resolvers. But we have a plan to work on user-defined
> resolvers as well. We shall propose that separately.
> 
>>
>>> The change will be converted to 'UPDATE' and applied if the decision
>>> is in favor of applying remote change.
>>>
>>> It is important to have commit timestamp info available on subscriber
>>> when latest_timestamp_wins or earliest_timestamp_wins method is chosen
>>> as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
>>> on subscriber, in absence of which, configuring the said
>>> timestamp-based resolution methods will result in error.
>>>
>>> Note: If the user has chosen the latest or earliest_timestamp_wins,
>>> and the remote and local timestamps are the same, then it will go by
>>> system identifier. The change with a higher system identifier will
>>> win. This will ensure that the same change is picked on all the nodes.
>>
>> How is this going to deal with the fact that commit LSN and timestamps
>> may not correlate perfectly? That is, commits may happen with LSN1 <
>> LSN2 but with T1 > T2.
> 
> Are you pointing to the issue where a session/txn has taken
> 'xactStopTimestamp' timestamp earlier but is delayed to insert record
> in XLOG, while another session/txn which has taken timestamp slightly
> later succeeded to insert the record IN XLOG sooner than the session1,
> making LSN and Timestamps out of sync? Going by this scenario, the
> commit-timestamp may not be reflective of actual commits and thus
> timestamp-based resolvers may take wrong decisions. Or do you mean
> something else?
> 
> If this is the problem you are referring to, then I think this needs a
> fix at the publisher side. Let me think more about it. Kindly let me
> know if you have ideas on how to tackle it.
> 

Yes, this is the issue I'm talking about. We're acquiring the timestamp
when not holding the lock to reserve space in WAL, so the LSN and the
commit LSN may not actually correlate.

Consider this example I discussed with Amit last week:

node A:

  XACT1: UPDATE t SET v = 1;    LSN1 / T1

  XACT2: UPDATE t SET v = 2;    LSN2 / T2

node B

  XACT3: UPDATE t SET v = 3;    LSN3 / T3

And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
and T2 < T3 < T1. Now consider that the messages may arrive in different
orders, due to async replication. Unfortunately, this would lead to
different results of the conflict resolution:

  XACT1 - XACT2 - XACT3 => v=3 (T3 wins)

  XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
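A toy replay of the two schedules (an assumption-laden sketch, not apply-worker code: same-origin changes apply in commit order, cross-origin conflicts resolve by latest commit timestamp, and the ts values just encode T2 < T3 < T1):

```python
# XACT1/XACT2 originate on node A (LSN1 < LSN2, but T1 > T2 due to the
# inversion); XACT3 originates on node B with T2 < T3 < T1.
XACTS = {
    "XACT1": {"origin": "A", "ts": 3, "v": 1},
    "XACT2": {"origin": "A", "ts": 1, "v": 2},
    "XACT3": {"origin": "B", "ts": 2, "v": 3},
}

def replay(order):
    row = None
    for name in order:
        x = XACTS[name]
        same_origin = row is not None and x["origin"] == row["origin"]
        if row is None or same_origin or x["ts"] > row["ts"]:
            row = dict(x)  # incoming change wins
    return row["v"]

print(replay(["XACT1", "XACT2", "XACT3"]))  # 3 -- T3 wins
print(replay(["XACT3", "XACT1", "XACT2"]))  # 2 -- T2 wins: nodes diverge
```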

Now, I realize there's a flaw in this example - the (T1 > T2) inversion
can't actually happen, because these transactions have a dependency, and
thus won't commit concurrently. XACT1 will complete the commit before
XACT2 starts to commit. And with a monotonic clock (which is a requirement
for any timestamp-based resolution), that should guarantee (T1 < T2).

However, I doubt this is sufficient to declare victory. It's more likely
that there still are problems, but the examples are likely more complex
(changes to multiple tables, etc.).

I vaguely remember there were more issues with timestamp inversion, but
those might have been related to parallel apply etc.

>>>
>>>  UPDATE
>>> ================
>>>
>>> Conflict Detection Method:
>>> --------------------------------
>>> Origin conflict detection: The ‘origin’ info is used to detect
>>> conflict which can be obtained from commit-timestamp generated for
>>> incoming txn at the source node. To compare remote’s origin with the
>>> local’s origin, we must have origin information for local txns as well
>>> which can be obtained from commit-timestamp after enabling
>>> ‘track_commit_timestamp’ locally.
>>> The one drawback here is the ‘origin’ information cannot be obtained
>>> once the row is frozen and the commit-timestamp info is removed by
>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
>>> incoming changes will be applied in all the cases.
>>>
>>> Conflict Types:
>>> ----------------
>>> a) update_differ: The origin of an incoming update's key row differs
>>> from the local row i.e.; the row has already been updated locally or
>>> by different nodes.
>>> b) update_missing: The row with the same value as that incoming
>>> update's key does not exist. Remote is trying to update a row which
>>> does not exist locally.
>>> c) update_deleted: The row with the same value as that incoming
>>> update's key does not exist. The row is already deleted. This conflict
>>> type is generated only if the deleted row is still detectable i.e., it
>>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
>>> it cannot detect this conflict. It will detect it as update_missing
>>> and will follow the default or configured resolver of update_missing
>>> itself.
>>>
>>
>> I don't understand the why should update_missing or update_deleted be
>> different, especially considering it's not detected reliably. And also
>> that even if we happen to find the row the associated TOAST data may
>> have already been removed. So why would this matter?
> 
> Here, we are trying to tackle the case where the row is 'recently'
> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> want to opt for a different resolution in such a case as against the
> one where the corresponding row was not even present in the first
> place. The case where the row was deleted long back may not fall into
> this category as there are higher chances that they have been removed
> by vacuum and can be considered equivalent to the update_missing
> case.
> 

My point is that if we can't detect the difference reliably, it's not
very useful. Consider this example:

Node A:

  T1: INSERT INTO t (id, value) VALUES (1,1);

  T2: DELETE FROM t WHERE id = 1;

Node B:

  T3: UPDATE t SET value = 2 WHERE id = 1;

The "correct" order of received messages on a third node is T1-T3-T2.
But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
and so on. For T1-T2-T3 the right decision is to discard the update,
while for T3-T1-T2 it's to wait for the INSERT to arrive before
applying the update.

But if we misdetect the situation, we either end up with a row that
shouldn't be there, or losing an update.
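The misdetection can be sketched in a few lines of Python (a toy model, under the assumptions that update_deleted resolves to skip, update_missing resolves via apply_or_skip converting the update to an INSERT, and a vacuumed delete is indistinguishable from a row that never existed):

```python
def replay(order, vacuumed):
    table = {}        # id -> value
    deleted = set()   # ids deleted but still detectable (not yet vacuumed)
    for kind, key, val in order:
        if kind == "insert":
            table[key] = val
        elif kind == "delete":
            table.pop(key, None)
            if not vacuumed:
                deleted.add(key)
        elif kind == "update":
            if key in table:
                table[key] = val      # normal update
            elif key in deleted:
                pass                  # update_deleted -> skip
            else:
                table[key] = val      # update_missing -> apply_or_skip inserts
    return table

T1 = ("insert", 1, 1)
T2 = ("delete", 1, None)
T3 = ("update", 1, 2)

print(replay([T1, T2, T3], vacuumed=False))  # {} -- correct, update discarded
print(replay([T1, T2, T3], vacuumed=True))   # {1: 2} -- phantom row comes back
```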

> Regarding "TOAST column" for deleted row cases, we may need to dig
> more. Thanks for bringing this case. Let me analyze more here.
> 
>>
>>>  Conflict Resolutions:
>>> ----------------
>>> a)   latest_timestamp_wins:    The change with later commit timestamp
>>> wins. Can be used for ‘update_differ’.
>>> b)   earliest_timestamp_wins:   The change with earlier commit
>>> timestamp wins. Can be used for ‘update_differ’.
>>> c)   apply:   The remote change is always applied.  Can be used for
>>> ‘update_differ’.
>>> d)   apply_or_skip: Remote change is converted to INSERT and is
>>> applied. If the complete row cannot be constructed from the info
>>> provided by the publisher, then the change is skipped. Can be used for
>>> ‘update_missing’ or ‘update_deleted’.
>>> e)   apply_or_error: Remote change is converted to INSERT and is
>>> applied. If the complete row cannot be constructed from the info
>>> provided by the publisher, then error is raised. Can be used for
>>> ‘update_missing’ or ‘update_deleted’.
>>> f) skip: Remote change is skipped and local one is retained. Can be
>>> used for any conflict type.
>>> g)   error: Error out of conflict. Replication is stopped, manual
>>> action is needed.  Can be used for any conflict type.
>>>
>>>  To support UPDATE CDR, the presence of either replica identity Index
>>> or primary key is required on target node.  Update CDR will not be
>>> supported in absence of replica identity index or primary key even
>>> though REPLICA IDENTITY FULL is set.  Please refer to "UPDATE" in
>>> "Noteworthey Scenarios" section in [1] for further details.
>>>
>>> DELETE
>>> ================
>>> Conflict Type:
>>> ----------------
>>> delete_missing: An incoming delete is trying to delete a row on a
>>> target node which does not exist.
>>>
>>> Conflict Resolutions:
>>> ----------------
>>> a)   error : Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>> b)   skip  : The remote change is skipped.
>>>
>>> Configuring Conflict Resolution:
>>> ------------------------------------------------
>>> There are two parts when it comes to configuring CDR:
>>>
>>> a) Enabling/Disabling conflict detection.
>>> b) Configuring conflict resolvers for different conflict types.
>>>
>>>  Users can sometimes create multiple subscriptions on the same node,
>>> subscribing to different tables to improve replication performance by
>>> starting multiple apply workers. If the tables in one subscription are
>>> less likely to cause conflict, then it is possible that user may want
>>> conflict detection disabled for that subscription to avoid detection
>>> latency while enabling it for other subscriptions.  This generates a
>>> requirement to make ‘conflict detection’ configurable per
>>> subscription. While the conflict resolver configuration can remain
>>> global. All the subscriptions which opt for ‘conflict detection’ will
>>> follow global conflict resolver configuration.
>>>
>>> To implement the above, subscription commands will be changed to have
>>> one more parameter 'conflict_resolution=on/off', default will be OFF.
>>>
>>> To configure global resolvers, new DDL command will be introduced:
>>>
>>> CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
>>>
>>
>> I very much doubt we want a single global conflict resolver, or even one
>> resolver per subscription. It seems like a very table-specific thing.
> 
> Even we thought about this. We feel that even if we go for table based
> or subscription based resolvers configuration, there may be use case
> scenarios where the user is not interested in configuring resolvers
> for each table and thus may want to give global ones. Thus, we should
> provide a way for users to do global configuration. Thus we started
> with global one. I have noted your point here and would also like to
> know the opinion of others. We are open to discussion. We can either
> opt for any of these 2 options (global or table) or we can opt for
> both global and table/sub based one.
> 

I have no problem with a default / global conflict handler, as long as
there's a way to override this per table. This is especially important
for cases with custom conflict handler at table / column level.

>>
>> Also, doesn't this whole design ignore the concurrency between
>> publishers? Isn't this problematic considering the commit timestamps may
>> go backwards (for a given publisher), which means the conflict
>> resolution is not deterministic (as it depends on how exactly it
>> interleaves)?
>>
>>
>>> -------------------------
>>>
>>> Apart from the above three main operations and resolver configuration,
>>> there are more conflict types like primary-key updates, multiple
>>> unique constraints etc and some special scenarios to be considered.
>>> Complete design details can be found in [1].
>>>
>>> [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution
>>>
>>
>> Hmmm, not sure it's good to have a "complete" design on wiki, and only
>> some subset posted to the mailing list. I haven't compared what the
>> differences are, though.
> 
> It would have been difficult to mention all the details in email
> (including examples and corner scenarios) and thus we thought that it
> will be better to document everything in wiki page for the time being.
> We can keep on discussing the design and all the scenarios on need
> basis (before implementation phase of that part) and thus eventually
> everything will come in email on hackers. With our first patch, we
> plan to provide everything in a README as well.
> 

The challenge with having this on wiki is that it's unlikely people will
notice any changes made to the wiki.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:
On 5/28/24 11:17, Nisha Moond wrote:
> On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>>
>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> ...
>>>
>>> I don't understand the why should update_missing or update_deleted be
>>> different, especially considering it's not detected reliably. And also
>>> that even if we happen to find the row the associated TOAST data may
>>> have already been removed. So why would this matter?
>>
>> Here, we are trying to tackle the case where the row is 'recently'
>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
>> want to opt for a different resolution in such a case as against the
>> one where the corresponding row was not even present in the first
>> place. The case where the row was deleted long back may not fall into
>> this category as there are higher chances that they have been removed
>> by vacuum and can be considered equivalent to the update_missing
>> case.
>>
>> Regarding "TOAST column" for deleted row cases, we may need to dig
>> more. Thanks for bringing this case. Let me analyze more here.
>>
> I tested a simple case with a table with one TOAST column and found
> that when a tuple with a TOAST column is deleted, both the tuple and
> corresponding pg_toast entries are marked as ‘deleted’ (dead) but not
> removed immediately. The main tuple and respective pg_toast entry are
> permanently deleted only during vacuum. First, the main table’s dead
> tuples are vacuumed, followed by the secondary TOAST relation ones (if
> available).
> Please let us know if you have a specific scenario in mind where the
> TOAST column data is deleted immediately upon ‘delete’ operation,
> rather than during vacuum, which we are missing.
> 

I'm pretty sure you can vacuum the TOAST table directly, which means
you'll end up with a deleted tuple with TOAST pointers, but with the
TOAST entries already gone.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:

On 6/3/24 09:30, Amit Kapila wrote:
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 5/23/24 08:36, shveta malik wrote:
>>>
>>> Conflict Resolution
>>> ----------------
>>> a) latest_timestamp_wins:    The change with later commit timestamp wins.
>>> b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
>>> c) apply:   Always apply the remote change.
>>> d) skip:    Remote change is skipped.
>>> e) error:   Error out on conflict. Replication is stopped, manual
>>> action is needed.
>>>
>>
>> Why not to have some support for user-defined conflict resolution
>> methods, allowing to do more complex stuff (e.g. merging the rows in
>> some way, perhaps even with datatype-specific behavior)?
>>
>>> The change will be converted to 'UPDATE' and applied if the decision
>>> is in favor of applying remote change.
>>>
>>> It is important to have commit timestamp info available on subscriber
>>> when latest_timestamp_wins or earliest_timestamp_wins method is chosen
>>> as resolution method.  Thus ‘track_commit_timestamp’ must be enabled
>>> on subscriber, in absence of which, configuring the said
>>> timestamp-based resolution methods will result in error.
>>>
>>> Note: If the user has chosen the latest or earliest_timestamp_wins,
>>> and the remote and local timestamps are the same, then it will go by
>>> system identifier. The change with a higher system identifier will
>>> win. This will ensure that the same change is picked on all the nodes.
>>
>> How is this going to deal with the fact that commit LSN and timestamps
>> may not correlate perfectly? That is, commits may happen with LSN1 <
>> LSN2 but with T1 > T2.
>>
> 
> One of the possible scenarios discussed at pgconf.dev with Tomas for
> this was as follows:
> 
> Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
> The logical replication is configured such that a subscription on SN3
> has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1,
> PN2 (p1, p2)
> 
> Now, on PN1, we have the following operations that update the same row:
> 
> T1
> Update-1 on table t1 at LSN1 (1000) on time (200)
> 
> T2
> Update-2 on table t1 at LSN2 (2000) on time (100)
> 
> Then in parallel, we have the following operation on node PN2 that
> updates the same row as Update-1, and Update-2 on node PN1.
> 
> T3
> Update-3 on table t1 at LSN(1500) on time (150)
> 
> By theory, we can have a different state on subscribers depending on
> the order of updates arriving at SN3 which shouldn't happen. Say, the
> order in which they reach SN3 is: Update-1, Update-2, Update-3 then
> the final row we have is by Update-3 considering we have configured
> last_update_wins as a conflict resolution method. Now, consider the
> other order:  Update-1, Update-3, Update-2, in this case, the final
> row will be by Update-2 because when we try to apply Update-3, it will
> generate a conflict and as per the resolution method
> (last_update_wins) we need to retain Update-1.
> 
> On further thinking, the operations on node PN1 as defined above
> seem impossible because one of the Updates needs to wait for the other
> to write a commit record. So the commits may happen with LSN1 < LSN2
> but with T1 > T2, yet they can't be on the same row due to locks. So,
> the order of apply should still be consistent. Am I missing
> something?
> 

Sorry, I should have read your message before responding a couple
minutes ago. I think you're right this exact example can't happen, due
to the dependency between transactions.

But as I wrote, I'm not quite convinced this means there are not other
issues with this way of resolving conflicts. It's more likely a more
complex scenario is required.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>>
>> >
>>
>> Here are more use cases of the "earliest_timestamp_wins" resolution method:
>> 1) Applications where the record of first occurrence of an event is
>> important. For example, sensor based applications like earthquake
>> detection systems, capturing the first seismic wave's time is crucial.
>> 2) Scheduling systems, like appointment booking, prioritize the
>> earliest request when handling concurrent ones.
>> 3) In contexts where maintaining chronological order is important -
>>   a) Social media platforms display comments ensuring that the
>> earliest ones are visible first.
>>   b) Finance transaction processing systems rely on timestamps to
>> prioritize the processing of transactions, ensuring that the earliest
>> transaction is handled first.
>
>
> Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication.
> What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction
> that updated row committed/became visible) and not a DML on column which is of type timestamp. Some implementations
> use a hidden timestamp column but that's different from a user column which captures timestamp of (say) an event. The
> conflict resolution will be based on the timestamp when that column's value was recorded in the database which may be
> different from the value of the column itself.
>

It depends on how these operations are performed. For example, the
appointment booking system could be prioritized via a transaction
updating a row with columns emp_name, emp_id, reserved, time_slot.
Now, if two employees at different geographical locations try to book
the calendar, the earlier transaction will win.
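As a toy sketch of such a resolver (illustrative names only, not the patch's API; the system-identifier tie-break follows the proposal text so every node picks the same winner):

```python
# earliest_timestamp_wins with a deterministic tie-break on system identifier
def resolve_earliest(local, remote):
    # each change is (commit_ts, system_identifier, payload)
    if remote[0] != local[0]:
        return remote if remote[0] < local[0] else local
    return remote if remote[1] > local[1] else local  # tie: higher sysid wins

slot_a = (100, 7001, "emp_1 books 10:00")   # earlier commit
slot_b = (105, 7002, "emp_2 books 10:00")

# the same booking wins regardless of which side is "local"
print(resolve_earliest(slot_a, slot_b)[2])  # emp_1 books 10:00
print(resolve_earliest(slot_b, slot_a)[2])  # emp_1 books 10:00
```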

> If we use the transaction commit timestamp as basis for resolution, a transaction where multiple rows conflict may
> end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and
> T3 on separate origins with timestamps t1, t2, and t3 respectively changed rows r1, r2 and r2, r3 and r1, r4
> respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If
> t2 < t1 < t3, r1 will end up with version of T3 and r2 will end up with version of T1 after applying all the three
> transactions.
>

Are you describing the results based on latest_timestamp_wins? If so,
then it is correct. OTOH, if the user has configured
"earliest_timestamp_wins" resolution method, then we should end up
with a version of r1 from T1 because t1 < t3. Also, due to the same
reason, we should have version r2 from T2.

> Would that introduce an inconsistency between r1 and r2?
>

As per my understanding, this shouldn't be an inconsistency. Won't it
be true even when the transactions are performed on a single node with
the same timing?

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/27/24 07:48, shveta malik wrote:
> > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Which architecture are you aiming for? Here you talk about multiple
> >> providers, but the wiki page mentions active-active. I'm not sure how
> >> much this matters, but it might.
> >
> > Currently, we are working for multi providers case but ideally it
> > should work for active-active also. During further discussion and
> > implementation phase, if we find that, there are cases which will not
> > work in straight-forward way for active-active, then our primary focus
> > will remain to first implement it for multiple providers architecture.
> >
> >>
> >> Also, what kind of consistency you expect from this? Because none of
> >> these simple conflict resolution methods can give you the regular
> >> consistency models we're used to, AFAICS.
> >
> > Can you please explain a little bit more on this.
> >
>
> I was referring to the well established consistency models / isolation
> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
> guarantees the application developer can expect, what anomalies can
> happen, etc.
>
> I don't think any such isolation level can be implemented with a simple
> conflict resolution methods like last-update-wins etc. For example,
> consider an active-active where both nodes do
>
>   UPDATE accounts SET balance=balance+1000 WHERE id=1
>
> This will inevitably lead to a conflict, and while the last-update-wins
> resolves this "consistently" on both nodes (e.g. ending with the same
> result), it's essentially a lost update.
>

The idea to solve such conflicts is using the delta apply technique
where the delta from both sides will be applied to the respective
columns. We do plan to target this as a separate patch. Now, if the
basic conflict resolution and delta apply both can't go in one
release, we shall document such cases clearly to avoid misuse of the
feature.
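A back-of-the-envelope sketch of the difference (an assumption about the planned delta-apply technique, not committed PostgreSQL behavior):

```python
# Ship the arithmetic delta instead of the full new row, and apply both
# sides' deltas rather than letting one full-row image win.
start = 0
deltas = [+1000, +1000]   # concurrent "balance = balance + 1000" on two nodes

# full-row last-update-wins keeps only one new image
full_row_result = max(start + d for d in deltas)    # one update lost

# delta apply accumulates both changes
delta_result = start + sum(deltas)                  # nothing lost

print(full_row_result, delta_result)  # 1000 2000
```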

> This is a very simplistic example of course, I recall there are various
> more complex examples involving foreign keys, multi-table transactions,
> constraints, etc. But in principle it's a manifestation of the same
> inherent limitation of conflict detection and resolution etc.
>
> Similarly, I believe this affects not just active-active, but also the
> case where one node aggregates data from multiple publishers. Maybe not
> to the same extent / it might be fine for that use case,
>

I am not sure how much it is a problem for general logical replication
solution but we do intend to work on solving such problems in
step-wise manner. Trying to attempt everything in one patch doesn't
seem advisable to me.

> but you said
> the end goal is to use this for active-active. So I'm wondering what's
> the plan, there.
>

I think at this stage we are not ready for active-active because
leaving aside this feature we need many other features like
replication of all commands/objects (DDL replication, replicate large
objects, etc.), Global sequences, some sort of global two_phase
transaction management for data consistency, etc. So, it would be
better to consider logical replication cases intending to extend it
for active-active when we have other required pieces.

> If I'm writing an application for active-active using this conflict
> handling, what assumptions can I make? Can I just do stuff as if on
> a single node, or do I need to be super conscious about the zillion ways
> things can misbehave in a distributed system?
>
> My personal opinion is that the closer this will be to the regular
> consistency levels, the better. If past experience taught me anything,
> it's very hard to predict how distributed systems with eventual
> consistency behave, and even harder to actually test the application in
> such environment.
>

I don't think in any way this can enable users to start writing
applications for active-active workloads. For something like what you
are saying, we probably need a global transaction manager (or a global
two_pc) as well to allow transactions to behave as they are on
single-node or achieve similar consistency levels. With such
transaction management, we can allow transactions to commit on a node
only when it doesn't lead to a conflict on the peer node.

> In any case, if there are any differences compared to the usual
> behavior, it needs to be very clearly explained in the docs.
>

I agree that docs should be clear about the cases that this can and
can't support.

> >>
> >> How is this going to deal with the fact that commit LSN and timestamps
> >> may not correlate perfectly? That is, commits may happen with LSN1 <
> >> LSN2 but with T1 > T2.
> >
> > Are you pointing to the issue where a session/txn has taken
> > 'xactStopTimestamp' timestamp earlier but is delayed to insert record
> > in XLOG, while another session/txn which has taken timestamp slightly
> > later succeeded to insert the record in XLOG sooner than session1,
> > making LSN and Timestamps out of sync? Going by this scenario, the
> > commit-timestamp may not be reflective of actual commits and thus
> > timestamp-based resolvers may take wrong decisions. Or do you mean
> > something else?
> >
> > If this is the problem you are referring to, then I think this needs a
> > fix at the publisher side. Let me think more about it. Kindly let me
> > know if you have ideas on how to tackle it.
> >
>
> Yes, this is the issue I'm talking about. We're acquiring the timestamp
> when not holding the lock to reserve space in WAL, so the LSN and the
> commit LSN may not actually correlate.
>
> Consider this example I discussed with Amit last week:
>
> node A:
>
>   XACT1: UPDATE t SET v = 1;    LSN1 / T1
>
>   XACT2: UPDATE t SET v = 2;    LSN2 / T2
>
> node B
>
>   XACT3: UPDATE t SET v = 3;    LSN3 / T3
>
> And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
> and T2 < T3 < T1. Now consider that the messages may arrive in different
> orders, due to async replication. Unfortunately, this would lead to
> different results of the conflict resolution:
>
>   XACT1 - XACT2 - XACT3 => v=3 (T3 wins)
>
>   XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
>
> Now, I realize there's a flaw in this example - the (T1 > T2) inversion
> can't actually happen, because these transactions have a dependency, and
> thus won't commit concurrently. XACT1 will complete the commit, because
> XACT2 starts to commit. And with monotonic clock (which is a requirement
> for any timestamp-based resolution), that should guarantee (T1 < T2).
>
> However, I doubt this is sufficient to declare victory. It's more likely
> that there still are problems, but the examples are likely more complex
> (changes to multiple tables, etc.).
>

Fair enough, I think we need to analyze this more to find actual
problems or in some way try to prove that there is no problem.

> I vaguely remember there were more issues with timestamp inversion, but
> those might have been related to parallel apply etc.
>

Okay, so considering there are problems due to timestamp inversion, I
think the solution to that problem would probably be somehow
generating the commit LSN and timestamp in order. I don't have a solution
at this stage but will think more about both the actual problem and
the solution. In the meantime, if you get a chance to recall the place
where you have seen such a problem, please share it with us. It would
be helpful.
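To make the inversion concrete, here is a small Python sketch (purely illustrative, not PostgreSQL code) of the three-transaction example quoted above. The model assumed is: changes from the same origin arrive in commit (LSN) order and are applied as-is, while a change from a different origin is a conflict resolved by latest_timestamp_wins. The two arrival orders then produce different final values:

```python
# Illustrative Python sketch (not PostgreSQL code) of the example above.
# Model: same-origin changes are applied in commit (LSN) order; a change
# from a different origin is resolved with latest_timestamp_wins.

def apply_stream(changes):
    row = None  # current winner: (value, commit_ts, origin)
    for value, ts, origin in changes:
        if row is None or origin == row[2]:
            row = (value, ts, origin)     # same origin: apply in LSN order
        elif ts > row[1]:
            row = (value, ts, origin)     # cross-origin: later timestamp wins
    return row[0]

# LSN1 < LSN2 but T1 > T2 (the timestamp inversion), and T2 < T3 < T1
xact1 = (1, 30, "A")   # node A, LSN1, T1 = 30
xact2 = (2, 10, "A")   # node A, LSN2, T2 = 10
xact3 = (3, 20, "B")   # node B,       T3 = 20

assert apply_stream([xact1, xact2, xact3]) == 3   # T3 wins over T2
assert apply_stream([xact3, xact1, xact2]) == 2   # T1 wins over T3, then XACT2
```

The divergence disappears once commit timestamps are generated in LSN order, since the inversion (T1 > T2 with LSN1 < LSN2) can then no longer occur.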

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> >>>
> >>>  UPDATE
> >>> ================
> >>>
> >>> Conflict Detection Method:
> >>> --------------------------------
> >>> Origin conflict detection: The ‘origin’ info is used to detect
> >>> conflict which can be obtained from commit-timestamp generated for
> >>> incoming txn at the source node. To compare remote’s origin with the
> >>> local’s origin, we must have origin information for local txns as well
> >>> which can be obtained from commit-timestamp after enabling
> >>> ‘track_commit_timestamp’ locally.
> >>> The one drawback here is the ‘origin’ information cannot be obtained
> >>> once the row is frozen and the commit-timestamp info is removed by
> >>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
> >>> incoming changes will be applied in all the cases.
> >>>
> >>> Conflict Types:
> >>> ----------------
> >>> a) update_differ: The origin of an incoming update's key row differs
> >>> from the local row i.e.; the row has already been updated locally or
> >>> by different nodes.
> >>> b) update_missing: The row with the same value as that incoming
> >>> update's key does not exist. Remote is trying to update a row which
> >>> does not exist locally.
> >>> c) update_deleted: The row with the same value as that incoming
> >>> update's key does not exist. The row is already deleted. This conflict
> >>> type is generated only if the deleted row is still detectable i.e., it
> >>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
> >>> it cannot detect this conflict. It will detect it as update_missing
> >>> and will follow the default or configured resolver of update_missing
> >>> itself.
> >>>
> >>
> >> I don't understand the why should update_missing or update_deleted be
> >> different, especially considering it's not detected reliably. And also
> >> that even if we happen to find the row the associated TOAST data may
> >> have already been removed. So why would this matter?
> >
> > Here, we are trying to tackle the case where the row is 'recently'
> > deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> > want to opt for a different resolution in such a case as against the
> > one where the corresponding row was not even present in the first
> > place. The case where the row was deleted long back may not fall into
> > this category as there are higher chances that they have been removed
> > by vacuum and can be considered equivalent to the update_missing
> > case.
> >
>
> My point is that if we can't detect the difference reliably, it's not
> very useful. Consider this example:
>
> Node A:
>
>   T1: INSERT INTO t (id, value) VALUES (1,1);
>
>   T2: DELETE FROM t WHERE id = 1;
>
> Node B:
>
>   T3: UPDATE t SET value = 2 WHERE id = 1;
>
> The "correct" order of received messages on a third node is T1-T3-T2.
> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
> and so on. For T1-T2-T3 the right decision is to discard the update,
> while for T3-T1-T2 it's to either apply the update (as an insert) or
> wait for the INSERT to arrive.
>
> But if we misdetect the situation, we either end up with a row that
> shouldn't be there, or losing an update.

Doesn't the above example indicate that 'update_deleted' should also
be considered a necessary conflict type?  Please see the possibilities
of conflicts in all three cases:


The "correct" order of receiving messages on node C (as suggested
above) is T1-T3-T2   (case1)
----------
T1 will insert the row.
T3 will have an update_differ conflict; latest_timestamp_wins or apply
will apply it, while earliest_timestamp_wins or skip will skip it.
T2 will delete the row (irrespective of whether the update happened or not).
End Result: No Data.

T1-T2-T3
----------
T1 will insert the row.
T2 will delete the row.
T3 will have an 'update_deleted' conflict. In that case, the chances
are that the resolver set here is 'skip' (which is also the default).

If vacuum has already removed that row (or if we don't support the
'update_deleted' conflict), it will be an 'update_missing' conflict. In
that case, the user may end up inserting the row if the chosen resolver
is in favor of apply (which seems an obvious choice for the
'update_missing' conflict; the default is also 'apply_or_skip').

End result:
Row inserted with 'update_missing'.
Row correctly skipped with 'update_deleted' (assuming 'skip', the
obvious choice for the update_deleted case).

So it seems that with the 'update_deleted' conflict, there is a higher
chance of making the right decision here (which is to discard the
update), as 'update_deleted' conveys the correct info to the user. The
'update_missing' conflict, OTOH, does not convey the correct info, and
the user may end up inserting the data by choosing apply-favoring
resolvers for 'update_missing'. Again, we get the benefit of
'update_deleted' for *recently* deleted rows only.

T3-T1-T2
----------
T3 may end up inserting the record if the resolver is in favor of
'apply' and all the columns are received from the remote.
T1 will have an 'insert_exists' conflict and thus may either overwrite
the 'updated' values or leave the data as is (based on whether the
resolver is in favor of apply or not).
T2 will end up deleting it.
End Result: No Data.

I feel that for the second case (and similar cases), 'update_deleted'
serves as a better conflict type.

thanks
Shveta



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Fri, Jun 7, 2024 at 6:10 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> >>> I don't understand the why should update_missing or update_deleted be
> >>> different, especially considering it's not detected reliably. And also
> >>> that even if we happen to find the row the associated TOAST data may
> >>> have already been removed. So why would this matter?
> >>
> >> Here, we are trying to tackle the case where the row is 'recently'
> >> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> >> want to opt for a different resolution in such a case as against the
> >> one where the corresponding row was not even present in the first
> >> place. The case where the row was deleted long back may not fall into
> >> this category as there are higher chances that they have been removed
>> by vacuum and can be considered equivalent to the update_missing
> >> case.
> >>
> >> Regarding "TOAST column" for deleted row cases, we may need to dig
> >> more. Thanks for bringing this case. Let me analyze more here.
> >>
> > I tested a simple case with a table with one TOAST column and found
> > that when a tuple with a TOAST column is deleted, both the tuple and
> > corresponding pg_toast entries are marked as ‘deleted’ (dead) but not
> > removed immediately. The main tuple and respective pg_toast entry are
> > permanently deleted only during vacuum. First, the main table’s dead
> > tuples are vacuumed, followed by the secondary TOAST relation ones (if
> > available).
> > Please let us know if you have a specific scenario in mind where the
> > TOAST column data is deleted immediately upon ‘delete’ operation,
> > rather than during vacuum, which we are missing.
> >
>
> I'm pretty sure you can vacuum the TOAST table directly, which means
> you'll end up with a deleted tuple with TOAST pointers, but with the
> TOAST entries already gone.
>

It is true that for a deleted row, its toast entries can be vacuumed
earlier than the original/parent row, but we do not need to be
concerned about that to raise 'update_deleted'. To raise an
'update_deleted' conflict, it is sufficient to know that the row has
been deleted and not yet vacuumed, regardless of the presence or
absence of its toast entries. Once this is determined, we need to
build the tuple from the remote data and apply it (provided the
resolver says so). If the tuple cannot be fully constructed from the
remote data, the apply operation will either be skipped or an error
will be raised, depending on whether the user has chosen the
apply_or_skip or apply_or_error option.

In cases where the table has toast columns but the remote data does
not include the toast-column entry (when the toast column is
unmodified and not part of the replica identity), the resolution for
'update_deleted' will be no worse than for 'update_missing'. That is,
in both cases we cannot construct the full tuple, and thus the
operation either needs to be skipped or an error needs to be raised.
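A minimal sketch of that apply-as-insert decision under the apply_or_skip / apply_or_error options (resolve_apply() and its arguments are invented for illustration):

```python
# Illustrative sketch of applying an UPDATE as an INSERT under the
# apply_or_skip / apply_or_error options described above. The function
# and argument names are invented for illustration.

def resolve_apply(remote_cols, table_cols, mode):
    """Try to build the full tuple from the remote change and insert it."""
    missing = [c for c in table_cols if c not in remote_cols]
    if not missing:
        return ("insert", dict(remote_cols))   # full tuple available
    # e.g. an unmodified TOAST column that was not sent with the change
    if mode == "apply_or_skip":
        return ("skip", missing)
    raise ValueError(f"cannot build full tuple; missing columns: {missing}")

assert resolve_apply({"id": 1, "val": 2}, ["id", "val"], "apply_or_skip")[0] == "insert"
assert resolve_apply({"id": 1}, ["id", "big"], "apply_or_skip") == ("skip", ["big"])
```

The same branch fires whether the row was vacuumed away ('update_missing') or only its toast entries were, which is why the two cases resolve no worse than each other.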

thanks
Shveta



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:
On 6/10/24 10:54, Amit Kapila wrote:
> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 5/27/24 07:48, shveta malik wrote:
>>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> Which architecture are you aiming for? Here you talk about multiple
>>>> providers, but the wiki page mentions active-active. I'm not sure how
>>>> much this matters, but it might.
>>>
>>> Currently, we are working on the multiple-providers case, but ideally
>>> it should work for active-active as well. During further discussion
>>> and implementation, if we find that there are cases which will not
>>> work in a straightforward way for active-active, then our primary
>>> focus will remain on first implementing it for the multiple-providers
>>> architecture.
>>>
>>>>
>>>> Also, what kind of consistency you expect from this? Because none of
>>>> these simple conflict resolution methods can give you the regular
>>>> consistency models we're used to, AFAICS.
>>>
>>> Can you please explain a little bit more on this.
>>>
>>
>> I was referring to the well established consistency models / isolation
>> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
>> guarantees the application developer can expect, what anomalies can
>> happen, etc.
>>
>> I don't think any such isolation level can be implemented with a simple
>> conflict resolution methods like last-update-wins etc. For example,
>> consider an active-active where both nodes do
>>
>>   UPDATE accounts SET balance=balance+1000 WHERE id=1
>>
>> This will inevitably lead to a conflict, and while the last-update-wins
>> resolves this "consistently" on both nodes (e.g. ending with the same
>> result), it's essentially a lost update.
>>
> 
> The idea to solve such conflicts is using the delta apply technique
> where the delta from both sides will be applied to the respective
> columns. We do plan to target this as a separate patch. Now, if the
> basic conflict resolution and delta apply both can't go in one
> release, we shall document such cases clearly to avoid misuse of the
> feature.
> 

Perhaps, but it's not like having delta conflict resolution (or even
CRDT as a more generic variant) would lead to a regular consistency
model in a distributed system. At least I don't think it can achieve
that, because of the asynchronicity.

Consider a table with "CHECK (amount < 1000)" constraint, and an update
that sets (amount = amount + 900) on two nodes. AFAIK there's no way to
reconcile this using delta (or any other) conflict resolution.
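The arithmetic of the two examples can be spelled out in a short sketch (illustrative only): delta apply does avoid the lost update in the accounts example, but no merge of the deltas can satisfy the CHECK constraint.

```python
# Illustrative arithmetic for the two examples above: delta apply avoids
# the lost update, but no merge of the deltas can satisfy the CHECK.

def delta_merge(base, *deltas):
    """Apply concurrent updates as deltas against the common base value."""
    return base + sum(deltas)

# balance = balance + 1000 on both nodes: both increments survive.
assert delta_merge(0, 1000, 1000) == 2000

# CHECK (amount < 1000): each +900 passed the check locally (900 < 1000),
# but the merged result cannot satisfy it on any node.
assert not (delta_merge(0, 900, 900) < 1000)
```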

Which does not mean we should not have some form of conflict resolution,
as long as we know what the goal is. I simply don't want to spend time
working on this, add a lot of complex code, and then realize it doesn't
give us a consistency model that makes sense.

Which leads me back to my original question - what is the consistency
model you expect to get from this (possibly when combined with some
other pieces)?

>> This is a very simplistic example of course, I recall there are various
>> more complex examples involving foreign keys, multi-table transactions,
>> constraints, etc. But in principle it's a manifestation of the same
>> inherent limitation of conflict detection and resolution etc.
>>
>> Similarly, I believe this affects not just active-active, but also the
>> case where one node aggregates data from multiple publishers. Maybe not
>> to the same extent / it might be fine for that use case,
>>
> 
> I am not sure how much it is a problem for general logical replication
> solution but we do intend to work on solving such problems in
> step-wise manner. Trying to attempt everything in one patch doesn't
> seem advisable to me.
> 

I didn't say it needs to be done in one patch. I asked for someone to
explain what is the goal - consistency model observed by the users.

>>
>> but you said
>> the end goal is to use this for active-active. So I'm wondering what's
>> the plan, there.
>>
> 
> I think at this stage we are not ready for active-active because
> leaving aside this feature we need many other features like
> replication of all commands/objects (DDL replication, replicate large
> objects, etc.), Global sequences, some sort of global two_phase
> transaction management for data consistency, etc. So, it would be
> better to consider logical replication cases intending to extend it
> for active-active when we have other required pieces.
> 

We're not ready for active-active, sure. And I'm not saying a conflict
resolution would make us ready. The question is what consistency model
we'd like to get from the active-active, and whether conflict resolution
can get us there ...

As for the other missing bits (DDL replication, large objects, global
sequences), I think those are somewhat independent of the question I'm
asking. And some of the stuff is also somewhat optional - for example I
think it'd be fine to not support large objects or global sequences.

>> If I'm writing an application for active-active using this conflict
>> handling, what assumptions can I make? Can I just do stuff as if on
>> a single node, or do I need to be super conscious about the zillion ways
>> things can misbehave in a distributed system?
>>
>> My personal opinion is that the closer this will be to the regular
>> consistency levels, the better. If past experience taught me anything,
>> it's very hard to predict how distributed systems with eventual
>> consistency behave, and even harder to actually test the application in
>> such environment.
>>
> 
> I don't think in any way this can enable users to start writing
> applications for active-active workloads. For something like what you
> are saying, we probably need a global transaction manager (or a global
> two_pc) as well to allow transactions to behave as they are on
> single-node or achieve similar consistency levels. With such
> transaction management, we can allow transactions to commit on a node
> only when it doesn't lead to a conflict on the peer node.
> 

But the wiki linked in the first message says:

   CDR is an important and necessary feature for active-active
   replication.

But if I understand your response correctly, you're saying active-active
should probably use a global transaction manager etc., which would
prevent conflicts - but that seems to make CDR unnecessary. Or do I
understand it wrong?

FWIW I don't think we'd need global components, there are ways to do
distributed snapshots using timestamps (for example), which would give
us snapshot isolation.


>> In any case, if there are any differences compared to the usual
>> behavior, it needs to be very clearly explained in the docs.
>>
> 
> I agree that docs should be clear about the cases that this can and
> can't support.
> 
>>>>
>>>> How is this going to deal with the fact that commit LSN and timestamps
>>>> may not correlate perfectly? That is, commits may happen with LSN1 <
>>>> LSN2 but with T1 > T2.
>>>
>>> Are you pointing to the issue where a session/txn has taken
>>> 'xactStopTimestamp' timestamp earlier but is delayed to insert record
>>> in XLOG, while another session/txn which has taken timestamp slightly
>>> later succeeded to insert the record in XLOG sooner than session1,
>>> making LSN and Timestamps out of sync? Going by this scenario, the
>>> commit-timestamp may not be reflective of actual commits and thus
>>> timestamp-based resolvers may take wrong decisions. Or do you mean
>>> something else?
>>>
>>> If this is the problem you are referring to, then I think this needs a
>>> fix at the publisher side. Let me think more about it. Kindly let me
>>> know if you have ideas on how to tackle it.
>>>
>>
>> Yes, this is the issue I'm talking about. We're acquiring the timestamp
>> when not holding the lock to reserve space in WAL, so the LSN and the
>> commit LSN may not actually correlate.
>>
>> Consider this example I discussed with Amit last week:
>>
>> node A:
>>
>>   XACT1: UPDATE t SET v = 1;    LSN1 / T1
>>
>>   XACT2: UPDATE t SET v = 2;    LSN2 / T2
>>
>> node B
>>
>>   XACT3: UPDATE t SET v = 3;    LSN3 / T3
>>
>> And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion),
>> and T2 < T3 < T1. Now consider that the messages may arrive in different
>> orders, due to async replication. Unfortunately, this would lead to
>> different results of the conflict resolution:
>>
>>   XACT1 - XACT2 - XACT3 => v=3 (T3 wins)
>>
>>   XACT3 - XACT1 - XACT2 => v=2 (T2 wins)
>>
>> Now, I realize there's a flaw in this example - the (T1 > T2) inversion
>> can't actually happen, because these transactions have a dependency, and
>> thus won't commit concurrently. XACT1 will complete the commit before
>> XACT2 starts to commit. And with a monotonic clock (which is a requirement
>> for any timestamp-based resolution), that should guarantee (T1 < T2).
>>
>> However, I doubt this is sufficient to declare victory. It's more likely
>> that there still are problems, but the examples are likely more complex
>> (changes to multiple tables, etc.).
>>
> 
> Fair enough, I think we need to analyze this more to find actual
> problems or in some way try to prove that there is no problem.
> 
>> I vaguely remember there were more issues with timestamp inversion, but
>> those might have been related to parallel apply etc.
>>
> 
> Okay, so considering there are problems due to timestamp inversion, I
> think the solution to that problem would probably be somehow
> generating commit LSN and timestamp in order. I don't have a solution
> at this stage but will think more both on the actual problem and
> solution. In the meantime, if you get a chance to refer to the place
> where you have seen such a problem please try to share the same with
> us. It would be helpful.
> 

I think the solution to this would be to acquire the timestamp while
reserving the space (because that happens in LSN order). The clock would
need to be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap.
AFAIK this is the main reason why it's being done outside the critical
section: gettimeofday() may be quite expensive. There's a
concept of hybrid clock, combining "time" and logical counter, which I
think might be useful independently of CDR ...
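A minimal sketch of such a hybrid clock in Python (illustrative only; a real implementation would live in C inside the commit path, and the class/field names here are invented):

```python
# Illustrative sketch of a hybrid clock: physical time plus a logical
# counter, so timestamps taken in sequence (e.g. while reserving WAL
# space) are strictly increasing even when the physical reading does
# not advance between calls.
import time

class HybridClock:
    def __init__(self):
        self.last = (0, 0)                 # (wall_ns, logical counter)

    def now(self):
        wall = time.monotonic_ns()         # monotonic source, as required
        if wall > self.last[0]:
            self.last = (wall, 0)          # fresh physical reading
        else:
            # same or repeated reading: bump the logical part instead
            self.last = (self.last[0], self.last[1] + 1)
        return self.last                   # tuples compare lexicographically

clock = HybridClock()
t1, t2 = clock.now(), clock.now()
assert t2 > t1                             # always strictly increasing
```

Because the logical counter only resolves ties, the timestamp stays close to real time while remaining cheap enough to take inside the WAL-reservation path.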

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:

On 6/10/24 12:56, shveta malik wrote:
> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>>>>>
>>>>>  UPDATE
>>>>> ================
>>>>>
>>>>> Conflict Detection Method:
>>>>> --------------------------------
>>>>> Origin conflict detection: The ‘origin’ info is used to detect
>>>>> conflict which can be obtained from commit-timestamp generated for
>>>>> incoming txn at the source node. To compare remote’s origin with the
>>>>> local’s origin, we must have origin information for local txns as well
>>>>> which can be obtained from commit-timestamp after enabling
>>>>> ‘track_commit_timestamp’ locally.
>>>>> The one drawback here is the ‘origin’ information cannot be obtained
>>>>> once the row is frozen and the commit-timestamp info is removed by
>>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
>>>>> incoming changes will be applied in all the cases.
>>>>>
>>>>> Conflict Types:
>>>>> ----------------
>>>>> a) update_differ: The origin of an incoming update's key row differs
>>>>> from the local row i.e.; the row has already been updated locally or
>>>>> by different nodes.
>>>>> b) update_missing: The row with the same value as that incoming
>>>>> update's key does not exist. Remote is trying to update a row which
>>>>> does not exist locally.
>>>>> c) update_deleted: The row with the same value as that incoming
>>>>> update's key does not exist. The row is already deleted. This conflict
>>>>> type is generated only if the deleted row is still detectable i.e., it
>>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
>>>>> it cannot detect this conflict. It will detect it as update_missing
>>>>> and will follow the default or configured resolver of update_missing
>>>>> itself.
>>>>>
>>>>
>>>> I don't understand the why should update_missing or update_deleted be
>>>> different, especially considering it's not detected reliably. And also
>>>> that even if we happen to find the row the associated TOAST data may
>>>> have already been removed. So why would this matter?
>>>
>>> Here, we are trying to tackle the case where the row is 'recently'
>>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
>>> want to opt for a different resolution in such a case as against the
>>> one where the corresponding row was not even present in the first
>>> place. The case where the row was deleted long back may not fall into
>>> this category as there are higher chances that they have been removed
>>> by vacuum and can be considered equivalent to the update_missing
>>> case.
>>>
>>
>> My point is that if we can't detect the difference reliably, it's not
>> very useful. Consider this example:
>>
>> Node A:
>>
>>   T1: INSERT INTO t (id, value) VALUES (1,1);
>>
>>   T2: DELETE FROM t WHERE id = 1;
>>
>> Node B:
>>
>>   T3: UPDATE t SET value = 2 WHERE id = 1;
>>
>> The "correct" order of received messages on a third node is T1-T3-T2.
>> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
>> and so on. For T1-T2-T3 the right decision is to discard the update,
>> while for T3-T1-T2 it's to either apply the update (as an insert) or
>> wait for the INSERT to arrive.
>>
>> But if we misdetect the situation, we either end up with a row that
>> shouldn't be there, or losing an update.
> 
> Doesn't the above example indicate that 'update_deleted' should also
> be considered a necessary conflict type?  Please see the possibilities
> of conflicts in all three cases:
> 
> 
> The "correct" order of receiving messages on node C (as suggested
> above) is T1-T3-T2   (case1)
> ----------
> T1 will insert the row.
> T3 will have an update_differ conflict; latest_timestamp_wins or apply
> will apply it, while earliest_timestamp_wins or skip will skip it.
> T2 will delete the row (irrespective of whether the update happened or not).
> End Result: No Data.
> 
> T1-T2-T3
> ----------
> T1 will insert the row.
> T2 will delete the row.
> T3 will have conflict update_deleted. If it is 'update_deleted', the
> chances are that the resolver set here is to 'skip' (default is also
> 'skip' in this case).
> 
> If vacuum has deleted that row (or if we don't support
> 'update_deleted' conflict), it will be 'update_missing' conflict. In
> that case, the user may end up inserting the row if resolver chosen is
> in favor of apply (which seems an obvious choice for 'update_missing'
> conflict; default is also 'apply_or_skip').
> 
> End result:
> Row inserted with 'update_missing'.
> Row correctly skipped with 'update_deleted' (assuming the obvious
> choice seems to be 'skip' for update_deleted case).
> 
> So it seems that with 'update_deleted' conflict, there are higher
> chances of opting for right decision here (which is to discard the
> update), as 'update_deleted' conveys correct info to the user.  The
> 'update_missing' OTOH does not convey correct info and user may end up
> inserting the data by choosing apply favoring resolvers for
> 'update_missing'. Again, we get benefit of 'update_deleted' for
> *recently* deleted rows only.
> 
> T3-T1-T2
> ----------
> T3 may end up inserting the record if the resolver is in favor of
> 'apply' and all the columns are received from remote.
> T1 will have an 'insert_exists' conflict and thus may either overwrite
> 'updated' values or may leave the data as is (based on whether
> resolver is in favor of apply or not)
> T2 will end up deleting it.
> End Result: No Data.
> 
> I feel for second case (and similar cases), 'update_deleted' serves a
> better conflict type.
> 

True, but this is pretty much just a restatement of the example, right?

The point I was trying to make is that this hinges on the ability to
detect the correct conflict type. And if vacuum can swoop in and remove
the recently deleted tuples (which I believe can happen at any time,
right?), then that's not guaranteed, because we won't see the deleted
tuple anymore. Or am I missing something?

Also, can the resolver even convert the UPDATE into INSERT and proceed?
Maybe with REPLICA IDENTITY FULL? Otherwise the row might be incomplete,
missing required columns etc. In which case it'd have to wait for the
actual INSERT to arrive - which would work for actual update_missing,
where the row may be delayed due to network issues. But if that's a
mistake due to vacuum removing the deleted tuple, it'll wait forever.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 6/10/24 12:56, shveta malik wrote:
> > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >>>>>
> >>>>>  UPDATE
> >>>>> ================
> >>>>>
> >>>>> Conflict Detection Method:
> >>>>> --------------------------------
> >>>>> Origin conflict detection: The ‘origin’ info is used to detect
> >>>>> conflict which can be obtained from commit-timestamp generated for
> >>>>> incoming txn at the source node. To compare remote’s origin with the
> >>>>> local’s origin, we must have origin information for local txns as well
> >>>>> which can be obtained from commit-timestamp after enabling
> >>>>> ‘track_commit_timestamp’ locally.
> >>>>> The one drawback here is the ‘origin’ information cannot be obtained
> >>>>> once the row is frozen and the commit-timestamp info is removed by
> >>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
> >>>>> incoming changes will be applied in all the cases.
> >>>>>
> >>>>> Conflict Types:
> >>>>> ----------------
> >>>>> a) update_differ: The origin of an incoming update's key row differs
> >>>>> from the local row i.e.; the row has already been updated locally or
> >>>>> by different nodes.
> >>>>> b) update_missing: The row with the same value as that incoming
> >>>>> update's key does not exist. Remote is trying to update a row which
> >>>>> does not exist locally.
> >>>>> c) update_deleted: The row with the same value as that incoming
> >>>>> update's key does not exist. The row is already deleted. This conflict
> >>>>> type is generated only if the deleted row is still detectable i.e., it
> >>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
> >>>>> it cannot detect this conflict. It will detect it as update_missing
> >>>>> and will follow the default or configured resolver of update_missing
> >>>>> itself.
> >>>>>
> >>>>
> >>>> I don't understand the why should update_missing or update_deleted be
> >>>> different, especially considering it's not detected reliably. And also
> >>>> that even if we happen to find the row the associated TOAST data may
> >>>> have already been removed. So why would this matter?
> >>>
> >>> Here, we are trying to tackle the case where the row is 'recently'
> >>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> >>> want to opt for a different resolution in such a case as against the
> >>> one where the corresponding row was not even present in the first
> >>> place. The case where the row was deleted long back may not fall into
> >>> this category as there are higher chances that they have been removed
> >>> by vacuum and can be considered equivalent to the update_missing
> >>> case.
> >>>
> >>
> >> My point is that if we can't detect the difference reliably, it's not
> >> very useful. Consider this example:
> >>
> >> Node A:
> >>
> >>   T1: INSERT INTO t (id, value) VALUES (1,1);
> >>
> >>   T2: DELETE FROM t WHERE id = 1;
> >>
> >> Node B:
> >>
> >>   T3: UPDATE t SET value = 2 WHERE id = 1;
> >>
> >> The "correct" order of received messages on a third node is T1-T3-T2.
> >> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
> >> and so on. For T1-T2-T3 the right decision is to discard the update,
> >> while for T3-T1-T2 it's to either apply the update (as an insert) or
> >> wait for the INSERT to arrive.
> >>
> >> But if we misdetect the situation, we either end up with a row that
> >> shouldn't be there, or losing an update.
> >
> > Doesn't the above example indicate that 'update_deleted' should also
> > be considered a necessary conflict type?  Please see the possibilities
> > of conflicts in all three cases:
> >
> >
> > The "correct" order of receiving messages on node C (as suggested
> > above) is T1-T3-T2   (case1)
> > ----------
> > T1 will insert the row.
> > T3 will have update_differ conflict; latest_timestamp_wins or apply
> > will apply it; earliest_timestamp_wins or skip will skip it.
> > T2 will delete the row (irrespective of whether the update happened or not).
> > End Result: No Data.
> >
> > T1-T2-T3
> > ----------
> > T1 will insert the row.
> > T2 will delete the row.
> > T3 will have conflict update_deleted. If it is 'update_deleted', the
> > chances are that the resolver set here is to 'skip' (default is also
> > 'skip' in this case).
> >
> > If vacuum has deleted that row (or if we don't support
> > 'update_deleted' conflict), it will be 'update_missing' conflict. In
> > that case, the user may end up inserting the row if resolver chosen is
> > in favor of apply (which seems an obvious choice for 'update_missing'
> > conflict; default is also 'apply_or_skip').
> >
> > End result:
> > Row inserted with 'update_missing'.
> > Row correctly skipped with 'update_deleted' (assuming the obvious
> > choice seems to be 'skip' for update_deleted case).
> >
> > So it seems that with 'update_deleted' conflict, there are higher
> > chances of opting for the right decision here (which is to discard the
> > update), as 'update_deleted' conveys correct info to the user.  The
> > 'update_missing' OTOH does not convey correct info and user may end up
> > inserting the data by choosing apply favoring resolvers for
> > 'update_missing'. Again, we get benefit of 'update_deleted' for
> > *recently* deleted rows only.
> >
> > T3-T1-T2
> > ----------
> > T3 may end up inserting the record if the resolver is in favor of
> > 'apply' and all the columns are received from remote.
> > T1 will have 'insert_exists' conflict and thus may either overwrite
> > 'updated' values or may leave the data as is (based on whether
> > resolver is in favor of apply or not)
> > T2 will end up deleting it.
> > End Result: No Data.
> >
> > I feel for second case (and similar cases), 'update_deleted' serves a
> > better conflict type.
> >
>
> True, but this is pretty much just a restatement of the example, right?
>
> The point I was trying to make is that this hinges on the ability to
> detect the correct conflict type. And if vacuum can swoop in and remove
> the recently deleted tuples (which I believe can happen at any time,
> right?), then that's not guaranteed, because we won't see the deleted
> tuple anymore.

Yes, that's correct. However, many cases could benefit from the
update_deleted conflict type if it can be implemented reliably. That's
why we wanted to give it a try. But if we can't achieve predictable
results with it, I'm fine to drop this approach and conflict_type. We
can consider a better design in the future that doesn't depend on
non-vacuumed entries and provides a more robust method for identifying
deleted rows.

> Also, can the resolver even convert the UPDATE into INSERT and proceed?
> Maybe with REPLICA IDENTITY FULL?

Yes, it can, as long as the row doesn't contain toasted data. Without
toasted data, the new tuple is fully logged. However, if the row does
contain toasted data, the new tuple won't log it completely. In such a
case, REPLICA IDENTITY FULL becomes a requirement to ensure we have
all the data necessary to create the row on the target side. In
the absence of REPLICA IDENTITY FULL, if the row does contain toasted
data, the operation will be skipped or an error will be raised.

> Otherwise the row might be incomplete,
> missing required columns etc. In which case it'd have to wait for the
> actual INSERT to arrive - which would work for actual update_missing,
> where the row may be delayed due to network issues. But if that's a
> mistake due to vacuum removing the deleted tuple, it'll wait forever.

Even in the case of 'update_missing', we do not intend to wait for the
actual INSERT to arrive, as there is no guarantee that it ever will. We
therefore plan to skip or error out (based on the user's configuration)
if a complete row cannot be created for insertion.
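The three delivery orders discussed above can be replayed with a toy
simulation. This is a hypothetical Python sketch, not PostgreSQL code; it
assumes the resolver defaults mentioned in the thread ('skip' for
update_deleted, an apply-favoring resolver for update_missing) and
ignores insert_exists handling.

```python
# Toy, hypothetical sketch (not PostgreSQL internals): replays the three
# delivery orders from the example and compares the end state when the
# subscriber can still detect update_deleted vs. when the deleted tuple
# was already vacuumed away and only update_missing is reported.

def replay(order, detect_update_deleted):
    table, deleted = {}, set()
    for kind, key, value in order:
        if kind == "INSERT":
            table[key] = value                 # insert_exists handling omitted
        elif kind == "DELETE":
            deleted.add(key)
            table.pop(key, None)
        elif kind == "UPDATE":
            if key in table:
                table[key] = value             # update_differ: apply (latest wins)
            elif detect_update_deleted and key in deleted:
                pass                           # update_deleted: default 'skip'
            else:
                table[key] = value             # update_missing: apply-favoring resolver
    return table

T1 = ("INSERT", 1, 1)
T2 = ("DELETE", 1, None)
T3 = ("UPDATE", 1, 2)

# Only the T1-T2-T3 order diverges: with update_deleted the stray UPDATE
# is correctly discarded, without it the deleted row reappears.
print(replay([T1, T2, T3], True))    # {}
print(replay([T1, T2, T3], False))   # {1: 2}
```

The other two orders (T1-T3-T2 and T3-T1-T2) end with no data either
way, which matches the analysis quoted above.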

thanks
Shveta



Re: Conflict Detection and Resolution

From
Ashutosh Bapat
Date:


On Sat, Jun 8, 2024 at 3:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>>
>> >
>>
>> Here are more use cases of the "earliest_timestamp_wins" resolution method:
>> 1) Applications where the record of first occurrence of an event is
>> important. For example, sensor based applications like earthquake
>> detection systems, capturing the first seismic wave's time is crucial.
>> 2) Scheduling systems, like appointment booking, prioritize the
>> earliest request when handling concurrent ones.
>> 3) In contexts where maintaining chronological order is important -
>>   a) Social media platforms display comments ensuring that the
>> earliest ones are visible first.
>>   b) Finance transaction processing systems rely on timestamps to
>> prioritize the processing of transactions, ensuring that the earliest
>> transaction is handled first
>
>
> Thanks for sharing examples. However, these scenarios would be handled
> by the application and not during replication. What we are discussing
> here is the timestamp when a row was updated/inserted/deleted (or
> rather when the transaction that updated the row committed/became
> visible), and not a DML on a column of type timestamp. Some
> implementations use a hidden timestamp column, but that's different
> from a user column which captures the timestamp of (say) an event. The
> conflict resolution will be based on the timestamp when that column's
> value was recorded in the database, which may be different from the
> value of the column itself.
>

It depends on how these operations are performed. For example, the
appointment booking system could be prioritized via a transaction
updating a row with columns emp_name, emp_id, reserved, time_slot.
Now, if two employees at different geographical locations try to book
the calendar, the earlier transaction will win.

I doubt that it would be that simple. The application will have to
intervene and tell one of the employees that their reservation has
failed. It looks natural that the first one to reserve the room should
get the reservation, but implementing that is more complex than
resolving a conflict in the database. In fact, it will mostly be
handled outside the database.

> If we use the transaction commit timestamp as the basis for
> resolution, a transaction where multiple rows conflict may end up with
> different rows affected by that transaction being resolved
> differently. Say three transactions T1, T2 and T3 on separate origins,
> with timestamps t1, t2, and t3 respectively, changed rows (r1, r2),
> (r2, r3) and (r1, r4) respectively. Changes to r1 and r2 will
> conflict. Let's say T2 and T3 are applied first and then T1 is
> applied. If t2 < t1 < t3, r1 will end up with the version from T3 and
> r2 will end up with the version from T1 after applying all three
> transactions.
>

Are you describing the results based on latest_timestamp_wins? If so,
then it is correct. OTOH, if the user has configured the
"earliest_timestamp_wins" resolution method, then we should end up
with the version of r1 from T1 because t1 < t3. Also, for the same
reason, we should have the version of r2 from T2.
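The per-row outcomes above can be checked with a small sketch. This is
illustrative Python, not the proposed implementation; numeric values
stand in for commit timestamps, chosen so that t2 < t1 < t3, and the
apply order on the receiving node is T2, T3, then T1.

```python
# Illustrative Python (not the proposed implementation): per-row
# resolution by commit timestamp. T1, T2, T3 touch (r1, r2), (r2, r3)
# and (r1, r4) respectively.

def resolve(apply_order, earliest_wins):
    state = {}  # row -> (winning txn, its commit timestamp)
    for txn, ts, rows in apply_order:
        for r in rows:
            if r not in state:
                state[r] = (txn, ts)
                continue
            _, local_ts = state[r]
            remote_wins = ts < local_ts if earliest_wins else ts > local_ts
            if remote_wins:
                state[r] = (txn, ts)
    return {r: txn for r, (txn, _) in state.items()}

order = [("T2", 2.0, ["r2", "r3"]),
         ("T3", 3.0, ["r1", "r4"]),
         ("T1", 2.5, ["r1", "r2"])]

# latest_timestamp_wins: r1 keeps T3's version, r2 takes T1's.
print(resolve(order, earliest_wins=False))
# earliest_timestamp_wins: r1 takes T1's version, r2 keeps T2's.
print(resolve(order, earliest_wins=True))
```

Either way, the conflicting rows r1 and r2 end up with versions from
different transactions, which is the per-transaction atomicity concern
raised in this exchange.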

>
 Would that introduce an inconsistency between r1 and r2?
>

As per my understanding, this shouldn't be an inconsistency. Won't it
be true even when the transactions are performed on a single node with
the same timing?


The inconsistency will arise irrespective of the conflict resolution
method. On a single system, the effects of whichever transaction runs
last will be visible entirely. But in the example above, on the node
where T1, T2, and T3 (from *different* origins) are applied, we might
end up with a situation where some changes from T1 are applied whereas
some changes from T3 are applied.
 
--
Best Wishes,
Ashutosh Bapat

Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:

On 6/11/24 10:35, shveta malik wrote:
> On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>>
>>
>> On 6/10/24 12:56, shveta malik wrote:
>>> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>>>>>
>>>>>>>  UPDATE
>>>>>>> ================
>>>>>>>
>>>>>>> Conflict Detection Method:
>>>>>>> --------------------------------
>>>>>>> Origin conflict detection: The ‘origin’ info is used to detect
>>>>>>> conflict which can be obtained from commit-timestamp generated for
>>>>>>> incoming txn at the source node. To compare remote’s origin with the
>>>>>>> local’s origin, we must have origin information for local txns as well
>>>>>>> which can be obtained from commit-timestamp after enabling
>>>>>>> ‘track_commit_timestamp’ locally.
>>>>>>> The one drawback here is the ‘origin’ information cannot be obtained
>>>>>>> once the row is frozen and the commit-timestamp info is removed by
>>>>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
>>>>>>> incoming changes will be applied in all the cases.
>>>>>>>
>>>>>>> Conflict Types:
>>>>>>> ----------------
>>>>>>> a) update_differ: The origin of an incoming update's key row differs
>>>>>>> from the local row i.e.; the row has already been updated locally or
>>>>>>> by different nodes.
>>>>>>> b) update_missing: The row with the same value as that incoming
>>>>>>> update's key does not exist. Remote is trying to update a row which
>>>>>>> does not exist locally.
>>>>>>> c) update_deleted: The row with the same value as that incoming
>>>>>>> update's key does not exist. The row is already deleted. This conflict
>>>>>>> type is generated only if the deleted row is still detectable i.e., it
>>>>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
>>>>>>> it cannot detect this conflict. It will detect it as update_missing
>>>>>>> and will follow the default or configured resolver of update_missing
>>>>>>> itself.
>>>>>>>
>>>>>>
>>>>>> I don't understand why update_missing or update_deleted should be
>>>>>> different, especially considering it's not detected reliably. Also,
>>>>>> even if we happen to find the row, the associated TOAST data may
>>>>>> have already been removed. So why would this matter?
>>>>>
>>>>> Here, we are trying to tackle the case where the row is 'recently'
>>>>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
>>>>> want to opt for a different resolution in such a case as against the
>>>>> one where the corresponding row was not even present in the first
>>>>> place. The case where the row was deleted long back may not fall into
>>>>> this category as there are higher chances that they have been removed
>>>>> by vacuum and can be considered equivalent to the update_missing
>>>>> case.
>>>>>
>>>>
>>>> My point is that if we can't detect the difference reliably, it's not
>>>> very useful. Consider this example:
>>>>
>>>> Node A:
>>>>
>>>>   T1: INSERT INTO t (id, value) VALUES (1,1);
>>>>
>>>>   T2: DELETE FROM t WHERE id = 1;
>>>>
>>>> Node B:
>>>>
>>>>   T3: UPDATE t SET value = 2 WHERE id = 1;
>>>>
>>>> The "correct" order of received messages on a third node is T1-T3-T2.
>>>> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
>>>> and so on. For T1-T2-T3 the right decision is to discard the update,
>>>> while for T3-T1-T2 it's to either apply the update as an insert or
>>>> wait for the actual INSERT to arrive.
>>>>
>>>> But if we misdetect the situation, we either end up with a row that
>>>> shouldn't be there, or losing an update.
>>>
>>> Doesn't the above example indicate that 'update_deleted' should also
>>> be considered a necessary conflict type?  Please see the possibilities
>>> of conflicts in all three cases:
>>>
>>>
>>> The "correct" order of receiving messages on node C (as suggested
>>> above) is T1-T3-T2   (case1)
>>> ----------
>>> T1 will insert the row.
>>> T3 will have update_differ conflict; latest_timestamp_wins or apply
>>> will apply it; earliest_timestamp_wins or skip will skip it.
>>> T2 will delete the row (irrespective of whether the update happened or not).
>>> End Result: No Data.
>>>
>>> T1-T2-T3
>>> ----------
>>> T1 will insert the row.
>>> T2 will delete the row.
>>> T3 will have conflict update_deleted. If it is 'update_deleted', the
>>> chances are that the resolver set here is to 'skip' (default is also
>>> 'skip' in this case).
>>>
>>> If vacuum has deleted that row (or if we don't support
>>> 'update_deleted' conflict), it will be 'update_missing' conflict. In
>>> that case, the user may end up inserting the row if resolver chosen is
>>> in favor of apply (which seems an obvious choice for 'update_missing'
>>> conflict; default is also 'apply_or_skip').
>>>
>>> End result:
>>> Row inserted with 'update_missing'.
>>> Row correctly skipped with 'update_deleted' (assuming the obvious
>>> choice seems to be 'skip' for update_deleted case).
>>>
>>> So it seems that with 'update_deleted' conflict, there are higher
>>> chances of opting for the right decision here (which is to discard the
>>> update), as 'update_deleted' conveys correct info to the user.  The
>>> 'update_missing' OTOH does not convey correct info and user may end up
>>> inserting the data by choosing apply favoring resolvers for
>>> 'update_missing'. Again, we get benefit of 'update_deleted' for
>>> *recently* deleted rows only.
>>>
>>> T3-T1-T2
>>> ----------
>>> T3 may end up inserting the record if the resolver is in favor of
>>> 'apply' and all the columns are received from remote.
>>> T1 will have 'insert_exists' conflict and thus may either overwrite
>>> 'updated' values or may leave the data as is (based on whether
>>> resolver is in favor of apply or not)
>>> T2 will end up deleting it.
>>> End Result: No Data.
>>>
>>> I feel for second case (and similar cases), 'update_deleted' serves a
>>> better conflict type.
>>>
>>
>> True, but this is pretty much just a restatement of the example, right?
>>
>> The point I was trying to make is that this hinges on the ability to
>> detect the correct conflict type. And if vacuum can swoop in and remove
>> the recently deleted tuples (which I believe can happen at any time,
>> right?), then that's not guaranteed, because we won't see the deleted
>> tuple anymore.
> 
> Yes, that's correct. However, many cases could benefit from the
> update_deleted conflict type if it can be implemented reliably. That's
> why we wanted to give it a try. But if we can't achieve predictable
> results with it, I'm fine to drop this approach and conflict_type. We
> can consider a better design in the future that doesn't depend on
> non-vacuumed entries and provides a more robust method for identifying
> deleted rows.
> 

I agree having a separate update_deleted conflict would be beneficial,
I'm not arguing against that - my point is actually that I think this
conflict type is required, and that it needs to be detected reliably.

I'm not sure dropping update_deleted entirely would be a good idea,
though. It pretty much guarantees making the wrong decision at least
sometimes. But at least it's predictable and users are more likely to
notice that (compared to update_deleted working on well-behaving systems,
and then failing when a node starts lagging or something).

That's my opinion, though, and I don't intend to stay in the way. But I
think the solution is not that difficult - something needs to prevent
cleanup of recently dead tuples (until the "relevant" changes are
received and applied from other nodes). I don't know if that could be
done based on information we have for subscriptions, or if we need
something new.

>> Also, can the resolver even convert the UPDATE into INSERT and proceed?
>> Maybe with REPLICA IDENTITY FULL?
> 
> Yes, it can, as long as the row doesn't contain toasted data. Without
> toasted data, the new tuple is fully logged. However, if the row does
> contain toasted data, the new tuple won't log it completely. In such a
> case, REPLICA IDENTITY FULL becomes a requirement to ensure we have
> all the data necessary to create the row on the target side. In
> the absence of REPLICA IDENTITY FULL, if the row does contain toasted
> data, the operation will be skipped or an error will be raised.
> 
>> Otherwise the row might be incomplete,
>> missing required columns etc. In which case it'd have to wait for the
>> actual INSERT to arrive - which would work for actual update_missing,
>> where the row may be delayed due to network issues. But if that's a
>> mistake due to vacuum removing the deleted tuple, it'll wait forever.
> 
> Even in the case of 'update_missing', we do not intend to wait for the
> actual INSERT to arrive, as there is no guarantee that it ever will. We
> therefore plan to skip or error out (based on the user's configuration)
> if a complete row cannot be created for insertion.
> 

If the UPDATE contains all the columns and can be turned into an INSERT,
then that seems reasonable. But I don't see how skipping it could work
in general (except for some very simple / specific use cases). I'm not
sure if you suggest to skip just the one UPDATE or transaction as a
whole, but it seems to me either of those options could easily lead to
all kinds of inconsistencies and user confusion.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

> > Yes, that's correct. However, many cases could benefit from the
> > update_deleted conflict type if it can be implemented reliably. That's
> > why we wanted to give it a try. But if we can't achieve predictable
> > results with it, I'm fine to drop this approach and conflict_type. We
> > can consider a better design in the future that doesn't depend on
> > non-vacuumed entries and provides a more robust method for identifying
> > deleted rows.
> >
>
> I agree having a separate update_deleted conflict would be beneficial,
> I'm not arguing against that - my point is actually that I think this
> conflict type is required, and that it needs to be detected reliably.
>

When working with a distributed system, we must accept some form of
eventual consistency. However, it's essential to design a
predictable and acceptable behavior. For example, if a change is a
result of a previous operation (such as an update on node B triggered
after observing an operation on node A), we can say that the operation
on node A happened before the operation on node B. Conversely, if
operations on nodes A and B are independent, we consider them
concurrent.

In distributed systems, clock skew is a known issue. To establish a
consistency model, we need to ensure it guarantees the
"happens-before" relationship. Consider a scenario with three nodes:
NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
subsequently NodeB makes changes, and then both NodeA's and NodeB's
changes are sent to NodeC, the clock skew might make NodeB's changes
appear to have occurred before NodeA's changes. However, we should
maintain data that indicates NodeB's changes were triggered after
NodeA's changes arrived at NodeB. This implies that logically, NodeB's
changes happened after NodeA's changes, despite what the timestamps
suggest.

A common method to handle such cases is using vector clocks for
conflict resolution. "Vector clocks" allow us to track the causal
relationships between changes across nodes, ensuring that we can
correctly order events and resolve conflicts in a manner that respects
the "happens-before" relationship. This method helps maintain
consistency and predictability in the system despite issues like clock
skew.
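As a rough illustration of the idea (not part of the proposal), a
vector clock comparison can be sketched as follows; the node names and
clock layout are hypothetical:

```python
# Hypothetical sketch of the vector clock idea: causal order is derived
# from per-node counters, not wall-clock timestamps, so clock skew
# cannot reverse a genuine happens-before relationship.

def happened_before(a, b):
    """True if the event carrying clock a causally precedes clock b."""
    keys = set(a) | set(b)
    return a != b and all(a.get(k, 0) <= b.get(k, 0) for k in keys)

clock_a = {"A": 1}            # NodeA's change
clock_b = {"A": 1, "B": 1}    # NodeB changed the row after merging A's clock

print(happened_before(clock_a, clock_b))   # A's change precedes B's
print(happened_before(clock_b, clock_a))

clock_c = {"C": 1}            # an independent change on NodeC
# Neither precedes the other: the changes are concurrent, i.e. a true
# conflict that still needs a resolution policy.
print(happened_before(clock_a, clock_c))
print(happened_before(clock_c, clock_a))
```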

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Mon, Jun 10, 2024 at 5:12 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 6/10/24 10:54, Amit Kapila wrote:
> > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> On 5/27/24 07:48, shveta malik wrote:
> >>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> >>> <tomas.vondra@enterprisedb.com> wrote:
> >>>>
> >>>> Which architecture are you aiming for? Here you talk about multiple
> >>>> providers, but the wiki page mentions active-active. I'm not sure how
> >>>> much this matters, but it might.
> >>>
> >>> Currently, we are working on the multiple-providers case, but ideally
> >>> it should work for active-active also. During further discussion and
> >>> the implementation phase, if we find that there are cases which will
> >>> not work in a straightforward way for active-active, then our primary
> >>> focus will remain to first implement it for the multiple-providers
> >>> architecture.
> >>>
> >>>>
> >>>> Also, what kind of consistency you expect from this? Because none of
> >>>> these simple conflict resolution methods can give you the regular
> >>>> consistency models we're used to, AFAICS.
> >>>
> >>> Can you please explain a little bit more on this.
> >>>
> >>
> >> I was referring to the well established consistency models / isolation
> >> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
> >> guarantees the application developer can expect, what anomalies can
> >> happen, etc.
> >>
> >> I don't think any such isolation level can be implemented with simple
> >> conflict resolution methods like last-update-wins etc. For example,
> >> consider an active-active where both nodes do
> >>
> >>   UPDATE accounts SET balance=balance+1000 WHERE id=1
> >>
> >> This will inevitably lead to a conflict, and while the last-update-wins
> >> resolves this "consistently" on both nodes (e.g. ending with the same
> >> result), it's essentially a lost update.
> >>
> >
> > The idea to solve such conflicts is using the delta apply technique
> > where the delta from both sides will be applied to the respective
> > columns. We do plan to target this as a separate patch. Now, if the
> > basic conflict resolution and delta apply both can't go in one
> > release, we shall document such cases clearly to avoid misuse of the
> > feature.
> >
>
> Perhaps, but it's not like having delta conflict resolution (or even
> CRDT as a more generic variant) would lead to a regular consistency
> model in a distributed system. At least I don't think it can achieve
> that, because of the asynchronicity.
>
> Consider a table with "CHECK (amount < 1000)" constraint, and an update
> that sets (amount = amount + 900) on two nodes. AFAIK there's no way to
> reconcile this using delta (or any other) conflict resolution.
>

Right, in such a case an error will be generated and I agree that we
can't always reconcile the updates on different nodes and some data
loss is unavoidable with or without conflict resolution.
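The delta-apply technique quoted above could, roughly, look like the
following sketch for a numeric column; this is an illustration of the
idea only, not the planned implementation:

```python
# Illustration of the delta-apply idea (not the planned implementation):
# for a numeric column, apply the delta from each side to the common
# base value instead of picking one whole row version.

def delta_apply(base, local_new, remote_new):
    return base + (local_new - base) + (remote_new - base)

# Both nodes start from balance = 500 and run
#   UPDATE accounts SET balance = balance + 1000 WHERE id = 1;
# Last-update-wins would leave 1500 (one update lost); delta apply
# preserves both increments.
print(delta_apply(500, 1500, 1500))  # 2500

# As noted above for CHECK (amount < 1000), the merged value can still
# violate a constraint; delta apply cannot reconcile that case.
```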

> Which does not mean we should not have some form of conflict resolution,
> as long as we know what the goal is. I simply don't want to spend time
> working on this, add a lot of complex code, and then realize it doesn't
> give us a consistency model that makes sense.
>
> Which leads me back to my original question - what is the consistency
> model this you expect to get from this (possibly when combined with some
> other pieces?)?
>

I don't think this feature per se (or some additional features like
delta apply) can help with improving/changing the consistency model
our current logical replication module provides (which as per my
understanding is an eventual consistency model). This feature will
help reduce the number of cases where manual intervention is required,
by providing a configurable way to resolve conflicts. For example, for
primary key violation ERRORs, or when we intentionally overwrite the
data even when there is conflicting data present from a different
origin, or for cases where we simply skip the remote data when there is
a conflict on the local node.

To achieve consistent reads on all nodes we either need a distributed
transaction using a two-phase commit with some sort of quorum
protocol, or a sharded database with multiple primaries each
responsible for a unique partition of the data, or some other way. The
current proposal doesn't intend to implement any of those.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 6/11/24 10:35, shveta malik wrote:
> > On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >>
> >>
> >>
> >> On 6/10/24 12:56, shveta malik wrote:
> >>> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> >>> <tomas.vondra@enterprisedb.com> wrote:
> >>>>
> >>>>>>>
> >>>>>>>  UPDATE
> >>>>>>> ================
> >>>>>>>
> >>>>>>> Conflict Detection Method:
> >>>>>>> --------------------------------
> >>>>>>> Origin conflict detection: The ‘origin’ info is used to detect
> >>>>>>> conflict which can be obtained from commit-timestamp generated for
> >>>>>>> incoming txn at the source node. To compare remote’s origin with the
> >>>>>>> local’s origin, we must have origin information for local txns as well
> >>>>>>> which can be obtained from commit-timestamp after enabling
> >>>>>>> ‘track_commit_timestamp’ locally.
> >>>>>>> The one drawback here is the ‘origin’ information cannot be obtained
> >>>>>>> once the row is frozen and the commit-timestamp info is removed by
> >>>>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the
> >>>>>>> incoming changes will be applied in all the cases.
> >>>>>>>
> >>>>>>> Conflict Types:
> >>>>>>> ----------------
> >>>>>>> a) update_differ: The origin of an incoming update's key row differs
> >>>>>>> from the local row i.e.; the row has already been updated locally or
> >>>>>>> by different nodes.
> >>>>>>> b) update_missing: The row with the same value as that incoming
> >>>>>>> update's key does not exist. Remote is trying to update a row which
> >>>>>>> does not exist locally.
> >>>>>>> c) update_deleted: The row with the same value as that incoming
> >>>>>>> update's key does not exist. The row is already deleted. This conflict
> >>>>>>> type is generated only if the deleted row is still detectable i.e., it
> >>>>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
> >>>>>>> it cannot detect this conflict. It will detect it as update_missing
> >>>>>>> and will follow the default or configured resolver of update_missing
> >>>>>>> itself.
> >>>>>>>
> >>>>>>
> >>>>>> I don't understand why update_missing or update_deleted should be
> >>>>>> different, especially considering it's not detected reliably. Also,
> >>>>>> even if we happen to find the row, the associated TOAST data may
> >>>>>> have already been removed. So why would this matter?
> >>>>>
> >>>>> Here, we are trying to tackle the case where the row is 'recently'
> >>>>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may
> >>>>> want to opt for a different resolution in such a case as against the
> >>>>> one where the corresponding row was not even present in the first
> >>>>> place. The case where the row was deleted long back may not fall into
> >>>>> this category as there are higher chances that they have been removed
> >>>>> by vacuum and can be considered equivalent to the update_missing
> >>>>> case.
> >>>>>
> >>>>
> >>>> My point is that if we can't detect the difference reliably, it's not
> >>>> very useful. Consider this example:
> >>>>
> >>>> Node A:
> >>>>
> >>>>   T1: INSERT INTO t (id, value) VALUES (1,1);
> >>>>
> >>>>   T2: DELETE FROM t WHERE id = 1;
> >>>>
> >>>> Node B:
> >>>>
> >>>>   T3: UPDATE t SET value = 2 WHERE id = 1;
> >>>>
> >>>> The "correct" order of received messages on a third node is T1-T3-T2.
> >>>> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues
> >>>> and so on. For T1-T2-T3 the right decision is to discard the update,
> >>>> while for T3-T1-T2 it's to either apply the update as an insert or
> >>>> wait for the actual INSERT to arrive.
> >>>>
> >>>> But if we misdetect the situation, we either end up with a row that
> >>>> shouldn't be there, or losing an update.
> >>>
> >>> Doesn't the above example indicate that 'update_deleted' should also
> >>> be considered a necessary conflict type?  Please see the possibilities
> >>> of conflicts in all three cases:
> >>>
> >>>
> >>> The "correct" order of receiving messages on node C (as suggested
> >>> above) is T1-T3-T2   (case1)
> >>> ----------
> >>> T1 will insert the row.
> >>> T3 will have update_differ conflict; latest_timestamp_wins or apply
> >>> will apply it; earliest_timestamp_wins or skip will skip it.
> >>> T2 will delete the row (irrespective of whether the update happened or not).
> >>> End Result: No Data.
> >>>
> >>> T1-T2-T3
> >>> ----------
> >>> T1 will insert the row.
> >>> T2 will delete the row.
> >>> T3 will have conflict update_deleted. If it is 'update_deleted', the
> >>> chances are that the resolver set here is to 'skip' (default is also
> >>> 'skip' in this case).
> >>>
> >>> If vacuum has deleted that row (or if we don't support
> >>> 'update_deleted' conflict), it will be 'update_missing' conflict. In
> >>> that case, the user may end up inserting the row if resolver chosen is
> >>> in favor of apply (which seems an obvious choice for 'update_missing'
> >>> conflict; default is also 'apply_or_skip').
> >>>
> >>> End result:
> >>> Row inserted with 'update_missing'.
> >>> Row correctly skipped with 'update_deleted' (assuming the obvious
> >>> choice seems to be 'skip' for update_deleted case).
> >>>
> >>> So it seems that with 'update_deleted' conflict, there are higher
> >>> chances of opting for the right decision here (which is to discard the
> >>> update), as 'update_deleted' conveys correct info to the user.  The
> >>> 'update_missing' OTOH does not convey correct info and user may end up
> >>> inserting the data by choosing apply favoring resolvers for
> >>> 'update_missing'. Again, we get benefit of 'update_deleted' for
> >>> *recently* deleted rows only.
> >>>
> >>> T3-T1-T2
> >>> ----------
> >>> T3 may end up inserting the record if the resolver is in favor of
> >>> 'apply' and all the columns are received from remote.
> >>> T1 will have 'insert_exists' conflict and thus may either overwrite
> >>> 'updated' values or may leave the data as is (based on whether
> >>> resolver is in favor of apply or not)
> >>> T2 will end up deleting it.
> >>> End Result: No Data.
> >>>
> >>> I feel for second case (and similar cases), 'update_deleted' serves a
> >>> better conflict type.
> >>>
> >>
> >> True, but this is pretty much just a restatement of the example, right?
> >>
> >> The point I was trying to make is that this hinges on the ability to
> >> detect the correct conflict type. And if vacuum can swoop in and remove
> >> the recently deleted tuples (which I believe can happen at any time,
> >> right?), then that's not guaranteed, because we won't see the deleted
> >> tuple anymore.
> >
> > Yes, that's correct. However, many cases could benefit from the
> > update_deleted conflict type if it can be implemented reliably. That's
> > why we wanted to give it a try. But if we can't achieve predictable
> > results with it, I'm fine to drop this approach and conflict_type. We
> > can consider a better design in the future that doesn't depend on
> > non-vacuumed entries and provides a more robust method for identifying
> > deleted rows.
> >
>
> I agree having a separate update_deleted conflict would be beneficial,
> I'm not arguing against that - my point is actually that I think this
> conflict type is required, and that it needs to be detected reliably.
>
> I'm not sure dropping update_deleted entirely would be a good idea,
> though. It pretty much guarantees making the wrong decision at least
> sometimes. But at least it's predictable and users are more likely to
> notice that (compared to update_deleted working on well-behaving systems,
> and then failing when a node starts lagging or something).
>
> That's my opinion, though, and I don't intend to stand in the way. But I
> think the solution is not that difficult - something needs to prevent
> cleanup of recently dead tuples (until the "relevant" changes are
> received and applied from other nodes). I don't know if that could be
> done based on information we have for subscriptions, or if we need
> something new.

I agree that without update_deleted, there are higher chances of
making incorrect decisions in some cases. But I am not sure that
relying on delaying vacuum from removing such rows is a foolproof
plan. We cannot predict if or when the "relevant" changes will occur,
so how long should we delay the vacuum?
To address this problem, we may need a completely different approach.
One solution could be to store deleted rows in a separate table
(dead-rows-table) so we can consult that table for any deleted entries
at any time. Additionally, we would need methods to purge older data
from the dead-rows-table to prevent it from growing too large. This
would be a substantial project on its own, so we can aim to implement
some initial and simple conflict resolution methods first before
tackling this more complex solution.
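As a rough illustration of the dead-rows-table idea, here is a small sketch in Python (all names are hypothetical; this is not a proposed implementation): deleted rows are remembered with their deletion time, consulted when classifying an incoming UPDATE whose target row is absent, and purged after a retention window.

```python
import time

class DeadRowStore:
    """Hypothetical dead-rows-table: remembers deleted keys so an
    incoming UPDATE can be classified as update_deleted rather than
    update_missing, independent of VACUUM/HOT-pruning timing."""

    def __init__(self, retention_secs):
        self.retention_secs = retention_secs
        self.dead = {}  # key -> deletion timestamp

    def record_delete(self, key, ts=None):
        self.dead[key] = ts if ts is not None else time.time()

    def was_deleted(self, key):
        return key in self.dead

    def purge(self, now=None):
        """Drop entries older than the retention window."""
        now = now if now is not None else time.time()
        cutoff = now - self.retention_secs
        self.dead = {k: t for k, t in self.dead.items() if t >= cutoff}

def classify_update_conflict(store, key):
    """Classify an incoming UPDATE whose target row was not found."""
    return "update_deleted" if store.was_deleted(key) else "update_missing"
```

The open question of how long to retain entries maps directly to the retention window here: purging too early reintroduces the misclassification as update_missing.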

> >> Also, can the resolver even convert the UPDATE into INSERT and proceed?
> >> Maybe with REPLICA IDENTITY FULL?
> >
> > Yes, it can, as long as the row doesn't contain toasted data. Without
> > toasted data, the new tuple is fully logged. However, if the row does
> > contain toasted data, the new tuple won't log it completely. In such a
> > case, REPLICA IDENTITY FULL becomes a requirement to ensure we have
> > all the data necessary to create the row on the target side. In the
> > absence of RI FULL, when the row contains toasted data, the operation
> > will be skipped or an error will be raised.
> >
> >> Otherwise the row might be incomplete,
> >> missing required columns etc. In which case it'd have to wait for the
> >> actual INSERT to arrive - which would work for actual update_missing,
> >> where the row may be delayed due to network issues. But if that's a
> >> mistake due to vacuum removing the deleted tuple, it'll wait forever.
> >
> > Even in case of 'update_missing', we do not intend to wait for the
> > actual 'insert' to arrive, as it is not guaranteed that the 'insert'
> > will arrive. And thus we plan to skip or error out (based on the
> > user's configuration) if a complete row cannot be created for insertion.
> >
>
> If the UPDATE contains all the columns and can be turned into an INSERT,
> then that seems reasonable. But I don't see how skipping it could work
> in general (except for some very simple / specific use cases). I'm not
> sure if you suggest to skip just the one UPDATE or transaction as a
> whole, but it seems to me either of those options could easily lead to
> all kinds of inconsistencies and user confusion.

Conflict resolution is row-based, meaning that whatever action we
choose (error or skip) applies to the specific change rather than the
entire transaction. I'm not sure if waiting indefinitely for an INSERT
to arrive is a good idea, as the node that triggered the INSERT might
be down for an extended period. At best, we could provide a
configuration parameter so that the apply worker waits for a specified
time period for the INSERT to arrive before either skipping the change
or throwing an error.

That said, even if we error out or skip and log without waiting for
the INSERT, we won't introduce any new inconsistencies; this is the
current behavior on pg-HEAD. But with options like apply_or_skip and
apply_or_error, we have a better chance of resolving conflicts by
constructing the complete row internally, without user intervention.
There will still be some cases where we can't fully reconstruct the
row, but in those instances, the behavior won't be any worse than the
current pg-HEAD.
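To make the row-based behavior concrete, here is a minimal Python sketch (the resolver names come from the discussion above; everything else is illustrative) of how a single update_missing change might be resolved without affecting the rest of the transaction:

```python
def resolve_update_missing(resolver, remote_row, has_all_columns):
    """Hypothetical row-level resolution for update_missing:
    apply_or_skip / apply_or_error turn the UPDATE into an INSERT
    when the complete row can be constructed; otherwise they skip
    or error for this one change only, not the whole transaction."""
    if has_all_columns:
        return ("insert", remote_row)   # convert UPDATE to INSERT
    if resolver == "apply_or_skip":
        return ("skip", None)           # log the conflict and move on
    if resolver == "apply_or_error":
        return ("error", None)          # stop apply, manual action needed
    raise ValueError(f"unknown resolver: {resolver}")
```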

thanks
Shveta



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:

On 6/12/24 06:32, Dilip Kumar wrote:
> On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> 
>>> Yes, that's correct. However, many cases could benefit from the
>>> update_deleted conflict type if it can be implemented reliably. That's
>>> why we wanted to give it a try. But if we can't achieve predictable
>>> results with it, I'm fine to drop this approach and conflict_type. We
>>> can consider a better design in the future that doesn't depend on
>>> non-vacuumed entries and provides a more robust method for identifying
>>> deleted rows.
>>>
>>
>> I agree having a separate update_deleted conflict would be beneficial,
>> I'm not arguing against that - my point is actually that I think this
>> conflict type is required, and that it needs to be detected reliably.
>>
> 
> When working with a distributed system, we must accept some form of
> eventual consistency model.

I'm not sure this is necessarily true. There are distributed databases
implementing (or aiming to) regular consistency models, without eventual
consistency. I'm not saying it's easy, but it shows eventual consistency
is not the only option.

> However, it's essential to design a
> predictable and acceptable behavior. For example, if a change is a
> result of a previous operation (such as an update on node B triggered
> after observing an operation on node A), we can say that the operation
> on node A happened before the operation on node B. Conversely, if
> operations on nodes A and B are independent, we consider them
> concurrent.
> 

Right. And this is precisely the focus of my questions - understanding
what behavior we aim for / expect in the end. Or said differently, what
anomalies / weird behavior would be considered expected.

Because that's important both for discussions about feasibility, etc.
And also for evaluation / reviews of the patch.

> In distributed systems, clock skew is a known issue. To establish a
> consistency model, we need to ensure it guarantees the
> "happens-before" relationship. Consider a scenario with three nodes:
> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> subsequently NodeB makes changes, and then both NodeA's and NodeB's
> changes are sent to NodeC, the clock skew might make NodeB's changes
> appear to have occurred before NodeA's changes. However, we should
> maintain data that indicates NodeB's changes were triggered after
> NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> changes happened after NodeA's changes, despite what the timestamps
> suggest.
> 
> A common method to handle such cases is using vector clocks for
> conflict resolution. "Vector clocks" allow us to track the causal
> relationships between changes across nodes, ensuring that we can
> correctly order events and resolve conflicts in a manner that respects
> the "happens-before" relationship. This method helps maintain
> consistency and predictability in the system despite issues like clock
> skew.
> 

I'm familiar with the concept of vector clock (or logical clock in
general), but it's not clear to me how you plan to use this in the
context of the conflict handling. Can you elaborate/explain?

The way I see it, conflict handling is pretty tightly coupled with
regular commit timestamps and MVCC in general. How would you use vector
clock to change that?

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> >> I agree having a separate update_deleted conflict would be beneficial,
> >> I'm not arguing against that - my point is actually that I think this
> >> conflict type is required, and that it needs to be detected reliably.
> >>
> >
> > When working with a distributed system, we must accept some form of
> > eventual consistency model.
>
> I'm not sure this is necessarily true. There are distributed databases
> implementing (or aiming to) regular consistency models, without eventual
> consistency. I'm not saying it's easy, but it shows eventual consistency
> is not the only option.

Right, that statement might not be completely accurate. Based on the
CAP theorem, when a network partition is unavoidable and availability
is expected, we often choose an eventual consistency model. However,
claiming that a fully consistent model is impossible in any
distributed system is incorrect, as it can be achieved using
mechanisms like Two-Phase Commit.

We must also accept that our PostgreSQL replication mechanism does not
guarantee a fully consistent model. Even with synchronous commit, it
only waits for the WAL to be replayed on the standby but does not
change the commit decision based on other nodes. This means, at most,
we can only guarantee "Read Your Write" consistency.

> > However, it's essential to design a
> > predictable and acceptable behavior. For example, if a change is a
> > result of a previous operation (such as an update on node B triggered
> > after observing an operation on node A), we can say that the operation
> > on node A happened before the operation on node B. Conversely, if
> > operations on nodes A and B are independent, we consider them
> > concurrent.
> >
>
> Right. And this is precisely the focus or my questions - understanding
> what behavior we aim for / expect in the end. Or said differently, what
> anomalies / weird behavior would be considered expected.

> Because that's important both for discussions about feasibility, etc.
> And also for evaluation / reviews of the patch.

+1

> > In distributed systems, clock skew is a known issue. To establish a
> > consistency model, we need to ensure it guarantees the
> > "happens-before" relationship. Consider a scenario with three nodes:
> > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > changes are sent to NodeC, the clock skew might make NodeB's changes
> > appear to have occurred before NodeA's changes. However, we should
> > maintain data that indicates NodeB's changes were triggered after
> > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > changes happened after NodeA's changes, despite what the timestamps
> > suggest.
> >
> > A common method to handle such cases is using vector clocks for
> > conflict resolution. "Vector clocks" allow us to track the causal
> > relationships between changes across nodes, ensuring that we can
> > correctly order events and resolve conflicts in a manner that respects
> > the "happens-before" relationship. This method helps maintain
> > consistency and predictability in the system despite issues like clock
> > skew.
> >
>
> I'm familiar with the concept of vector clock (or logical clock in
> general), but it's not clear to me how you plan to use this in the
> context of the conflict handling. Can you elaborate/explain?
>
> The way I see it, conflict handling is pretty tightly coupled with
> regular commit timestamps and MVCC in general. How would you use vector
> clock to change that?

The issue with using commit timestamps is that, when multiple nodes
are involved, the commit timestamp won't accurately represent the
actual order of operations. There's no reliable way to determine the
perfect order of each operation happening on different nodes roughly
simultaneously unless we use some globally synchronized counter.
Generally, that order might not cause real issues unless one operation
is triggered by a previous operation, and relying solely on physical
timestamps would not detect that correctly.

We need some sort of logical counter, such as a vector clock, which
might be an independent counter on each node but can perfectly track
the causal order. For example, if NodeA observes an operation from
NodeB with a counter value of X, NodeA will adjust its counter to X+1.
This ensures that if NodeA has seen an operation from NodeB, its next
operation will appear to have occurred after NodeB's operation.
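A tiny Python sketch of the logical counter described above (this is a Lamport-style clock; a vector clock extends it with one counter per node so that concurrent operations can also be distinguished from causally ordered ones):

```python
class LamportClock:
    """Each node keeps its own counter; on receiving an operation
    stamped X from a peer, it advances to max(local, X) + 1 so that
    causally later operations always carry larger counters, even
    with arbitrary physical clock skew."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Stamp a local operation."""
        self.counter += 1
        return self.counter

    def observe(self, remote_counter):
        """Incorporate a remote operation's stamp, then stamp ours."""
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter
```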

I admit that I haven't fully thought through how we could design such
version tracking in our logical replication protocol or how it would
fit into our system. However, my point is that we need to consider
something beyond commit timestamps to achieve reliable ordering.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Masahiko Sawada
Date:
On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> This time at PGconf.dev[1], we had some discussions regarding this
> project. The proposed approach is to split the work into two main
> components. The first part focuses on conflict detection, which aims to
> identify and report conflicts in logical replication. This feature will
> enable users to monitor the unexpected conflicts that may occur. The
> second part involves the actual conflict resolution. Here, we will provide
> built-in resolutions for each conflict and allow users to choose which
> resolution will be used for which conflict (as described in the initial
> email of this thread).

I agree with this direction that we focus on conflict detection (and
logging) first and then develop conflict resolution on top of that.

>
> Of course, we are open to alternative ideas and suggestions, and the
> strategy above can be changed based on ongoing discussions and feedback
> received.
>
> Here is the patch of the first part work, which adds a new parameter
> detect_conflict for CREATE and ALTER subscription commands. This new
> parameter will decide whether the subscription performs conflict detection. By
> default, conflict detection will be off for a subscription.
>
> When conflict detection is enabled, additional logging is triggered in the
> following conflict scenarios:
>
> * updating a row that was previously modified by another origin.
> * The tuple to be updated is not found.
> * The tuple to be deleted is not found.
>
> While there exist other conflict types in logical replication, such as an
> incoming insert conflicting with an existing row due to a primary key or
> unique index, these cases already result in constraint violation errors.

What does detect_conflict being true actually mean to users? I
understand that detect_conflict being true could introduce some
overhead to detect conflicts. But in terms of conflict detection, even
if detect_conflict is false, we detect some conflicts such as
concurrent inserts with the same key. Once we introduce the complete
conflict detection feature, I'm not sure there is a case where a user
wants to detect only some particular types of conflict.

> Therefore, additional conflict detection for these cases is currently
> omitted to minimize potential overhead. However, the pre-detection for
> conflict in these error cases is still essential to support automatic
> conflict resolution in the future.

I feel that we should log all types of conflict in a uniform way. For
example, with detect_conflict being true, the update_differ conflict
is reported as "conflict %s detected on relation "%s"", whereas
concurrent inserts with the same key is reported as "duplicate key
value violates unique constraint "%s"", which could confuse users.
Ideally, I think that we log such conflict detection details (table
name, column name, conflict type, etc) to somewhere (e.g. a table or
server logs) so that the users can resolve them manually.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Thu, Jun 13, 2024 at 11:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Hi,
> >
> > This time at PGconf.dev[1], we had some discussions regarding this
> > project. The proposed approach is to split the work into two main
> > components. The first part focuses on conflict detection, which aims to
> > identify and report conflicts in logical replication. This feature will
> > enable users to monitor the unexpected conflicts that may occur. The
> > second part involves the actual conflict resolution. Here, we will provide
> > built-in resolutions for each conflict and allow user to choose which
> > resolution will be used for which conflict(as described in the initial
> > email of this thread).
>
> I agree with this direction that we focus on conflict detection (and
> logging) first and then develop conflict resolution on top of that.
>
> >
> > Of course, we are open to alternative ideas and suggestions, and the
> > strategy above can be changed based on ongoing discussions and feedback
> > received.
> >
> > Here is the patch of the first part work, which adds a new parameter
> > detect_conflict for CREATE and ALTER subscription commands. This new
> > parameter will decide if subscription will go for conflict detection. By
> > default, conflict detection will be off for a subscription.
> >
> > When conflict detection is enabled, additional logging is triggered in the
> > following conflict scenarios:
> >
> > * updating a row that was previously modified by another origin.
> > * The tuple to be updated is not found.
> > * The tuple to be deleted is not found.
> >
> > While there exist other conflict types in logical replication, such as an
> > incoming insert conflicting with an existing row due to a primary key or
> > unique index, these cases already result in constraint violation errors.
>
> What does detect_conflict being true actually mean to users? I
> understand that detect_conflict being true could introduce some
> overhead to detect conflicts. But in terms of conflict detection, even
> if detect_conflict is false, we detect some conflicts such as
> concurrent inserts with the same key. Once we introduce the complete
> conflict detection feature, I'm not sure there is a case where a user
> wants to detect only some particular types of conflict.
>

You are right that users would wish to detect the conflicts, and
probably the extra effort would only be in the 'update_differ' case,
where we need to consult the committs module, which we will do only
when 'track_commit_timestamp' is true. BTW, I think for Inserts with a
primary/unique key violation, we should catch the ERROR and log it. If
we want to log the conflicts in a separate table, then do we want to do
that in the catch block after getting the PK violation, or do an extra
scan before the 'INSERT' to find the conflict? I think logging would
incur extra cost, especially if we want to LOG it in some table as you
are suggesting below, and that may need some option.
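The catch-block alternative can be sketched as follows (a Python sketch of the control flow only; the real apply worker is C and would use PG_TRY/PG_CATCH or a pre-INSERT scan, and all names here are illustrative):

```python
class UniqueViolation(Exception):
    """Stand-in for a primary/unique key violation ERROR."""
    pass

def apply_insert_with_detection(insert_fn, log_fn, row):
    """Attempt the INSERT and, on a key violation, log it as an
    insert_exists conflict instead of doing an extra scan first."""
    try:
        insert_fn(row)
        return "applied"
    except UniqueViolation:
        log_fn("insert_exists", row)
        return "conflict_logged"
```

The trade-off discussed above is visible here: the catch block costs nothing on the non-conflict path, whereas a pre-INSERT scan pays an index lookup on every apply.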

> > Therefore, additional conflict detection for these cases is currently
> > omitted to minimize potential overhead. However, the pre-detection for
> > conflict in these error cases is still essential to support automatic
> > conflict resolution in the future.
>
> I feel that we should log all types of conflict in a uniform way. For
> example, with detect_conflict being true, the update_differ conflict
> is reported as "conflict %s detected on relation "%s"", whereas
> concurrent inserts with the same key is reported as "duplicate key
> value violates unique constraint "%s"", which could confuse users.
> Ideally, I think that we log such conflict detection details (table
> name, column name, conflict type, etc) to somewhere (e.g. a table or
> server logs) so that the users can resolve them manually.
>

It is good to think if there is a value in providing in
pg_conflicts_history kind of table which will have details of
conflicts that occurred and then we can extend it to have resolutions.
I feel we can anyway LOG the conflicts by default. Updating a separate
table with conflicts should be done by default or with a knob is a
point to consider.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Peter Eisentraut
Date:
On 23.05.24 08:36, shveta malik wrote:
> Conflict Resolution
> ----------------
> a) latest_timestamp_wins:    The change with later commit timestamp wins.
> b) earliest_timestamp_wins:   The change with earlier commit timestamp wins.
> c) apply:   Always apply the remote change.
> d) skip:    Remote change is skipped.
> e) error:   Error out on conflict. Replication is stopped, manual
> action is needed.

You might be aware of pglogical, which has similar conflict resolution 
modes, but they appear to be spelled a bit different.  It might be worth 
reviewing this, so that we don't unnecessarily introduce differences.

https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts

There might also be other inspiration to be found related to this in 
pglogical documentation or code.




Re: Conflict Detection and Resolution

From
Alvaro Herrera
Date:
On 2024-Jun-07, Tomas Vondra wrote:

> On 6/3/24 09:30, Amit Kapila wrote:
> > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> >> How is this going to deal with the fact that commit LSN and timestamps
> >> may not correlate perfectly? That is, commits may happen with LSN1 <
> >> LSN2 but with T1 > T2.

> But as I wrote, I'm not quite convinced this means there are not other
> issues with this way of resolving conflicts. It's more likely a more
> complex scenario is required.

Jan Wieck approached me during pgconf.dev to reproach me about this
problem.  He also said he had some code to fix up the commit TS
afterwards somehow, to make the sequence monotonically increasing.
Perhaps we should consider that, to avoid any problems caused by the
difference between LSN order and TS order.  It might be quite
nightmarish to try to make the system work correctly without
reasonable constraints of that nature.
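The fix-up idea can be illustrated with a small sketch (Python, purely illustrative; the function name and the one-unit increment are assumptions): commits are visited in LSN order and any timestamp that is not strictly greater than its predecessor is bumped just above it.

```python
def fixup_commit_timestamps(commits):
    """Given (lsn, timestamp) pairs in LSN order, adjust timestamps
    so TS order matches LSN order: any non-increasing timestamp is
    replaced by predecessor + 1 (think one microsecond)."""
    fixed = []
    last_ts = None
    for lsn, ts in commits:
        if last_ts is not None and ts <= last_ts:
            ts = last_ts + 1
        fixed.append((lsn, ts))
        last_ts = ts
    return fixed
```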

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: Conflict Detection and Resolution

From
"Jonathan S. Katz"
Date:
On 6/13/24 7:28 AM, Amit Kapila wrote:

> You are right that users would wish to detect the conflicts and
> probably the extra effort would only be in the 'update_differ' case
> where we need to consult committs module and that we will only do when
> 'track_commit_timestamp' is true. BTW, I think for Inserts with
> primary/unique key violation, we should catch the ERROR and log it. If
> we want to log the conflicts in a separate table then do we want to do
> that in the catch block after getting pk violation or do an extra scan
> before 'INSERT' to find the conflict? I think logging would need extra
> cost especially if we want to LOG it in some table as you are
> suggesting below that may need some option.
> 
>>> Therefore, additional conflict detection for these cases is currently
>>> omitted to minimize potential overhead. However, the pre-detection for
>>> conflict in these error cases is still essential to support automatic
>>> conflict resolution in the future.
>>
>> I feel that we should log all types of conflict in a uniform way. For
>> example, with detect_conflict being true, the update_differ conflict
>> is reported as "conflict %s detected on relation "%s"", whereas
>> concurrent inserts with the same key is reported as "duplicate key
>> value violates unique constraint "%s"", which could confuse users.
>> Ideally, I think that we log such conflict detection details (table
>> name, column name, conflict type, etc) to somewhere (e.g. a table or
>> server logs) so that the users can resolve them manually.
>>
> 
> It is good to think if there is a value in providing in
> pg_conflicts_history kind of table which will have details of
> conflicts that occurred and then we can extend it to have resolutions.
> I feel we can anyway LOG the conflicts by default. Updating a separate
> table with conflicts should be done by default or with a knob is a
> point to consider.

+1 for logging conflicts uniformly, but I would +100 to exposing the log 
in a way that's easy for the user to query (whether it's a system view 
or a stat table). Arguably, I'd say that would be the most important 
feature to come out of this effort.

Setting aside how conflicts are resolved, users want to know exactly what 
row had a conflict, and users from other database systems that have dealt 
with these issues will have tooling to review and analyze conflicts when 
they occur. This data is typically stored in a queryable table, 
with data retained for N days. When you add in automatic conflict 
resolution, users then want to have a record of how the conflict was 
resolved, in case they need to manually update it.

Having this data in a table also gives the user opportunity to 
understand conflict stats (e.g. conflict rates) and potentially identify 
portions of the application and other parts of the system to optimize. 
It also makes it easier to import to downstream systems that may perform 
further analysis on conflict resolution, or alarm if a conflict rate 
exceeds a certain threshold.

Thanks,

Jonathan



Attachment

Re: Conflict Detection and Resolution

From
Robert Haas
Date:
On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote:
> c) update_deleted: The row with the same value as that incoming
> update's key does not exist. The row is already deleted. This conflict
> type is generated only if the deleted row is still detectable i.e., it
> is not removed by VACUUM yet. If the row is removed by VACUUM already,
> it cannot detect this conflict. It will detect it as update_missing
> and will follow the default or configured resolver of update_missing
> itself.

I think this design is categorically unacceptable. It amounts to
designing a feature that works except when it doesn't. I'm not exactly
sure how the proposal should be changed to avoid depending on the
timing of VACUUM, but I think it's absolutely not OK to depend on the
timing of VACUUM -- or, really, this is going to depend on the timing
of HOT-pruning, which will often happen almost instantly.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Thu, Jun 13, 2024 at 7:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Jun-07, Tomas Vondra wrote:
>
> > On 6/3/24 09:30, Amit Kapila wrote:
> > > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> > >> How is this going to deal with the fact that commit LSN and timestamps
> > >> may not correlate perfectly? That is, commits may happen with LSN1 <
> > >> LSN2 but with T1 > T2.
>
> > But as I wrote, I'm not quite convinced this means there are not other
> > issues with this way of resolving conflicts. It's more likely a more
> > complex scenario is required.
>
> Jan Wieck approached me during pgconf.dev to reproach me of this
> problem.  He also said he had some code to fix-up the commit TS
> afterwards somehow, to make the sequence monotonically increasing.
> Perhaps we should consider that, to avoid any problems caused by the
> difference between LSN order and TS order.   It might be quite
> nightmarish to try to make the system work correctly without
> reasonable constraints of that nature.
>

I agree with this, but the problem Jan was worried about was not
directly reproducible in what PostgreSQL provides, at least that is
what I understood then. We are also unable to think of a concrete
scenario where this is a problem, but we are planning to spend more
time deriving a test to reproduce the problem.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Thu, Jun 13, 2024 at 11:18 PM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> On 6/13/24 7:28 AM, Amit Kapila wrote:
> >>
> >> I feel that we should log all types of conflict in a uniform way. For
> >> example, with detect_conflict being true, the update_differ conflict
> >> is reported as "conflict %s detected on relation "%s"", whereas
> >> concurrent inserts with the same key is reported as "duplicate key
> >> value violates unique constraint "%s"", which could confuse users.
> >> Ideally, I think that we log such conflict detection details (table
> >> name, column name, conflict type, etc) to somewhere (e.g. a table or
> >> server logs) so that the users can resolve them manually.
> >>
> >
> > It is good to think if there is a value in providing in
> > pg_conflicts_history kind of table which will have details of
> > conflicts that occurred and then we can extend it to have resolutions.
> > I feel we can anyway LOG the conflicts by default. Updating a separate
> > table with conflicts should be done by default or with a knob is a
> > point to consider.
>
> +1 for logging conflicts uniformly, but I would +100 to exposing the log
> in a way that's easy for the user to query (whether it's a system view
> or a stat table). Arguably, I'd say that would be the most important
> feature to come out of this effort.
>

We can have both the system view and a stats table. The system view
could have some sort of cumulative stats data, like how many times a
particular conflict has occurred, and the table would provide detailed
information about the conflict. The one challenge I see in providing a
table is in its cleanup mechanism. We could provide a partitioned table
such that users can truncate/drop the not-needed partitions, or provide
a non-partitioned table where users can delete the old data, in which
case they generate work for autovacuum.

> Removing how conflicts are resolved, users want to know exactly what row
> had a conflict, and users from other database systems that have dealt
> with these issues will have tooling to be able to review and analyze if
> a conflicts occur. This data is typically stored in a queryable table,
> with data retained for N days. When you add in automatic conflict
> resolution, users then want to have a record of how the conflict was
> resolved, in case they need to manually update it.
>
> Having this data in a table also gives the user opportunity to
> understand conflict stats (e.g. conflict rates) and potentially identify
> portions of the application and other parts of the system to optimize.
> It also makes it easier to import to downstream systems that may perform
> further analysis on conflict resolution, or alarm if a conflict rate
> exceeds a certain threshold.
>

Agreed those are good use cases to store conflict history.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Fri, Jun 14, 2024 at 12:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote:
> > c) update_deleted: The row with the same value as that incoming
> > update's key does not exist. The row is already deleted. This conflict
> > type is generated only if the deleted row is still detectable i.e., it
> > is not removed by VACUUM yet. If the row is removed by VACUUM already,
> > it cannot detect this conflict. It will detect it as update_missing
> > and will follow the default or configured resolver of update_missing
> > itself.
>
> I think this design is categorically unacceptable. It amounts to
> designing a feature that works except when it doesn't. I'm not exactly
> sure how the proposal should be changed to avoid depending on the
> timing of VACUUM, but I think it's absolutely not OK to depend on the
> timing of VACUUM -- or, really, this is going to depend on the timing
> of HOT-pruning, which will often happen almost instantly.
>

Agreed. Above, Tomas has speculated about having a way to prevent
vacuum from cleaning dead tuples until the required changes are
received and applied. Shveta also mentioned another way: a dead-rows
store (say, a table where deleted rows are stored for resolution) [1],
which is similar to a technique used by some other databases. There is
an agreement not to rely on Vacuum to detect such a conflict, but the
alternative is not clear. Currently, we are thinking of treating such
a conflict type as update_missing (the row with the same value as that
incoming update's key does not exist). This is how the current HEAD
code behaves and LOGs the information ("logical replication did not
find row to be updated ...").

[1] - https://www.postgresql.org/message-id/CAJpy0uCov4JfZJeOvY0O21_gk9bcgNUDp4jf8%2BBbMp%2BEAv8cVQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



RE: Conflict Detection and Resolution

From
"Zhijie Hou (Fujitsu)"
Date:
On Thursday, June 13, 2024 8:46 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> 
> On 23.05.24 08:36, shveta malik wrote:
> > Conflict Resolution
> > ----------------
> > a) latest_timestamp_wins:    The change with later commit timestamp
> wins.
> > b) earliest_timestamp_wins:   The change with earlier commit timestamp
> wins.
> > c) apply:   Always apply the remote change.
> > d) skip:    Remote change is skipped.
> > e) error:   Error out on conflict. Replication is stopped, manual
> > action is needed.
> 
> You might be aware of pglogical, which has similar conflict resolution modes,
> but they appear to be spelled a bit different.  It might be worth reviewing this,
> so that we don't unnecessarily introduce differences.

Right. Some of the proposed resolution names are different from pglogical's
while the functionalities are the same. The following is the comparison with
pglogical:

 latest_timestamp_wins(proposal) - last_update_wins(pglogical)
 earliest_timestamp_wins(proposal) - first_update_wins(pglogical)
 apply(proposal)                   - apply_remote(pglogical)
 skip(proposal)                    - keep_local(pglogical)

I personally think pglogical's names read more naturally, but others may
have different opinions on this.

> 
> https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts
> 
> There might also be other inspiration to be found related to this in pglogical
> documentation or code.

Another difference is that we allow users to specify different resolutions for
different conflicts, while pglogical allows specifying one resolution for all conflicts.
I think the proposed approach offers more flexibility to users, which seems more
favorable to me.
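To make the semantics concrete, here is a rough sketch of the five proposed
resolutions side by side with their pglogical counterparts. The row dicts and
the 'commit_ts' field are illustrative assumptions, not patch code:

```python
from datetime import datetime, timezone

def resolve(resolution, local_row, remote_row):
    """Sketch of the proposed resolution semantics. Rows are dicts with a
    'commit_ts' datetime; all names here are illustrative, not patch code."""
    if resolution == "error":
        # Replication is stopped; manual action is needed.
        raise RuntimeError("conflict detected: manual action needed")
    if resolution == "skip":                       # pglogical: keep_local
        return local_row
    if resolution == "apply":                      # pglogical: apply_remote
        return remote_row
    if resolution == "latest_timestamp_wins":      # pglogical: last_update_wins
        return max(local_row, remote_row, key=lambda r: r["commit_ts"])
    if resolution == "earliest_timestamp_wins":    # pglogical: first_update_wins
        return min(local_row, remote_row, key=lambda r: r["commit_ts"])
    raise ValueError(f"unknown resolution: {resolution}")
```

Per-conflict configurability would then just be a mapping from conflict type
(insert_exists, update_missing, ...) to one of these resolution names.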

Best Regards,
Hou zj

Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:

On 6/14/24 13:29, Amit Kapila wrote:
> On Fri, Jun 14, 2024 at 12:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote:
>>> c) update_deleted: The row with the same value as that incoming
>>> update's key does not exist. The row is already deleted. This conflict
>>> type is generated only if the deleted row is still detectable i.e., it
>>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
>>> it cannot detect this conflict. It will detect it as update_missing
>>> and will follow the default or configured resolver of update_missing
>>> itself.
>>
>> I think this design is categorically unacceptable. It amounts to
>> designing a feature that works except when it doesn't. I'm not exactly
>> sure how the proposal should be changed to avoid depending on the
>> timing of VACUUM, but I think it's absolutely not OK to depend on the
>> timing of VACUUM -- or, really, this is going to depend on the timing
>> of HOT-pruning, which will often happen almost instantly.
>>
> 
> Agreed, above Tomas has speculated to have a way to avoid vacuum
> cleaning dead tuples until the required changes are received and
> applied. Shveta also mentioned another way to have deads-store (say a
> table where deleted rows are stored for resolution) [1] which is
> similar to a technique used by some other databases. There is an
> agreement to not rely on Vacuum to detect such a conflict but the
> alternative is not clear.

I'm not sure I'd say I "speculated" about it - it's not like we don't
have ways to hold off cleanup for a while for various reasons
(long-running query, replication slot, hot-standby feedback, ...).

How exactly that would be implemented I don't know, but it seems like a
far simpler approach than inventing a new "dead store". It'd need logic
to let vacuum clean up the stuff no longer needed, but so would the
dead store, I think.

> Currently, we are thinking to consider such
> a conflict type as update_missing (The row with the same value as that
> incoming update's key does not exist.). This is how the current HEAD
> code behaves and LOGs the information (logical replication did not
> find row to be updated ..).
> 

I thought the agreement was we need both conflict types to get sensible
behavior, so proceeding with just the update_missing (mostly because we
don't know how to detect these conflicts reliably) seems like maybe
not the right direction ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Tomas Vondra
Date:

On 6/13/24 06:52, Dilip Kumar wrote:
> On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>>>> I agree having a separate update_deleted conflict would be beneficial,
>>>> I'm not arguing against that - my point is actually that I think this
>>>> conflict type is required, and that it needs to be detected reliably.
>>>>
>>>
>>> When working with a distributed system, we must accept some form of
>>> eventual consistency model.
>>
>> I'm not sure this is necessarily true. There are distributed databases
>> implementing (or aiming to implement) regular consistency models, without eventual
>> consistency. I'm not saying it's easy, but it shows eventual consistency
>> is not the only option.
>
> Right, that statement might not be completely accurate. Based on the
> CAP theorem, when a network partition is unavoidable and availability
> is expected, we often choose an eventual consistency model. However,
> claiming that a fully consistent model is impossible in any
> distributed system is incorrect, as it can be achieved using
> mechanisms like Two-Phase Commit.
>
> We must also accept that our PostgreSQL replication mechanism does not
> guarantee a fully consistent model. Even with synchronous commit, it
> only waits for the WAL to be replayed on the standby but does not
> change the commit decision based on other nodes. This means, at most,
> we can only guarantee "Read Your Write" consistency.
>
Perhaps, but even accepting eventual consistency does not absolve us
from actually defining what that means, ensuring it's sensible enough to
be practical/usable, and that it actually converges to a consistent
state (that's essentially the problem with the update conflict types,
because misdetecting them results in diverging states).

>>> However, it's essential to design a
>>> predictable and acceptable behavior. For example, if a change is a
>>> result of a previous operation (such as an update on node B triggered
>>> after observing an operation on node A), we can say that the operation
>>> on node A happened before the operation on node B. Conversely, if
>>> operations on nodes A and B are independent, we consider them
>>> concurrent.
>>>
>>
>> Right. And this is precisely the focus or my questions - understanding
>> what behavior we aim for / expect in the end. Or said differently, what
>> anomalies / weird behavior would be considered expected.
>
>> Because that's important both for discussions about feasibility, etc.
>> And also for evaluation / reviews of the patch.
>
> +1
>
>>> In distributed systems, clock skew is a known issue. To establish a
>>> consistency model, we need to ensure it guarantees the
>>> "happens-before" relationship. Consider a scenario with three nodes:
>>> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
>>> subsequently NodeB makes changes, and then both NodeA's and NodeB's
>>> changes are sent to NodeC, the clock skew might make NodeB's changes
>>> appear to have occurred before NodeA's changes. However, we should
>>> maintain data that indicates NodeB's changes were triggered after
>>> NodeA's changes arrived at NodeB. This implies that logically, NodeB's
>>> changes happened after NodeA's changes, despite what the timestamps
>>> suggest.
>>>
>>> A common method to handle such cases is using vector clocks for
>>> conflict resolution. "Vector clocks" allow us to track the causal
>>> relationships between changes across nodes, ensuring that we can
>>> correctly order events and resolve conflicts in a manner that respects
>>> the "happens-before" relationship. This method helps maintain
>>> consistency and predictability in the system despite issues like clock
>>> skew.
>>>
>>
>> I'm familiar with the concept of vector clock (or logical clock in
>> general), but it's not clear to me how you plan to use this in the
>> context of the conflict handling. Can you elaborate/explain?
>>
>> The way I see it, conflict handling is pretty tightly coupled with
>> regular commit timestamps and MVCC in general. How would you use vector
>> clock to change that?
>
> The issue with using commit timestamps is that, when multiple nodes
> are involved, the commit timestamp won't accurately represent the
> actual order of operations. There's no reliable way to determine the
> perfect order of each operation happening on different nodes roughly
> simultaneously unless we use some globally synchronized counter.
> Generally, that order might not cause real issues unless one operation
> is triggered by a previous operation, and relying solely on physical
> timestamps would not detect that correctly.
>
This whole conflict detection / resolution proposal is based on using
commit timestamps. Aren't you suggesting it can't really work with
commit timestamps?

FWIW there are ways to build distributed consistency with timestamps,
as long as they're monotonic - e.g. clock-SI does that. It's not perfect,
but it shows it's possible.

However, I'm not sure we have to go there - it depends on what the goal is.
For a one-directional replication (multiple nodes replicating to the
same target) it might be sufficient if the conflict resolution is
"deterministic" (e.g. not dependent on the order in which the changes
are applied). I'm not sure, but it's why I asked what's the goal in my
very first message in this thread.

> We need some sort of logical counter, such as a vector clock, which
> might be an independent counter on each node but can perfectly track
> the causal order. For example, if NodeA observes an operation from
> NodeB with a counter value of X, NodeA will adjust its counter to X+1.
> This ensures that if NodeA has seen an operation from NodeB, its next
> operation will appear to have occurred after NodeB's operation.
>
> I admit that I haven't fully thought through how we could design such
> version tracking in our logical replication protocol or how it would
> fit into our system. However, my point is that we need to consider
> something beyond commit timestamps to achieve reliable ordering.
>

I can't really respond to this as there's no suggestion how it would be
implemented in the patch discussed in this thread.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Mon, Jun 17, 2024 at 4:18 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 6/14/24 13:29, Amit Kapila wrote:
> > On Fri, Jun 14, 2024 at 12:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >> On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote:
> >>> c) update_deleted: The row with the same value as that incoming
> >>> update's key does not exist. The row is already deleted. This conflict
> >>> type is generated only if the deleted row is still detectable i.e., it
> >>> is not removed by VACUUM yet. If the row is removed by VACUUM already,
> >>> it cannot detect this conflict. It will detect it as update_missing
> >>> and will follow the default or configured resolver of update_missing
> >>> itself.
> >>
> >> I think this design is categorically unacceptable. It amounts to
> >> designing a feature that works except when it doesn't. I'm not exactly
> >> sure how the proposal should be changed to avoid depending on the
> >> timing of VACUUM, but I think it's absolutely not OK to depend on the
> >> timing of VACUUM -- or, really, this is going to depend on the timing
> >> of HOT-pruning, which will often happen almost instantly.
> >>
> >
> > Agreed, above Tomas has speculated to have a way to avoid vacuum
> > cleaning dead tuples until the required changes are received and
> > applied. Shveta also mentioned another way to have deads-store (say a
> > table where deleted rows are stored for resolution) [1] which is
> > similar to a technique used by some other databases. There is an
> > agreement to not rely on Vacuum to detect such a conflict but the
> > alternative is not clear.
>
> I'm not sure I'd say I "speculated" about it - it's not like we don't
> have ways to hold off cleanup for a while for various reasons
> (long-running query, replication slot, hot-standby feedback, ...).
>
> How exactly would that be implemented I don't know, but it seems like a
> far simpler approach than inventing a new "dead store". It'd need logic
> to let the vacuum to cleanup the stuff no longer needed, but so would
> the dead store I think.
>

The difference w.r.t. the existing mechanisms for holding deleted data
is that we don't know whether we need to hold off the vacuum from
cleaning up the rows, because we can't say with any certainty whether
other nodes will perform any conflicting operations in the future.
Using the example we discussed,
Node A:
  T1: INSERT INTO t (id, value) VALUES (1,1);
  T2: DELETE FROM t WHERE id = 1;

Node B:
  T3: UPDATE t SET value = 2 WHERE id = 1;

Say the order of receiving the commands is T1-T2-T3. We can't predict
whether we will ever get T3, so on what basis shall we try to prevent
vacuum from removing the deleted row? One factor could be time: say
we define a new parameter vacuum_committs_age which would indicate
that we will allow rows to be removed only if the time elapsed since
the tuple's modification, as indicated by the committs module, is
greater than vacuum_committs_age. This needs more analysis if we want
to pursue this direction.
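To illustrate, the retention rule implied by a vacuum_committs_age parameter
could look like the following sketch. The parameter name and value are purely
hypothetical, taken from the idea above, not from any patch:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value for the proposed vacuum_committs_age parameter:
# how long deleted rows must remain detectable so update_deleted
# conflicts can still be reported.
VACUUM_COMMITTS_AGE = timedelta(minutes=5)

def dead_tuple_removable(tuple_commit_ts, now):
    """Allow vacuum to remove a dead tuple only once the commit timestamp
    recorded for it (per the committs module) is older than
    vacuum_committs_age relative to the current time."""
    return now - tuple_commit_ts > VACUUM_COMMITTS_AGE
```

The open question in the thread remains: the threshold is time-based, not
event-based, so a sufficiently delayed T3 could still arrive after cleanup.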

OTOH, in the existing mechanisms, there is a common factor among all
which is that we know that there is some event that requires data to
be present. For example, with a long-running query, we know that the
deleted/updated row is still visible for some running query. For
replication slots, we know that the client will acknowledge progress
in terms of an LSN, using which we can allow vacuum to remove rows.
Similarly, hot_standby_feedback prevents vacuum from removing rows
based on current activity on the hot standby (the xid horizons
required by queries on the standby).

> > Currently, we are thinking to consider such
> > a conflict type as update_missing (The row with the same value as that
> > incoming update's key does not exist.). This is how the current HEAD
> > code behaves and LOGs the information (logical replication did not
> > find row to be updated ..).
> >
>
> I thought the agreement was we need both conflict types to get sensible
> behavior, so proceeding with just the update_missing (mostly because we
> don't know how to detect these conflicts reliably) seems like maybe not
> be the right direction ...
>

Fair enough. I am also not in favor of ignoring this, but if, as a
first step, we want to improve our current conflict detection mechanism
and provide the stats or conflict information in some catalog or view,
we can do that even if update_deleted is not detected. For example, as
of now, we only detect update_missing and simply LOG it at DEBUG1 level.
Additionally, we can detect update_differ (the row updated by a
different origin) and have some stats. We seem to have some agreement
that conflict detection and stats about the same could be the first
step.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Mon, Jun 17, 2024 at 5:38 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>

> > The issue with using commit timestamps is that, when multiple nodes
> > are involved, the commit timestamp won't accurately represent the
> > actual order of operations. There's no reliable way to determine the
> > perfect order of each operation happening on different nodes roughly
> > simultaneously unless we use some globally synchronized counter.
> > Generally, that order might not cause real issues unless one operation
> > is triggered by a previous operation, and relying solely on physical
> > timestamps would not detect that correctly.
> >
> This whole conflict detection / resolution proposal is based on using
> commit timestamps. Aren't you suggesting it can't really work with
> commit timestamps?
>
> FWIW there are ways to build distributed consistency with timestamps,
> as long as they're monotonic - e.g. clock-SI does that. It's not perfect,
> but it shows it's possible.

Hmm, I see that clock-SI does this by delaying the transaction when it
detects the clock skew.

> However, I'm not sure we have to go there - it depends on what the goal is.
> For a one-directional replication (multiple nodes replicating to the
> same target) it might be sufficient if the conflict resolution is
> "deterministic" (e.g. not dependent on the order in which the changes
> are applied). I'm not sure, but it's why I asked what's the goal in my
> very first message in this thread.

I'm not completely certain about this.  Even in one-directional
replication, if multiple nodes are sending data, how can we guarantee
determinism in the presence of clock skew without using some other
mechanism like logical counters or something like what clock-SI is
doing?  I don't want to insist on using any specific solution here.
However, I noticed that we haven't addressed how we plan to manage
clock skew, which is my primary concern. I believe that if multiple
nodes are involved and we're receiving data from them with
unsynchronized clocks, ensuring determinism about their order will
require us to take some measures to handle that.
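For reference, the "adjust its counter to X+1 on receive" scheme discussed
earlier in this thread is essentially a Lamport logical clock. A minimal
illustrative sketch (not a proposal for the patch):

```python
class LamportClock:
    """Minimal Lamport logical clock, illustrating the counter scheme
    described in this thread: on observing a remote operation stamped X,
    a node advances its own counter past X, so its next operation is
    guaranteed to order after everything it has already seen."""

    def __init__(self):
        self.counter = 0

    def local_event(self):
        # Tick once for an operation originating on this node.
        self.counter += 1
        return self.counter

    def receive(self, remote_counter):
        # Jump past the remote stamp; this preserves happens-before even
        # when the nodes' physical clocks are skewed.
        self.counter = max(self.counter, remote_counter) + 1
        return self.counter
```

Note this gives a causal *partial* order; true vector clocks (one counter
per node) are needed to additionally tell concurrent operations apart,
which is where the unbounded-size concern raised later comes in.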

> > We need some sort of logical counter, such as a vector clock, which
> > might be an independent counter on each node but can perfectly track
> > the causal order. For example, if NodeA observes an operation from
> > NodeB with a counter value of X, NodeA will adjust its counter to X+1.
> > This ensures that if NodeA has seen an operation from NodeB, its next
> > operation will appear to have occurred after NodeB's operation.
> >
> > I admit that I haven't fully thought through how we could design such
> > version tracking in our logical replication protocol or how it would
> > fit into our system. However, my point is that we need to consider
> > something beyond commit timestamps to achieve reliable ordering.
> >
>
> I can't really respond to this as there's no suggestion how it would be
> implemented in the patch discussed in this thread.
>
No worries, I'll consider whether finding such a solution is feasible
for our situation. Thank you!

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>
> > > Yes, that's correct. However, many cases could benefit from the
> > > update_deleted conflict type if it can be implemented reliably. That's
> > > why we wanted to give it a try. But if we can't achieve predictable
> > > results with it, I'm fine to drop this approach and conflict_type. We
> > > can consider a better design in the future that doesn't depend on
> > > non-vacuumed entries and provides a more robust method for identifying
> > > deleted rows.
> > >
> >
> > I agree having a separate update_deleted conflict would be beneficial,
> > I'm not arguing against that - my point is actually that I think this
> > conflict type is required, and that it needs to be detected reliably.
> >
>
> When working with a distributed system, we must accept some form of
> eventual consistency model. However, it's essential to design a
> predictable and acceptable behavior. For example, if a change is a
> result of a previous operation (such as an update on node B triggered
> after observing an operation on node A), we can say that the operation
> on node A happened before the operation on node B. Conversely, if
> operations on nodes A and B are independent, we consider them
> concurrent.
>
> In distributed systems, clock skew is a known issue. To establish a
> consistency model, we need to ensure it guarantees the
> "happens-before" relationship. Consider a scenario with three nodes:
> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> subsequently NodeB makes changes, and then both NodeA's and NodeB's
> changes are sent to NodeC, the clock skew might make NodeB's changes
> appear to have occurred before NodeA's changes. However, we should
> maintain data that indicates NodeB's changes were triggered after
> NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> changes happened after NodeA's changes, despite what the timestamps
> suggest.
>
> A common method to handle such cases is using vector clocks for
> conflict resolution.
>

I think the unbounded size of the vector could be a problem to store
for each event. However, while researching previous discussions, it
came to our notice that we have discussed this topic in the past as
well, in the context of standbys. For recovery_min_apply_delay, we
decided the clock skew is not a problem as the settings of this
parameter are much larger than typical time deviations between servers,
as mentioned in the docs. Similarly, for causal reads [1], there was a
proposal to introduce a max_clock_skew parameter, suggesting that users
make sure to have NTP set up correctly. We have tried to check
other databases (like Ora and BDR) where CDR is implemented but didn't
find anything specific to clock skew. So, I propose to go with a GUC
like max_clock_skew such that if the difference between the
incoming transaction's commit time and the local time is more than
max_clock_skew then we raise an ERROR. It is not clear to me that
putting a bigger effort into clock skew is worthwhile, especially when
other systems that have provided the CDR feature (like Ora or BDR) for
decades have not done anything like vector clocks. It is possible that
this is less of a problem w.r.t. CDR and just detecting the anomaly in
clock skew is good enough.
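A sketch of the proposed max_clock_skew check; the parameter name comes from
the proposal above, and the threshold value and error wording are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical setting for the proposed max_clock_skew GUC.
MAX_CLOCK_SKEW = timedelta(seconds=30)

def check_clock_skew(remote_commit_ts, local_now):
    """Raise an error (standing in for the proposed ERROR) when the
    difference between the incoming transaction's commit time and the
    local time exceeds max_clock_skew."""
    skew = abs(remote_commit_ts - local_now)
    if skew > MAX_CLOCK_SKEW:
        raise RuntimeError(f"clock skew {skew} exceeds max_clock_skew; "
                           "check that NTP is configured correctly")
```

This only detects the anomaly; it does not order the conflicting changes,
which is exactly the trade-off being debated in this thread.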

[1] -
https://www.postgresql.org/message-id/flat/CAEepm%3D1iiEzCVLD%3DRoBgtZSyEY1CR-Et7fRc9prCZ9MuTz3pWg%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Robert Haas
Date:
On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The difference w.r.t the existing mechanisms for holding deleted data
> is that we don't know whether we need to hold off the vacuum from
> cleaning up the rows because we can't say with any certainty whether
> other nodes will perform any conflicting operations in the future.
> Using the example we discussed,
> Node A:
>   T1: INSERT INTO t (id, value) VALUES (1,1);
>   T2: DELETE FROM t WHERE id = 1;
>
> Node B:
>   T3: UPDATE t SET value = 2 WHERE id = 1;
>
> Say the order of receiving the commands is T1-T2-T3. We can't predict
> whether we will ever get T-3, so on what basis shall we try to prevent
> vacuum from removing the deleted row?

The problem arises because T2 and T3 might be applied out of order on
some nodes. Once either one of them has been applied on every node, no
further conflicts are possible.

--
Robert Haas
EDB: http://www.enterprisedb.com



RE: Conflict Detection and Resolution

From
"Zhijie Hou (Fujitsu)"
Date:
On Thursday, June 13, 2024 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hi,

> On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> wrote:
> >
> > This time at PGconf.dev[1], we had some discussions regarding this
> > project. The proposed approach is to split the work into two main
> > components. The first part focuses on conflict detection, which aims
> > to identify and report conflicts in logical replication. This feature
> > will enable users to monitor the unexpected conflicts that may occur.
> > The second part involves the actual conflict resolution. Here, we will
> > provide built-in resolutions for each conflict and allow user to
> > choose which resolution will be used for which conflict(as described
> > in the initial email of this thread).
> 
> I agree with this direction that we focus on conflict detection (and
> logging) first and then develop conflict resolution on top of that.

Thanks for your reply !

> 
> >
> > Of course, we are open to alternative ideas and suggestions, and the
> > strategy above can be changed based on ongoing discussions and
> > feedback received.
> >
> > Here is the patch of the first part work, which adds a new parameter
> > detect_conflict for CREATE and ALTER subscription commands. This new
> > parameter will decide if subscription will go for conflict detection.
> > By default, conflict detection will be off for a subscription.
> >
> > When conflict detection is enabled, additional logging is triggered in
> > the following conflict scenarios:
> >
> > * updating a row that was previously modified by another origin.
> > * The tuple to be updated is not found.
> > * The tuple to be deleted is not found.
> >
> > While there exist other conflict types in logical replication, such as
> > an incoming insert conflicting with an existing row due to a primary
> > key or unique index, these cases already result in constraint violation errors.
> 
> What does detect_conflict being true actually mean to users? I understand that
> detect_conflict being true could introduce some overhead to detect conflicts.
> But in terms of conflict detection, even if detect_conflict is false, we detect
> some conflicts such as concurrent inserts with the same key. Once we
> introduce the complete conflict detection feature, I'm not sure there is a case
> where a user wants to detect only some particular types of conflict.
> 
> > Therefore, additional conflict detection for these cases is currently
> > omitted to minimize potential overhead. However, the pre-detection for
> > conflict in these error cases is still essential to support automatic
> > conflict resolution in the future.
> 
> I feel that we should log all types of conflict in a uniform way. For example,
> with detect_conflict being true, the update_differ conflict is reported as
> "conflict %s detected on relation "%s"", whereas concurrent inserts with the
> same key is reported as "duplicate key value violates unique constraint "%s"",
> which could confuse users.

Do you mean it's ok to add a pre-check before applying the INSERT, which will
verify whether the remote tuple violates any unique constraints, and log a
conflict message if it does? I thought about this but was slightly worried
about the extra cost it would bring. OTOH, if we think it's acceptable, we
could do that, since the cost is there only when detect_conflict is enabled.

I also thought of logging such a conflict message in PG_CATCH(), but I think we
lack some necessary info (relation, index name, column name) at the catch block.

Best Regards,
Hou zj




Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> > <tomas.vondra@enterprisedb.com> wrote:
> >
> > > > Yes, that's correct. However, many cases could benefit from the
> > > > update_deleted conflict type if it can be implemented reliably. That's
> > > > why we wanted to give it a try. But if we can't achieve predictable
> > > > results with it, I'm fine to drop this approach and conflict_type. We
> > > > can consider a better design in the future that doesn't depend on
> > > > non-vacuumed entries and provides a more robust method for identifying
> > > > deleted rows.
> > > >
> > >
> > > I agree having a separate update_deleted conflict would be beneficial,
> > > I'm not arguing against that - my point is actually that I think this
> > > conflict type is required, and that it needs to be detected reliably.
> > >
> >
> > When working with a distributed system, we must accept some form of
> > eventual consistency model. However, it's essential to design a
> > predictable and acceptable behavior. For example, if a change is a
> > result of a previous operation (such as an update on node B triggered
> > after observing an operation on node A), we can say that the operation
> > on node A happened before the operation on node B. Conversely, if
> > operations on nodes A and B are independent, we consider them
> > concurrent.
> >
> > In distributed systems, clock skew is a known issue. To establish a
> > consistency model, we need to ensure it guarantees the
> > "happens-before" relationship. Consider a scenario with three nodes:
> > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > changes are sent to NodeC, the clock skew might make NodeB's changes
> > appear to have occurred before NodeA's changes. However, we should
> > maintain data that indicates NodeB's changes were triggered after
> > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > changes happened after NodeA's changes, despite what the timestamps
> > suggest.
> >
> > A common method to handle such cases is using vector clocks for
> > conflict resolution.
> >
>
> I think the unbounded size of the vector could be a problem to store
> for each event. However, while researching previous discussions, it
> came to our notice that we have discussed this topic in the past as
> well in the context of standbys. For recovery_min_apply_delay, we
> decided the clock skew is not a problem as the settings of this
> parameter are much larger than typical time deviations between servers
> as mentioned in docs. Similarly for causal reads [1], there was a
> proposal to introduce max_clock_skew parameter and suggesting the user
> to make sure to have NTP set up correctly. We have tried to check
> other databases (like Ora and BDR) where CDR is implemented but didn't
> find anything specific to clock skew. So, I propose to go with a GUC
> like max_clock_skew such that if the difference of time between the
> incoming transaction's commit time and the local time is more than
> max_clock_skew then we raise an ERROR. It is not clear to me that
> putting bigger effort into clock skew is worth especially when other
> systems providing CDR feature (like Ora or BDR) for decades have not
> done anything like vector clocks. It is possible that this is less of
> a problem w.r.t CDR and just detecting the anomaly in clock skew is
> good enough.

I believe that if we've accepted this solution elsewhere, then we can
also consider the same. Basically, we're allowing the application to
set its tolerance for clock skew. And, if the skew exceeds that
tolerance, it's the application's responsibility to synchronize;
otherwise, an error will occur. This approach seems reasonable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> > > <tomas.vondra@enterprisedb.com> wrote:
> > >
> > > > > Yes, that's correct. However, many cases could benefit from the
> > > > > update_deleted conflict type if it can be implemented reliably. That's
> > > > > why we wanted to give it a try. But if we can't achieve predictable
> > > > > results with it, I'm fine to drop this approach and conflict_type. We
> > > > > can consider a better design in the future that doesn't depend on
> > > > > non-vacuumed entries and provides a more robust method for identifying
> > > > > deleted rows.
> > > > >
> > > >
> > > > I agree having a separate update_deleted conflict would be beneficial,
> > > > I'm not arguing against that - my point is actually that I think this
> > > > conflict type is required, and that it needs to be detected reliably.
> > > >
> > >
> > > When working with a distributed system, we must accept some form of
> > > eventual consistency model. However, it's essential to design a
> > > predictable and acceptable behavior. For example, if a change is a
> > > result of a previous operation (such as an update on node B triggered
> > > after observing an operation on node A), we can say that the operation
> > > on node A happened before the operation on node B. Conversely, if
> > > operations on nodes A and B are independent, we consider them
> > > concurrent.
> > >
> > > In distributed systems, clock skew is a known issue. To establish a
> > > consistency model, we need to ensure it guarantees the
> > > "happens-before" relationship. Consider a scenario with three nodes:
> > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > > changes are sent to NodeC, the clock skew might make NodeB's changes
> > > appear to have occurred before NodeA's changes. However, we should
> > > maintain data that indicates NodeB's changes were triggered after
> > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > > changes happened after NodeA's changes, despite what the timestamps
> > > suggest.
> > >
> > > A common method to handle such cases is using vector clocks for
> > > conflict resolution.
> > >
> >
> > I think the unbounded size of the vector could be a problem to store
> > for each event. However, while researching previous discussions, it
> > came to our notice that we have discussed this topic in the past as
> > well in the context of standbys. For recovery_min_apply_delay, we
> > decided the clock skew is not a problem as the settings of this
> > parameter are much larger than typical time deviations between servers
> > as mentioned in docs. Similarly for causal reads [1], there was a
> > proposal to introduce max_clock_skew parameter and suggesting the user
> > to make sure to have NTP set up correctly. We have tried to check
> > other databases (like Ora and BDR) where CDR is implemented but didn't
> > find anything specific to clock skew. So, I propose to go with a GUC
> > like max_clock_skew such that if the difference of time between the
> > incoming transaction's commit time and the local time is more than
> > max_clock_skew then we raise an ERROR. It is not clear to me that
> > putting bigger effort into clock skew is worth especially when other
> > systems providing CDR feature (like Ora or BDR) for decades have not
> > done anything like vector clocks. It is possible that this is less of
> > a problem w.r.t CDR and just detecting the anomaly in clock skew is
> > good enough.
>
> I believe that if we've accepted this solution elsewhere, then we can
> also consider the same. Basically, we're allowing the application to
> set its tolerance for clock skew. And, if the skew exceeds that
> tolerance, it's the application's responsibility to synchronize;
> otherwise, an error will occur. This approach seems reasonable.

This model can be further extended by making the apply worker wait if
the remote transaction's commit_ts is greater than the local
timestamp. This ensures that no local transaction occurring after the
remote transaction appears to have happened earlier due to clock skew;
instead, we make such transactions happen before the remote
transaction by delaying the remote transaction's apply. Essentially,
by having the apply worker wait until the local timestamp catches up
to the remote transaction's timestamp, we ensure that the remote
transaction, which due to clock skew seems to occur after concurrent
local transactions, is actually applied after those transactions.
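As a sketch (hypothetical helper, not actual PostgreSQL code), the extra delay the apply worker would introduce is just the positive part of the difference between the remote commit timestamp and the local clock:

```c
/*
 * Sketch of the proposed apply-side wait (hypothetical, not actual
 * PostgreSQL code).  Before applying a remote transaction, the apply
 * worker would sleep until the local clock reaches the remote
 * transaction's commit timestamp, so that any concurrent local
 * transaction necessarily commits with an earlier timestamp.
 */
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampTz;            /* microseconds */

/* Microseconds the apply worker should sleep before applying. */
static int64_t
apply_wait_us(TimestampTz remote_commit_ts, TimestampTz local_now)
{
    int64_t delta = remote_commit_ts - local_now;
    return delta > 0 ? delta : 0;       /* no wait if the local clock is ahead */
}
```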

With this model, there should be no ordering errors from the
application's perspective as well if synchronous commit is enabled.
The transaction initiated by the publisher cannot be completed until
it is applied to the synchronous subscriber. This ensures that if the
subscriber's clock is lagging behind the publisher's clock, the
transaction will not be applied until the subscriber's local clock is
in sync, preventing the transaction from being completed out of order.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The difference w.r.t the existing mechanisms for holding deleted data
> > is that we don't know whether we need to hold off the vacuum from
> > cleaning up the rows because we can't say with any certainty whether
> > other nodes will perform any conflicting operations in the future.
> > Using the example we discussed,
> > Node A:
> >   T1: INSERT INTO t (id, value) VALUES (1,1);
> >   T2: DELETE FROM t WHERE id = 1;
> >
> > Node B:
> >   T3: UPDATE t SET value = 2 WHERE id = 1;
> >
> > Say the order of receiving the commands is T1-T2-T3. We can't predict
> > whether we will ever get T-3, so on what basis shall we try to prevent
> > vacuum from removing the deleted row?
>
> The problem arises because T2 and T3 might be applied out of order on
> some nodes. Once either one of them has been applied on every node, no
> further conflicts are possible.

If we decide to skip the update whether the row is missing or deleted,
we indeed reach the same end result regardless of the order of T2, T3,
and Vacuum. Here's how it looks in each case:

Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row
-> end result we do not have a row.
Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result
we do not have a row.
Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row.
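The three cases can be checked mechanically with a tiny standalone simulation (illustrative only; vacuum only reclaims storage, so it is modeled as a no-op for visibility). Every ordering converges to "no row":

```c
/*
 * Illustrative simulation of the three orderings above, with "skip the
 * update" used for both the missing-row and the deleted-row case.
 * Vacuum does not change visibility, so it is a no-op here.
 */
#include <assert.h>
#include <stdbool.h>

typedef struct { bool exists; int value; } Row;

static void t1_insert(Row *r) { r->exists = true; r->value = 1; }
static void t2_delete(Row *r) { if (r->exists) r->exists = false; }
static void t3_update(Row *r) { if (r->exists) r->value = 2; /* else: skip */ }
```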

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > The difference w.r.t the existing mechanisms for holding deleted data
> > > is that we don't know whether we need to hold off the vacuum from
> > > cleaning up the rows because we can't say with any certainty whether
> > > other nodes will perform any conflicting operations in the future.
> > > Using the example we discussed,
> > > Node A:
> > >   T1: INSERT INTO t (id, value) VALUES (1,1);
> > >   T2: DELETE FROM t WHERE id = 1;
> > >
> > > Node B:
> > >   T3: UPDATE t SET value = 2 WHERE id = 1;
> > >
> > > Say the order of receiving the commands is T1-T2-T3. We can't predict
> > > whether we will ever get T-3, so on what basis shall we try to prevent
> > > vacuum from removing the deleted row?
> >
> > The problem arises because T2 and T3 might be applied out of order on
> > some nodes. Once either one of them has been applied on every node, no
> > further conflicts are possible.
>
> If we decide to skip the update whether the row is missing or deleted,
> we indeed reach the same end result regardless of the order of T2, T3,
> and Vacuum. Here's how it looks in each case:
>
> Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row
> -> end result we do not have a row.
> Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result
> we do not have a row.
> Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row.
>

In case 3, how can deletion be successful? The row required to be
deleted has already been updated.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Tue, Jun 18, 2024 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > The difference w.r.t the existing mechanisms for holding deleted data
> > > > is that we don't know whether we need to hold off the vacuum from
> > > > cleaning up the rows because we can't say with any certainty whether
> > > > other nodes will perform any conflicting operations in the future.
> > > > Using the example we discussed,
> > > > Node A:
> > > >   T1: INSERT INTO t (id, value) VALUES (1,1);
> > > >   T2: DELETE FROM t WHERE id = 1;
> > > >
> > > > Node B:
> > > >   T3: UPDATE t SET value = 2 WHERE id = 1;
> > > >
> > > > Say the order of receiving the commands is T1-T2-T3. We can't predict
> > > > whether we will ever get T-3, so on what basis shall we try to prevent
> > > > vacuum from removing the deleted row?
> > >
> > > The problem arises because T2 and T3 might be applied out of order on
> > > some nodes. Once either one of them has been applied on every node, no
> > > further conflicts are possible.
> >
> > If we decide to skip the update whether the row is missing or deleted,
> > we indeed reach the same end result regardless of the order of T2, T3,
> > and Vacuum. Here's how it looks in each case:
> >
> > Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row
> > -> end result we do not have a row.
> > Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result
> > we do not have a row.
> > Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row.
> >
>
> In case 3, how can deletion be successful? The row required to be
> deleted has already been updated.

Hmm, I was considering this case in the example given by you above[1]:
we have updated some fields of the row with id=1, but isn't this row
still detectable by the delete, since the delete will find it by id=1
(we haven't updated the id)?  I was making the point w.r.t. the
example used above.

[1]
> > > > Node A:
> > > >   T1: INSERT INTO t (id, value) VALUES (1,1);
> > > >   T2: DELETE FROM t WHERE id = 1;
> > > >
> > > > Node B:
> > > >   T3: UPDATE t SET value = 2 WHERE id = 1;




--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Tue, Jun 18, 2024 at 1:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > > >
> > > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > The difference w.r.t the existing mechanisms for holding deleted data
> > > > > is that we don't know whether we need to hold off the vacuum from
> > > > > cleaning up the rows because we can't say with any certainty whether
> > > > > other nodes will perform any conflicting operations in the future.
> > > > > Using the example we discussed,
> > > > > Node A:
> > > > >   T1: INSERT INTO t (id, value) VALUES (1,1);
> > > > >   T2: DELETE FROM t WHERE id = 1;
> > > > >
> > > > > Node B:
> > > > >   T3: UPDATE t SET value = 2 WHERE id = 1;
> > > > >
> > > > > Say the order of receiving the commands is T1-T2-T3. We can't predict
> > > > > whether we will ever get T-3, so on what basis shall we try to prevent
> > > > > vacuum from removing the deleted row?
> > > >
> > > > The problem arises because T2 and T3 might be applied out of order on
> > > > some nodes. Once either one of them has been applied on every node, no
> > > > further conflicts are possible.
> > >
> > > If we decide to skip the update whether the row is missing or deleted,
> > > we indeed reach the same end result regardless of the order of T2, T3,
> > > and Vacuum. Here's how it looks in each case:
> > >
> > > Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row
> > > -> end result we do not have a row.
> > > Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result
> > > we do not have a row.
> > > Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row.
> > >
> >
> > In case 3, how can deletion be successful? The row required to be
> > deleted has already been updated.
>
> Hmm, I was considering this case in the example given by you above[1],
> so we have updated some fields of the row with id=1, isn't this row
> still detectable by the delete because delete will find this by id=1
> as we haven't updated the id?  I was making the point w.r.t. the
> example used above.
>

Your point is correct w.r.t the example, but I responded considering a
general update-delete ordering. BTW, it is not clear to me how the
update_delete conflict will be handled with what Robert and you are
saying. I'll try to state what I understood. If we assume that there
are two nodes A & B as mentioned in the above example and the DELETE
has been applied on both nodes, and say an UPDATE is then performed on
node B, then irrespective of whether we consider the conflict as
update_delete or update_missing, the data will remain the same on both
nodes. So, in such a case, we don't need to bother differentiating
between those two types of conflicts. Is that what we can interpret
from the above?

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> > > > <tomas.vondra@enterprisedb.com> wrote:
> > > >
> > > > > > Yes, that's correct. However, many cases could benefit from the
> > > > > > update_deleted conflict type if it can be implemented reliably. That's
> > > > > > why we wanted to give it a try. But if we can't achieve predictable
> > > > > > results with it, I'm fine to drop this approach and conflict_type. We
> > > > > > can consider a better design in the future that doesn't depend on
> > > > > > non-vacuumed entries and provides a more robust method for identifying
> > > > > > deleted rows.
> > > > > >
> > > > >
> > > > > I agree having a separate update_deleted conflict would be beneficial,
> > > > > I'm not arguing against that - my point is actually that I think this
> > > > > conflict type is required, and that it needs to be detected reliably.
> > > > >
> > > >
> > > > When working with a distributed system, we must accept some form of
> > > > eventual consistency model. However, it's essential to design a
> > > > predictable and acceptable behavior. For example, if a change is a
> > > > result of a previous operation (such as an update on node B triggered
> > > > after observing an operation on node A), we can say that the operation
> > > > on node A happened before the operation on node B. Conversely, if
> > > > operations on nodes A and B are independent, we consider them
> > > > concurrent.
> > > >
> > > > In distributed systems, clock skew is a known issue. To establish a
> > > > consistency model, we need to ensure it guarantees the
> > > > "happens-before" relationship. Consider a scenario with three nodes:
> > > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> > > > subsequently NodeB makes changes, and then both NodeA's and NodeB's
> > > > changes are sent to NodeC, the clock skew might make NodeB's changes
> > > > appear to have occurred before NodeA's changes. However, we should
> > > > maintain data that indicates NodeB's changes were triggered after
> > > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> > > > changes happened after NodeA's changes, despite what the timestamps
> > > > suggest.
> > > >
> > > > A common method to handle such cases is using vector clocks for
> > > > conflict resolution.
> > > >
> > >
> > > I think the unbounded size of the vector could be a problem to store
> > > for each event. However, while researching previous discussions, it
> > > came to our notice that we have discussed this topic in the past as
> > > well in the context of standbys. For recovery_min_apply_delay, we
> > > decided the clock skew is not a problem as the settings of this
> > > parameter are much larger than typical time deviations between servers
> > > as mentioned in docs. Similarly for causal reads [1], there was a
> > > proposal to introduce max_clock_skew parameter and suggesting the user
> > > to make sure to have NTP set up correctly. We have tried to check
> > > other databases (like Ora and BDR) where CDR is implemented but didn't
> > > find anything specific to clock skew. So, I propose to go with a GUC
> > > like max_clock_skew such that if the difference of time between the
> > > incoming transaction's commit time and the local time is more than
> > > max_clock_skew then we raise an ERROR. It is not clear to me that
> > > putting bigger effort into clock skew is worth especially when other
> > > systems providing CDR feature (like Ora or BDR) for decades have not
> > > done anything like vector clocks. It is possible that this is less of
> > > a problem w.r.t CDR and just detecting the anomaly in clock skew is
> > > good enough.
> >
> > I believe that if we've accepted this solution elsewhere, then we can
> > also consider the same. Basically, we're allowing the application to
> > set its tolerance for clock skew. And, if the skew exceeds that
> > tolerance, it's the application's responsibility to synchronize;
> > otherwise, an error will occur. This approach seems reasonable.
>
> This model can be further extended by making the apply worker wait if
> the remote transaction's commit_ts is greater than the local
> timestamp. This ensures that no local transactions occurring after the
> remote transaction appear to have happened earlier due to clock skew
> instead we make them happen before the remote transaction by delaying
> the remote transaction apply.  Essentially, by having the remote
> application wait until the local timestamp matches the remote
> transaction's timestamp, we ensure that the remote transaction, which
> seems to occur after concurrent local transactions due to clock skew,
> is actually applied after those transactions.
>
> With this model, there should be no ordering errors from the
> application's perspective as well if synchronous commit is enabled.
> The transaction initiated by the publisher cannot be completed until
> it is applied to the synchronous subscriber. This ensures that if the
> subscriber's clock is lagging behind the publisher's clock, the
> transaction will not be applied until the subscriber's local clock is
> in sync, preventing the transaction from being completed out of order.

I tried to work out a few scenarios with this, where the apply worker
will wait until its local clock hits 'remote commit_ts - max_clock_skew
permitted'. Please have a look.

Let's say, we have a GUC to configure max_clock_skew permitted.
Resolver is last_update_wins in both cases.

----------------
1) Case 1: max_clock_skew set to 0, i.e. no tolerance for clock skew.

Remote update with commit_timestamp = 10.20AM.
Local clock (which is, say, 5 min behind) shows = 10.15AM.

When the remote update arrives at the local node, we see that the skew
is greater than max_clock_skew, and thus the apply worker waits till
the local clock hits 'remote's commit_ts - max_clock_skew', i.e. till
10.20AM. Once the local clock hits 10.20AM, the worker applies the
remote change with a commit_ts of 10.20AM. In the meantime (during the
apply worker's wait period), if some local update on the same row has
happened at, say, 10.18AM, that will be applied first and later be
overwritten by the above remote change of 10.20AM, as the remote
change's timestamp appears later, even though it happened earlier than
the local change.

2) Case 2: max_clock_skew is set to 2 min.

Remote update with commit_timestamp = 10.20AM.
Local clock (which is, say, 5 min behind) shows = 10.15AM.

Now the apply worker will notice a skew greater than 2 min and thus
will wait till the local clock hits 'remote's commit_ts -
max_clock_skew', i.e. 10.18AM, and will apply the change with a
commit_ts of 10.20AM (as we always save the origin's commit timestamp
into the local commit_ts; see
RecordTransactionCommit->TransactionTreeSetCommitTsData). Now let's
say another local update is triggered at 10.19AM; it will be applied
locally but will be ignored on the remote node. On the remote node,
the existing change with a timestamp of 10.20AM will win, resulting in
data divergence.
----------

In case 1, the local change, which was actually triggered later than
the remote change, is overwritten by the remote change. And in case 2,
it results in data divergence. Is this behaviour expected in both
cases? Or am I getting the wait logic wrong? Thoughts?
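For reference, the wait target in both cases reduces to simple arithmetic (illustrative only; times expressed as minutes past 10:00 for readability):

```c
/*
 * Illustrative arithmetic for the two cases above: the apply worker
 * waits until its local clock reaches
 * (remote commit_ts - max_clock_skew).  Times are minutes past 10:00.
 */
#include <assert.h>

static int
wait_target(int remote_commit_ts, int max_clock_skew)
{
    return remote_commit_ts - max_clock_skew;
}
```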

thanks
Shveta



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Tue, Jun 18, 2024 at 7:44 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, June 13, 2024 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> > On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> > wrote:
> > >
> > > This time at PGconf.dev[1], we had some discussions regarding this
> > > project. The proposed approach is to split the work into two main
> > > components. The first part focuses on conflict detection, which aims
> > > to identify and report conflicts in logical replication. This feature
> > > will enable users to monitor the unexpected conflicts that may occur.
> > > The second part involves the actual conflict resolution. Here, we will
> > > provide built-in resolutions for each conflict and allow user to
> > > choose which resolution will be used for which conflict(as described
> > > in the initial email of this thread).
> >
> > I agree with this direction that we focus on conflict detection (and
> > logging) first and then develop conflict resolution on top of that.
>
> Thanks for your reply !
>
> >
> > >
> > > Of course, we are open to alternative ideas and suggestions, and the
> > > strategy above can be changed based on ongoing discussions and
> > > feedback received.
> > >
> > > Here is the patch of the first part work, which adds a new parameter
> > > detect_conflict for CREATE and ALTER subscription commands. This new
> > > parameter will decide if subscription will go for conflict detection.
> > > By default, conflict detection will be off for a subscription.
> > >
> > > When conflict detection is enabled, additional logging is triggered in
> > > the following conflict scenarios:
> > >
> > > * updating a row that was previously modified by another origin.
> > > * The tuple to be updated is not found.
> > > * The tuple to be deleted is not found.
> > >
> > > While there exist other conflict types in logical replication, such as
> > > an incoming insert conflicting with an existing row due to a primary
> > > key or unique index, these cases already result in constraint violation errors.
> >
> > What does detect_conflict being true actually mean to users? I understand that
> > detect_conflict being true could introduce some overhead to detect conflicts.
> > But in terms of conflict detection, even if detect_confict is false, we detect
> > some conflicts such as concurrent inserts with the same key. Once we
> > introduce the complete conflict detection feature, I'm not sure there is a case
> > where a user wants to detect only some particular types of conflict.
> >
> > > Therefore, additional conflict detection for these cases is currently
> > > omitted to minimize potential overhead. However, the pre-detection for
> > > conflict in these error cases is still essential to support automatic
> > > conflict resolution in the future.
> >
> > I feel that we should log all types of conflict in a uniform way. For example,
> > with detect_conflict being true, the update_differ conflict is reported as
> > "conflict %s detected on relation "%s"", whereas concurrent inserts with the
> > same key is reported as "duplicate key value violates unique constraint "%s"",
> > which could confuse users.
>
> Do you mean it's ok to add a pre-check before applying the INSERT, which will
> verify if the remote tuple violates any unique constraints, and if it violates
> then we log a conflict message ? I thought about this but was slightly
> worried about the extra cost it would bring. OTOH, if we think it's acceptable,
> we could do that since the cost is there only when detect_conflict is enabled.
>
> I also thought of logging such a conflict message in pg_catch(), but I think we
> lack some necessary info(relation, index name, column name) at the catch block.
>

Can't we use/extend existing 'apply_error_callback_arg' for this purpose?

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Tue, Jun 11, 2024 at 3:12 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Sat, Jun 8, 2024 at 3:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
>> <ashutosh.bapat.oss@gmail.com> wrote:
>> >
>> > On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>> >>
>> >> >
>> >>
>> >> Here are more use cases of the "earliest_timestamp_wins" resolution method:
>> >> 1) Applications where the record of first occurrence of an event is
>> >> important. For example, sensor based applications like earthquake
>> >> detection systems, capturing the first seismic wave's time is crucial.
>> >> 2) Scheduling systems, like appointment booking, prioritize the
>> >> earliest request when handling concurrent ones.
>> >> 3) In contexts where maintaining chronological order is important -
>> >>   a) Social media platforms display comments ensuring that the
>> >> earliest ones are visible first.
>> >>   b) Finance transaction processing systems rely on timestamps to
>> >> prioritize the processing of transactions, ensuring that the earliest
>> >> transaction is handled first
>> >
>> >
>> > Thanks for sharing examples. However, these scenarios would be handled by the application and not during
replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the
transaction that updated the row committed/became visible) and not a DML on a column which is of type timestamp. Some
implementations use a hidden timestamp column but that's different from a user column which captures the timestamp of
(say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the
database, which may be different from the value of the column itself.
>> >
>>
>> It depends on how these operations are performed. For example, the
>> appointment booking system could be prioritized via a transaction
>> updating a row with columns emp_name, emp_id, reserved, time_slot.
>> Now, if two employees at different geographical locations try to book
>> the calendar, the earlier transaction will win.
>
>
> I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their
reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but
implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside
the database.
>

Sure, the application needs some handling, but I have tried to explain
in a simple way that comes to my mind how it can be realized with the
db involved. This is a known conflict resolution method, but note that
I am not insisting on having "earliest_timestamp_wins". Even if we
want this, we can have a separate discussion on it and add it later.

>>
>>
>> > If we use the transaction commit timestamp as the basis for resolution, a transaction where multiple rows conflict
may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2
and T3 on separate origins with timestamps t1, t2, and t3 respectively changed rows r1, r2 and r2, r3 and r1, r4
respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 <
t1 < t3, r1 will end up with the version of T3 and r2 will end up with the version of T1 after applying all the three
transactions.
>> >
>>
>> Are you telling the results based on latest_timestamp_wins? If so,
>> then it is correct. OTOH, if the user has configured
>> "earliest_timestamp_wins" resolution method, then we should end up
>> with a version of r1 from T1 because t1 < t3. Also, due to the same
>> reason, we should have version r2 from T2.
>>
>> >
>>  Would that introduce an inconsistency between r1 and r2?
>> >
>>
>> As per my understanding, this shouldn't be an inconsistency. Won't it
>> be true even when the transactions are performed on a single node with
>> the same timing?
>>
>
> The inconsistency will arise irrespective of the conflict resolution method. On a single system the effects of
whichever transaction runs last will be visible entirely. But in the example above, on the node where T1, T2, and T3
(from *different* origins) are applied, we might end up with a situation where some changes from T1 are applied whereas
some changes from T3 are applied.
>

I still think it will lead to the same result if all three T1, T2, T3
happen on the same node in the same order as you mentioned. Say, we
have a pre-existing table with rows r1, r2, r3, r4. Now, if we use the
order of transactions to be applied on the same node based on t2 < t1
< t3. First T2 will be applied, so for now, r1 is a pre-existing
version and r2 is from T2. Next, when T1 is performed, both r1 and r2
are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2
will be from T1. This is what you mentioned will happen after conflict
resolution in the above example.
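This can be checked with a tiny per-row last-update-wins simulation (illustrative only): applying T2, then T1, then T3, with t2 < t1 < t3, leaves r1 with T3's version and r2 with T1's version, exactly as in the conflict-resolution outcome above.

```c
/*
 * Illustrative per-row last_update_wins simulation of the example:
 * T2 (ts=1) changes r2, r3; T1 (ts=2) changes r1, r2; T3 (ts=3)
 * changes r1, r4; applied in the order T2, T1, T3 (t2 < t1 < t3).
 */
#include <assert.h>

typedef struct { int ts; int origin; } RowVersion;

/* Apply a change only if its commit timestamp is newer (last wins). */
static void
apply(RowVersion *row, int ts, int origin)
{
    if (ts > row->ts)
    {
        row->ts = ts;
        row->origin = origin;
    }
}
```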

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote:
> On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I tried to work out a few scenarios with this, where the apply worker
> will wait until its local clock hits 'remote_commit_tts - max_skew
> permitted'. Please have a look.
>
> Let's say, we have a GUC to configure max_clock_skew permitted.
> Resolver is last_update_wins in both cases.
> ----------------
> 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew.
>
> Remote Update with commit_timestamp = 10.20AM.
> Local clock (which is say 5 min behind) shows = 10.15AM.
>
> When remote update arrives at local node, we see that skew is greater
> than max_clock_skew and thus apply worker waits till local clock hits
> 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the
> local clock hits 10.20 AM, the worker applies the remote change with
> commit_tts of 10.20AM. In the meantime (during the wait period of the apply
> worker) if some local update on the same row has happened at say 10.18am,
> that will be applied first, which will later be overwritten by the above
> remote change of 10.20AM as the remote change's timestamp appears more
> recent, even though it happened earlier than the local change.

For the sake of simplicity let's call the change that happened at
10:20 AM change-1 and the change that happened at 10:15 as change-2
and assume we are talking about the synchronous commit only.

I think now from an application perspective the change-1 wouldn't have
caused the change-2 because we delayed applying change-2 on the local
node which would have delayed the confirmation of the change-1 to the
application that means we have got the change-2 on the local node
without the confirmation of change-1 hence change-2 has no causal
dependency on the change-1.  So it's fine that we perform change-1
before change-2 and the timestamp will also show the same at any other
node if they receive these 2 changes.

The goal is to ensure that if we define the order where change-2
happens before change-1, this same order should be visible on all
other nodes. This will hold true because the commit timestamp of
change-2 is earlier than that of change-1.

> 2)  Case 2: max_clock_skew is set to 2min.
>
> Remote Update with commit_timestamp=10.20AM
> Local clock (which is say 5 min behind) = 10.15AM.
>
> Now apply worker will notice skew greater than 2min and thus will wait
> till local clock hits 'remote's commit_tts - max_clock_skew' i.e.
> 10.18 and will apply the change with commit_tts of 10.20 ( as we
> always save the origin's commit timestamp into local commit_tts, see
> RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say
> another local update is triggered at 10.19am, it will be applied
> locally but it will be ignored on remote node. On the remote node ,
> the existing change with a timestamp of 10.20 am will win resulting in
> data divergence.

Let's call the 10:20 AM change as a change-1 and the change that
happened at 10:19 as change-2

IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that
change is 10:20, and the same will be visible to all
other nodes.  So in conflict resolution still the change-1 happened
after the change-2 because change-2's commit_ts is 10:19 AM.   Now
there could be a problem with the causal order because we applied the
change-1 at 10:18 AM so the application might have gotten confirmation
at 10:18 AM and the change-2 of the local node may be triggered as a
result of confirmation of the change-1 that means now change-2 has a
causal dependency on the change-1 but commit_ts shows change-2
happened before the change-1 on all the nodes.

So, is this acceptable? I think yes because the user has configured a
maximum clock skew of 2 minutes, which means the detected order might
not always align with the causal order for transactions occurring
within that time frame. Generally, the ideal configuration for
max_clock_skew should be in multiple of the network round trip time.
Assuming this configuration, we wouldn’t encounter this problem
because for change-2 to be caused by change-1, the client would need
to get confirmation of change-1 and then trigger change-2, which would
take at least 2-3 network round trips.
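To spell out the waiting rule we are discussing (a rough sketch only; the function and its arguments are invented for illustration, not from any patch): the apply worker applies immediately when the skew is within tolerance, and otherwise sleeps until the local clock reaches remote commit_ts minus max_clock_skew.

```python
def wait_target(remote_commit_ts, local_clock, max_clock_skew):
    """Return the local-clock time at which the apply worker may proceed.
    Times are illustrative scalars (e.g. minutes past 10:00)."""
    skew = remote_commit_ts - local_clock
    if skew <= max_clock_skew:
        return local_clock                      # skew within tolerance: apply now
    return remote_commit_ts - max_clock_skew    # wake-up time for the worker

# Case 1: max_clock_skew = 0; local clock 10:15, remote commit_ts 10:20.
assert wait_target(20, 15, 0) == 20   # wait until local clock shows 10:20
# Case 2: max_clock_skew = 2 min; worker resumes at 10:18, then applies
# the change, which still carries commit_ts 10:20.
assert wait_target(20, 15, 2) == 18
```

In both cases the applied change retains the origin's commit timestamp, which is why conflict resolution on other nodes still sees change-1 as the later one.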

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote:
> > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I tried to work out a few scenarios with this, where the apply worker
> > will wait until its local clock hits 'remote_commit_tts - max_skew
> > permitted'. Please have a look.
> >
> > Let's say, we have a GUC to configure max_clock_skew permitted.
> > Resolver is last_update_wins in both cases.
> > ----------------
> > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew.
> >
> > Remote Update with commit_timestamp = 10.20AM.
> > Local clock (which is say 5 min behind) shows = 10.15AM.
> >
> > When remote update arrives at local node, we see that skew is greater
> > than max_clock_skew and thus apply worker waits till local clock hits
> > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the
> > local clock hits 10.20 AM, the worker applies the remote change with
> > commit_tts of 10.20AM. In the meantime (during the wait period of the apply
> > worker) if some local update on the same row has happened at say 10.18am,
> > that will be applied first, which will later be overwritten by the above
> > remote change of 10.20AM as the remote change's timestamp appears more
> > recent, even though it happened earlier than the local change.
>
> For the sake of simplicity let's call the change that happened at
> 10:20 AM change-1 and the change that happened at 10:15 as change-2
> and assume we are talking about the synchronous commit only.

Do you mean "the change that happened at 10:18 as change-2"

>
> I think now from an application perspective the change-1 wouldn't have
> caused the change-2 because we delayed applying change-2 on the local
> node

Do you mean "we delayed applying change-1 on the local node."

>which would have delayed the confirmation of the change-1 to the
> application that means we have got the change-2 on the local node
> without the confirmation of change-1 hence change-2 has no causal
> dependency on the change-1.  So it's fine that we perform change-1
> before change-2

Do you mean "So it's fine that we perform change-2 before change-1"

>and the timestamp will also show the same at any other
> node if they receive these 2 changes.
>
> The goal is to ensure that if we define the order where change-2
> happens before change-1, this same order should be visible on all
> other nodes. This will hold true because the commit timestamp of
> change-2 is earlier than that of change-1.

Considering the above corrections as base, I agree with this.

> > 2)  Case 2: max_clock_skew is set to 2min.
> >
> > Remote Update with commit_timestamp=10.20AM
> > Local clock (which is say 5 min behind) = 10.15AM.
> >
> > Now apply worker will notice skew greater than 2min and thus will wait
> > till local clock hits 'remote's commit_tts - max_clock_skew' i.e.
> > 10.18 and will apply the change with commit_tts of 10.20 ( as we
> > always save the origin's commit timestamp into local commit_tts, see
> > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say
> > another local update is triggered at 10.19am, it will be applied
> > locally but it will be ignored on remote node. On the remote node ,
> > the existing change with a timestamp of 10.20 am will win resulting in
> > data divergence.
>
> Let's call the 10:20 AM change as a change-1 and the change that
> happened at 10:19 as change-2
>
> IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that
> change is 10:20, and the same will be visible to all
> other nodes.  So in conflict resolution still the change-1 happened
> after the change-2 because change-2's commit_ts is 10:19 AM.   Now
> there could be a problem with the causal order because we applied the
> change-1 at 10:18 AM so the application might have gotten confirmation
> at 10:18 AM and the change-2 of the local node may be triggered as a
> result of confirmation of the change-1 that means now change-2 has a
> causal dependency on the change-1 but commit_ts shows change-2
> happened before the change-1 on all the nodes.
>
> So, is this acceptable? I think yes because the user has configured a
> maximum clock skew of 2 minutes, which means the detected order might
> not always align with the causal order for transactions occurring
> within that time frame.

Agree. I had the same thoughts, and wanted to confirm my understanding.

>Generally, the ideal configuration for
> max_clock_skew should be in multiple of the network round trip time.
> Assuming this configuration, we wouldn’t encounter this problem
> because for change-2 to be caused by change-1, the client would need
> to get confirmation of change-1 and then trigger change-2, which would
> take at least 2-3 network round trips.


thanks
Shveta



Re: Conflict Detection and Resolution

From
Ashutosh Bapat
Date:


On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database.
>

Sure, the application needs some handling but I have tried to explain
with a simple way that comes to my mind and how it can be realized
with db involved. This is a known conflict detection method but note
that I am not insisting to have "earliest_timestamp_wins". Even, if we
want this we can have a separate discussion on this and add it later.


It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility. I imagine the first version might just detect the conflict and throw error or do nothing. That's already two simple conflict resolution strategies with minimal efforts. We can add more complicated ones incrementally.
 
>
> The inconsistency will arise irrespective of conflict resolution method. On a single system effects of whichever transaction runs last will be visible entirely. But in the example above the node where T1, T2, and T3 (from *different* origins) are applied, we might end up with a situation where some changes from T1 are applied whereas some changes from T3 are applied.
>

I still think it will lead to the same result if all three T1, T2, T3
happen on the same node in the same order as you mentioned. Say, we
have a pre-existing table with rows r1, r2, r3, r4. Now, if we use the
order of transactions to be applied on the same node based on t2 < t1
< t3. First T2 will be applied, so for now, r1 is a pre-existing
version and r2 is from T2. Next, when T1 is performed, both r1 and r2
are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2
will be from T1. This is what you mentioned will happen after conflict
resolution in the above example.


You are right. It won't affect the consistency. The contents of transaction on each node might vary after application depending upon the changes that conflict resolver makes; but the end result will be the same.

--
Best Wishes,
Ashutosh Bapat

Re: Conflict Detection and Resolution

From
Dilip Kumar
Date:
On Wed, Jun 19, 2024 at 2:36 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I tried to work out a few scenarios with this, where the apply worker
> > > will wait until its local clock hits 'remote_commit_tts - max_skew
> > > permitted'. Please have a look.
> > >
> > > Let's say, we have a GUC to configure max_clock_skew permitted.
> > > Resolver is last_update_wins in both cases.
> > > ----------------
> > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew.
> > >
> > > Remote Update with commit_timestamp = 10.20AM.
> > > Local clock (which is say 5 min behind) shows = 10.15AM.
> > >
> > > When remote update arrives at local node, we see that skew is greater
> > > than max_clock_skew and thus apply worker waits till local clock hits
> > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the
> > > local clock hits 10.20 AM, the worker applies the remote change with
> > > commit_tts of 10.20AM. In the meantime (during the wait period of the apply
> > > worker) if some local update on the same row has happened at say 10.18am,
> > > that will be applied first, which will later be overwritten by the above
> > > remote change of 10.20AM as the remote change's timestamp appears more
> > > recent, even though it happened earlier than the local change.

Oops lot of mistakes in the usage of change-1 and change-2, sorry about that.

> > For the sake of simplicity let's call the change that happened at
> > 10:20 AM change-1 and the change that happened at 10:15 as change-2
> > and assume we are talking about the synchronous commit only.
>
> Do you mean "the change that happened at 10:18 as change-2"

Right

> >
> > I think now from an application perspective the change-1 wouldn't have
> > caused the change-2 because we delayed applying change-2 on the local
> > node
>
> Do you mean "we delayed applying change-1 on the local node."

Right

> >which would have delayed the confirmation of the change-1 to the
> > application that means we have got the change-2 on the local node
> > without the confirmation of change-1 hence change-2 has no causal
> > dependency on the change-1.  So it's fine that we perform change-1
> > before change-2
>
> Do you mean "So it's fine that we perform change-2 before change-1"

Right

> >and the timestamp will also show the same at any other
> > node if they receive these 2 changes.
> >
> > The goal is to ensure that if we define the order where change-2
> > happens before change-1, this same order should be visible on all
> > other nodes. This will hold true because the commit timestamp of
> > change-2 is earlier than that of change-1.
>
> Considering the above corrections as base, I agree with this.

+1

> > > 2)  Case 2: max_clock_skew is set to 2min.
> > >
> > > Remote Update with commit_timestamp=10.20AM
> > > Local clock (which is say 5 min behind) = 10.15AM.
> > >
> > > Now apply worker will notice skew greater than 2min and thus will wait
> > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e.
> > > 10.18 and will apply the change with commit_tts of 10.20 ( as we
> > > always save the origin's commit timestamp into local commit_tts, see
> > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say
> > > another local update is triggered at 10.19am, it will be applied
> > > locally but it will be ignored on remote node. On the remote node ,
> > > the existing change with a timestamp of 10.20 am will win resulting in
> > > data divergence.
> >
> > Let's call the 10:20 AM change as a change-1 and the change that
> > happened at 10:19 as change-2
> >
> > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that
> > change is 10:20, and the same will be visible to all
> > other nodes.  So in conflict resolution still the change-1 happened
> > after the change-2 because change-2's commit_ts is 10:19 AM.   Now
> > there could be a problem with the causal order because we applied the
> > change-1 at 10:18 AM so the application might have gotten confirmation
> > at 10:18 AM and the change-2 of the local node may be triggered as a
> > result of confirmation of the change-1 that means now change-2 has a
> > causal dependency on the change-1 but commit_ts shows change-2
> > happened before the change-1 on all the nodes.
> >
> > So, is this acceptable? I think yes because the user has configured a
> > maximum clock skew of 2 minutes, which means the detected order might
> > not always align with the causal order for transactions occurring
> > within that time frame.
>
> Agree. I had the same thoughts, and wanted to confirm my understanding.

Okay

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that
>> > their reservation has failed. It looks natural that the first one to reserve the room should get the reservation,
>> > but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled
>> > outside database.
>> >
>>
>> Sure, the application needs some handling but I have tried to explain
>> with a simple way that comes to my mind and how it can be realized
>> with db involved. This is a known conflict detection method but note
>> that I am not insisting to have "earliest_timestamp_wins". Even, if we
>> want this we can have a separate discussion on this and add it later.
>>
>
> It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature
> for extensibility. I imagine the first version might just detect the conflict and throw error or do nothing. That's
> already two simple conflict resolution strategies with minimal efforts. We can add more complicated ones incrementally.
>

Agreed, splitting the work into multiple patches would help us to
finish the easier ones first.

I have thought to divide it such that in the first patch, we detect
conflicts like 'insert_exists', 'update_differ', 'update_missing', and
'delete_missing' (the definition of each could be found in the initial
email [1]) and throw an ERROR or write them in LOG. Various people
agreed to have this as a separate committable work [2]. This can help
users to detect and monitor the conflicts in a better way. I have
intentionally skipped update_deleted as it would require more
infrastructure and it would be helpful even without that.

In the second patch, we can implement simple built-in resolution
strategies like apply and skip (which can be named as remote_apply and
keep_local, see [3][4] for details on these strategies) with ERROR or
LOG being the default strategy. We can allow these strategies to be
configured at the global and table level.

In the third patch, we can add monitoring capability for conflicts and
resolutions as mentioned by Jonathan [5]. Here, we can have stats like
how many conflicts of a particular type have happened.

In the meantime, we can keep discussing and try to reach a consensus
on the timing-related resolution strategy like 'last_update_wins' and
the conflict strategy 'update_deleted'.

If we agree on the above, some of the work, especially the first one,
could even be discussed in a separate thread.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAD21AoAa6JzqhXY02uNUPb-aTozu2RY9nMdD1%3DTUh%2BFpskkYtw%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com
[4] - https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts
[5] - https://www.postgresql.org/message-id/1eb9242f-dcb6-45c3-871c-98ec324e03ef%40postgresql.org

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > >
> > > I think the unbounded size of the vector could be a problem to store
> > > for each event. However, while researching previous discussions, it
> > > came to our notice that we have discussed this topic in the past as
> > > well in the context of standbys. For recovery_min_apply_delay, we
> > > decided the clock skew is not a problem as the settings of this
> > > parameter are much larger than typical time deviations between servers
> > > as mentioned in docs. Similarly for casual reads [1], there was a
> > > proposal to introduce max_clock_skew parameter and suggesting the user
> > > to make sure to have NTP set up correctly. We have tried to check
> > > other databases (like Ora and BDR) where CDR is implemented but didn't
> > > find anything specific to clock skew. So, I propose to go with a GUC
> > > like max_clock_skew such that if the difference of time between the
> > > incoming transaction's commit time and the local time is more than
> > > max_clock_skew then we raise an ERROR. It is not clear to me that
> > > putting bigger effort into clock skew is worth especially when other
> > > systems providing CDR feature (like Ora or BDR) for decades have not
> > > done anything like vector clocks. It is possible that this is less of
> > > a problem w.r.t CDR and just detecting the anomaly in clock skew is
> > > good enough.
> >
> > I believe that if we've accepted this solution elsewhere, then we can
> > also consider the same. Basically, we're allowing the application to
> > set its tolerance for clock skew. And, if the skew exceeds that
> > tolerance, it's the application's responsibility to synchronize;
> > otherwise, an error will occur. This approach seems reasonable.
>
> This model can be further extended by making the apply worker wait if
> the remote transaction's commit_ts is greater than the local
> timestamp. This ensures that no local transactions occurring after the
> remote transaction appear to have happened earlier due to clock skew
> instead we make them happen before the remote transaction by delaying
> the remote transaction apply.  Essentially, by having the remote
> application wait until the local timestamp matches the remote
> transaction's timestamp, we ensure that the remote transaction, which
> seems to occur after concurrent local transactions due to clock skew,
> is actually applied after those transactions.
>
> With this model, there should be no ordering errors from the
> application's perspective as well if synchronous commit is enabled.
> The transaction initiated by the publisher cannot be completed until
> it is applied to the synchronous subscriber. This ensures that if the
> subscriber's clock is lagging behind the publisher's clock, the
> transaction will not be applied until the subscriber's local clock is
> in sync, preventing the transaction from being completed out of order.
>

As per the discussion, this idea will help us to resolve transaction
ordering issues due to clock skew. I was thinking of having two
variables: max_clock_skew (indicates how much clock skew is
acceptable) and max_clock_skew_options: ERROR, LOG, WAIT (indicates
the action to take once the clock skew is detected). There could be
multiple ways to provide these parameters: one is providing them as
GUCs, and another at the subscription or the table level. I am
wondering whether users would care only about a table or set of
tables, or would prefer to set such variables at the system level.
We already have a SKIP option (that allows us to skip the
transactions till a particular LSN) at the subscription level, so I am
wondering whether it makes sense to provide these new conflict
resolution parameters at the same level as well?
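To sketch how the two proposed knobs could interact (purely illustrative; the function, parameter names, and return values are invented here, not from any patch):

```python
def clock_skew_action(remote_commit_ts, local_clock, max_clock_skew, option):
    """Decide what the apply worker does with an incoming change.
    option is one of 'error', 'log', 'wait' (the proposed
    max_clock_skew_options); times are illustrative scalars."""
    skew = remote_commit_ts - local_clock
    if skew <= max_clock_skew:
        return ("apply", None)                  # skew within tolerance
    if option == "error":
        return ("error", f"clock skew {skew} exceeds max_clock_skew {max_clock_skew}")
    if option == "log":
        return ("log-and-apply", None)          # record the anomaly, proceed
    if option == "wait":
        # delay apply until local clock reaches remote_commit_ts - max_clock_skew
        return ("wait", remote_commit_ts - max_clock_skew)
    raise ValueError(f"unknown option: {option}")

assert clock_skew_action(20, 15, 2, "wait") == ("wait", 18)
assert clock_skew_action(20, 19, 2, "error") == ("apply", None)
```

Whichever level the settings end up at (GUC, subscription, or table), the decision itself stays this simple; only the lookup of the configured values changes.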

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Ashutosh Bapat
Date:


On Thu, Jun 20, 2024 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database.
>> >
>>
>> Sure, the application needs some handling but I have tried to explain
>> with a simple way that comes to my mind and how it can be realized
>> with db involved. This is a known conflict detection method but note
>> that I am not insisting to have "earliest_timestamp_wins". Even, if we
>> want this we can have a separate discussion on this and add it later.
>>
>
> It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility. I imagine the first version might just detect the conflict and throw error or do nothing. That's already two simple conflict resolution strategies with minimal efforts. We can add more complicated ones incrementally.
>

Agreed, splitting the work into multiple patches would help us to
finish the easier ones first.

I have thought to divide it such that in the first patch, we detect
conflicts like 'insert_exists', 'update_differ', 'update_missing', and
'delete_missing' (the definition of each could be found in the initial
email [1]) and throw an ERROR or write them in LOG. Various people
agreed to have this as a separate committable work [2]. This can help
users to detect and monitor the conflicts in a better way. I have
intentionally skipped update_deleted as it would require more
infrastructure and it would be helpful even without that.

Since we are in the initial months of release, it will be good to take a stock of whether the receiver receives all the information needed for most (if not all) of the conflict detection and resolution strategies. If there are any missing pieces, we may want to add those in PG18 so that improved conflict detection and resolution on a higher version receiver can still work.
 

In the second patch, we can implement simple built-in resolution
strategies like apply and skip (which can be named as remote_apply and
keep_local, see [3][4] for details on these strategies) with ERROR or
LOG being the default strategy. We can allow these strategies to be
configured at the global and table level.

In the third patch, we can add monitoring capability for conflicts and
resolutions as mentioned by Jonathan [5]. Here, we can have stats like
how many conflicts of a particular type have happened.

That looks like a plan. Thanks for chalking it out.

--
Best Wishes,
Ashutosh Bapat

Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Thu, Jun 20, 2024 at 5:06 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jun 20, 2024 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat
>> <ashutosh.bapat.oss@gmail.com> wrote:
>> >
>> > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >>
>> >> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that
>> >> > their reservation has failed. It looks natural that the first one to reserve the room should get the reservation,
>> >> > but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be
>> >> > handled outside database.
>> >> >
>> >>
>> >> Sure, the application needs some handling but I have tried to explain
>> >> with a simple way that comes to my mind and how it can be realized
>> >> with db involved. This is a known conflict detection method but note
>> >> that I am not insisting to have "earliest_timestamp_wins". Even, if we
>> >> want this we can have a separate discussion on this and add it later.
>> >>
>> >
>> > It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature
>> > for extensibility. I imagine the first version might just detect the conflict and throw error or do nothing. That's
>> > already two simple conflict resolution strategies with minimal efforts. We can add more complicated ones
>> > incrementally.
>> >
>>
>> Agreed, splitting the work into multiple patches would help us to
>> finish the easier ones first.
>>
>> I have thought to divide it such that in the first patch, we detect
>> conflicts like 'insert_exists', 'update_differ', 'update_missing', and
>> 'delete_missing' (the definition of each could be found in the initial
>> email [1]) and throw an ERROR or write them in LOG. Various people
>> agreed to have this as a separate committable work [2]. This can help
>> users to detect and monitor the conflicts in a better way. I have
>> intentionally skipped update_deleted as it would require more
>> infrastructure and it would be helpful even without that.
>
>
> Since we are in the initial months of release, it will be good to take a stock of whether the receiver receives all
> the information needed for most (if not all) of the conflict detection and resolution strategies. If there are any
> missing pieces, we may want to add those in PG18 so that improved conflict detection and resolution on a higher
> version receiver can still work.
>

Good point. This can help us to detect conflicts if required even when
we move to a higher version. As we continue to discuss/develop the
features, I hope we will be able to see any missing pieces.

>>
>>
>> In the second patch, we can implement simple built-in resolution
>> strategies like apply and skip (which can be named as remote_apply and
>> keep_local, see [3][4] for details on these strategies) with ERROR or
>> LOG being the default strategy. We can allow these strategies to be
>> configured at the global and table level.
>>
>> In the third patch, we can add monitoring capability for conflicts and
>> resolutions as mentioned by Jonathan [5]. Here, we can have stats like
>> how many conflicts of a particular type have happened.
>
>
> That looks like a plan. Thanks for chalking it out.
>

Thanks!

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >> In the second patch, we can implement simple built-in resolution
> >> strategies like apply and skip (which can be named as remote_apply and
> >> keep_local, see [3][4] for details on these strategies) with ERROR or
> >> LOG being the default strategy. We can allow these strategies to be
> >> configured at the global and table level.

Before we implement resolvers, we need a way to configure them. Please
find the patch002 which attempts to implement Global Level Conflict
Resolvers Configuration.  Note that patch002 is dependent upon
Conflict-Detection patch001 which is reviewed in another thread [1].
I have attached patch001 here for convenience and to avoid CFBot
failures. But please use [1] if you have any comments on patch001.

New DDL commands in patch002 are:

To set the global resolver for a given conflict_type:
SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'

To reset to default resolver:
RESET CONFLICT RESOLVER FOR 'conflict_type'

TODO: Once we get initial consensus on DDL commands, I will add
support for them in pg_dump/restore and will add doc.

------------

As suggested in [2] and above, it seems logical to have table-specific
resolvers configuration along with global one.

Here is the proposal for table level resolvers:

1) We can provide support for table level resolvers using ALTER TABLE:

ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> ON <conflict_type1>,
                   SET CONFLICT RESOLVER <resolver2> ON <conflict_type2>, ...;

Reset can be done using:
ALTER TABLE <name> RESET CONFLICT RESOLVER ON <conflict_type1>,
                   RESET CONFLICT RESOLVER ON <conflict_type2>, ...;

Above commands will save/remove configuration in/from the new system
catalog pg_conflict_rel.

2) Table level configuration (if any) will be given preference over
global ones. The tables not having table-specific resolvers will use
global configured ones.

3) If the table is a partition table, then resolvers created for the
parent will be inherited by all child partition tables. Multiple
resolver entries will be created, one for each child partition in the
system catalog (similar to constraints).

4) Users can also configure explicit resolvers for child partitions.
In such a case, child's resolvers will override inherited resolvers
(if any).

5) Any attempt to RESET (remove) inherited resolvers on a child
partition table *alone* will result in the error "cannot reset
inherited resolvers" (similar to constraints). But RESET of explicitly
created resolvers (non-inherited ones) will be permitted for child
partitions. On RESET, the resolver configuration will not fall back to
the inherited resolver again; users need to explicitly configure new
resolvers for the child partition tables (after RESET) if needed.

6) Removal/Reset of resolvers on the parent will remove the
corresponding "inherited" resolvers on all the child partitions as
well. If any child has overridden inherited resolvers earlier, those
will stay.

7) For 'ALTER TABLE parent ATTACH PARTITION child': if 'child' has its
own resolvers set, those will not be overridden; if it does not have
resolvers set, it will inherit from the parent table. For example, out
of say 5 conflict_types, if the child table has resolvers configured
for 2 of them, 'attach' will retain those; for the remaining 3, it
will inherit from the parent (if any).

8) Detach partition will not remove inherited resolvers; it will just
mark them 'non-inherited' (similar to constraints).
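Points (3) to (8) can be summarized with a small illustrative model
(hypothetical Python, with the catalog represented as a plain dict;
the real implementation would live in pg_conflict_rel, and all names
here are assumptions for illustration only):

```python
# Illustrative model of resolver inheritance for partitioned tables.
# catalog maps table -> {conflict_type: (resolver, inherited_flag)}.

class ResolverCatalog:
    def __init__(self):
        self.catalog = {}

    def _set(self, table, ctype, resolver, inherited=False):
        self.catalog.setdefault(table, {})[ctype] = (resolver, inherited)

    def set_on_parent(self, parent, children, ctype, resolver):
        # (3) setting on the parent creates one inherited entry per child
        self._set(parent, ctype, resolver)
        for child in children:
            self._set(child, ctype, resolver, inherited=True)

    def set_on_child(self, child, ctype, resolver):
        # (4) an explicit child setting overrides the inherited one
        self._set(child, ctype, resolver, inherited=False)

    def reset_on_child(self, child, ctype):
        # (5) inherited resolvers cannot be reset on the child alone
        resolver, inherited = self.catalog[child][ctype]
        if inherited:
            raise ValueError("cannot reset inherited resolvers")
        del self.catalog[child][ctype]

    def attach_partition(self, parent, child):
        # (7) keep the child's own settings; inherit only the missing ones
        for ctype, (resolver, _) in self.catalog.get(parent, {}).items():
            if ctype not in self.catalog.get(child, {}):
                self._set(child, ctype, resolver, inherited=True)

    def detach_partition(self, child):
        # (8) inherited entries stay, but become 'non-inherited'
        for ctype, (resolver, inh) in list(self.catalog.get(child, {}).items()):
            if inh:
                self.catalog[child][ctype] = (resolver, False)
```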

Thoughts?

------------

[1]:
https://www.postgresql.org/message-id/OS0PR01MB57161006B8F2779F2C97318194D42%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/4738d098-6378-494e-9f88-9e3a85a5de82%40enterprisedb.com

thanks
Shveta

Attachment

Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >> In the second patch, we can implement simple built-in resolution
> > >> strategies like apply and skip (which can be named as remote_apply and
> > >> keep_local, see [3][4] for details on these strategies) with ERROR or
> > >> LOG being the default strategy. We can allow these strategies to be
> > >> configured at the global and table level.
>
> Before we implement resolvers, we need a way to configure them. Please
> find the patch002 which attempts to implement Global Level Conflict
> Resolvers Configuration.  Note that patch002 is dependent upon
> Conflict-Detection patch001 which is reviewed in another thread [1].
> I have attached patch001 here for convenience and to avoid CFBot
> failures. But please use [1] if you have any comments on patch001.
>
> New DDL commands in patch002 are:
>
> To set global resolver for given conflcit_type:
> SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'
>
> To reset to default resolver:
> RESET CONFLICT RESOLVER FOR 'conflict_type'
>

Does setting up resolvers have any meaning without subscriptions? I am
wondering whether we should allow setting up the resolvers at the
subscription level. One benefit is that users don't need to use a
different DDL to set up resolvers. The first patch gives a conflict
detection option at the subscription level, so it would be symmetrical
to provide a resolver at the subscription level. Yet another benefit
could be that it gives users the facility to configure different
resolvers for a set of tables belonging to a particular
publication/node.

>
> ------------
>
> As suggested in [2] and above, it seems logical to have table-specific
> resolvers configuration along with global one.
>
> Here is the proposal for table level resolvers:
>
> 1) We can provide support for table level resolvers using ALTER TABLE:
>
> ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>,
>                                        SET CONFLICT RESOLVER
> <resolver2> on <conflict_type2>, ...;
>
> Reset can be done using:
> ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>,
>                                       RESET CONFLICT RESOLVER on
> <conflict_type2>, ...;
>
> Above commands will save/remove configuration in/from the new system
> catalog pg_conflict_rel.
>
> 2) Table level configuration (if any) will be given preference over
> global ones. The tables not having table-specific resolvers will use
> global configured ones.
>
> 3) If the table is a partition table, then resolvers created for the
> parent will be inherited by all child partition tables. Multiple
> resolver entries will be created, one for each child partition in the
> system catalog (similar to constraints).
>
> 4) Users can also configure explicit resolvers for child partitions.
> In such a case, child's resolvers will override inherited resolvers
> (if any).
>
> 5) Any attempt to RESET (remove) inherited resolvers on the child
> partition table *alone* will result in error:  "cannot reset inherited
> resolvers" (similar to constraints). But RESET of explicit created
> resolvers (non-inherited ones) will be permitted for child partitions.
> On RESET, the resolver configuration will not fallback to the
> inherited resolver again. Users need to explicitly configure new
> resolvers for the child partition tables (after RESET) if needed.
>

Why so? If we allow the RESET command to fall back to the inherited
resolver, it would make the behavior consistent with a child table on
which SET has not been performed.

> 6) Removal/Reset of resolvers on parent will remove corresponding
> "inherited" resolvers on all the child partitions as well. If any
> child has overridden inherited resolvers earlier, those will stay.
>
> 7) For 'ALTER TABLE parent ATTACH PARTITION child'; if 'child' has its
> own resolvers set, those will not be overridden. But if it does not
> have resolvers set, it will inherit from the parent table. This will
> mean, for say out of 5 conflict_types, if the child table has
> resolvers configured for any 2, 'attach' will retain those; for the
> rest 3, it will inherit from the parent (if any).
>
> 8) Detach partition will not remove inherited resolvers, it will just
> mark them 'non inherited' (similar to constraints).
>

BTW, to keep the initial patch simple, can we prohibit setting
resolvers at the child table level? If we follow this, then we can
give an ERROR if the user tries to attach the table (with configured
resolvers) to an existing partitioned table.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > >> In the second patch, we can implement simple built-in resolution
> > > >> strategies like apply and skip (which can be named as remote_apply and
> > > >> keep_local, see [3][4] for details on these strategies) with ERROR or
> > > >> LOG being the default strategy. We can allow these strategies to be
> > > >> configured at the global and table level.
> >
> > Before we implement resolvers, we need a way to configure them. Please
> > find the patch002 which attempts to implement Global Level Conflict
> > Resolvers Configuration.  Note that patch002 is dependent upon
> > Conflict-Detection patch001 which is reviewed in another thread [1].
> > I have attached patch001 here for convenience and to avoid CFBot
> > failures. But please use [1] if you have any comments on patch001.
> >
> > New DDL commands in patch002 are:
> >
> > To set global resolver for given conflcit_type:
> > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'
> >
> > To reset to default resolver:
> > RESET CONFLICT RESOLVER FOR 'conflict_type'
> >
>
> Does setting up resolvers have any meaning without subscriptions? I am
> wondering whether we should allow to set up the resolvers at the
> subscription level. One benefit is that users don't need to use a
> different DDL to set up resolvers. The first patch gives a conflict
> detection option at the subscription level, so it would be symmetrical
> to provide a resolver at the subscription level. Yet another benefit
> could be that it provides users facility to configure different
> resolvers for a set of tables belonging to a particular
> publication/node.

There can be multiple tables included in a publication with varying
business use-cases, and they may thus need different resolvers set,
even though they are all part of the same publication.

> >
> > ------------
> >
> > As suggested in [2] and above, it seems logical to have table-specific
> > resolvers configuration along with global one.
> >
> > Here is the proposal for table level resolvers:
> >
> > 1) We can provide support for table level resolvers using ALTER TABLE:
> >
> > ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>,
> >                                        SET CONFLICT RESOLVER
> > <resolver2> on <conflict_type2>, ...;
> >
> > Reset can be done using:
> > ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>,
> >                                       RESET CONFLICT RESOLVER on
> > <conflict_type2>, ...;
> >
> > Above commands will save/remove configuration in/from the new system
> > catalog pg_conflict_rel.
> >
> > 2) Table level configuration (if any) will be given preference over
> > global ones. The tables not having table-specific resolvers will use
> > global configured ones.
> >
> > 3) If the table is a partition table, then resolvers created for the
> > parent will be inherited by all child partition tables. Multiple
> > resolver entries will be created, one for each child partition in the
> > system catalog (similar to constraints).
> >
> > 4) Users can also configure explicit resolvers for child partitions.
> > In such a case, child's resolvers will override inherited resolvers
> > (if any).
> >
> > 5) Any attempt to RESET (remove) inherited resolvers on the child
> > partition table *alone* will result in error:  "cannot reset inherited
> > resolvers" (similar to constraints). But RESET of explicit created
> > resolvers (non-inherited ones) will be permitted for child partitions.
> > On RESET, the resolver configuration will not fallback to the
> > inherited resolver again. Users need to explicitly configure new
> > resolvers for the child partition tables (after RESET) if needed.
> >
>
> Why so? If we can allow the RESET command to fallback to the inherited
> resolver it would make the behavior consistent for the child table
> where we don't have performed SET.

The thought behind not making it fall back is that since the user has
done 'RESET', they may want to remove the resolver completely. We
don't know if they really want to go back to the previous one. If they
do, it is easy to set it again. But if they do not, and we set the
inherited resolver again during 'RESET', there is no way they can drop
that inherited resolver alone on the child partition.

> > 6) Removal/Reset of resolvers on parent will remove corresponding
> > "inherited" resolvers on all the child partitions as well. If any
> > child has overridden inherited resolvers earlier, those will stay.
> >
> > 7) For 'ALTER TABLE parent ATTACH PARTITION child'; if 'child' has its
> > own resolvers set, those will not be overridden. But if it does not
> > have resolvers set, it will inherit from the parent table. This will
> > mean, for say out of 5 conflict_types, if the child table has
> > resolvers configured for any 2, 'attach' will retain those; for the
> > rest 3, it will inherit from the parent (if any).
> >
> > 8) Detach partition will not remove inherited resolvers, it will just
> > mark them 'non inherited' (similar to constraints).
> >
>
> BTW, to keep the initial patch simple, can we prohibit setting
> resolvers at the child table level? If we follow this, then we can
> give an ERROR if the user tries to attach the table (with configured
> resolvers) to an existing partitioned table.

Okay, I will think about this if the patch becomes too complex.

thanks
Shveta



Re: Conflict Detection and Resolution

From
Amit Kapila
Date:
On Tue, Jun 25, 2024 at 3:39 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > >> In the second patch, we can implement simple built-in resolution
> > > > >> strategies like apply and skip (which can be named as remote_apply and
> > > > >> keep_local, see [3][4] for details on these strategies) with ERROR or
> > > > >> LOG being the default strategy. We can allow these strategies to be
> > > > >> configured at the global and table level.
> > >
> > > Before we implement resolvers, we need a way to configure them. Please
> > > find the patch002 which attempts to implement Global Level Conflict
> > > Resolvers Configuration.  Note that patch002 is dependent upon
> > > Conflict-Detection patch001 which is reviewed in another thread [1].
> > > I have attached patch001 here for convenience and to avoid CFBot
> > > failures. But please use [1] if you have any comments on patch001.
> > >
> > > New DDL commands in patch002 are:
> > >
> > > To set global resolver for given conflcit_type:
> > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'
> > >
> > > To reset to default resolver:
> > > RESET CONFLICT RESOLVER FOR 'conflict_type'
> > >
> >
> > Does setting up resolvers have any meaning without subscriptions? I am
> > wondering whether we should allow to set up the resolvers at the
> > subscription level. One benefit is that users don't need to use a
> > different DDL to set up resolvers. The first patch gives a conflict
> > detection option at the subscription level, so it would be symmetrical
> > to provide a resolver at the subscription level. Yet another benefit
> > could be that it provides users facility to configure different
> > resolvers for a set of tables belonging to a particular
> > publication/node.
>
> There can be multiple tables included in a publication with varying
> business use-cases and thus may need different resolvers set, even
> though they all are part of the same publication.
>

Agreed, but this is the reason we are planning to keep resolvers at
the table level. Here, I am asking to set resolvers at the
subscription level rather than at the global level.

> > >
> > > ------------
> > >
> > > As suggested in [2] and above, it seems logical to have table-specific
> > > resolvers configuration along with global one.
> > >
> > > Here is the proposal for table level resolvers:
> > >
> > > 1) We can provide support for table level resolvers using ALTER TABLE:
> > >
> > > ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>,
> > >                                        SET CONFLICT RESOLVER
> > > <resolver2> on <conflict_type2>, ...;
> > >
> > > Reset can be done using:
> > > ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>,
> > >                                       RESET CONFLICT RESOLVER on
> > > <conflict_type2>, ...;
> > >
> > > Above commands will save/remove configuration in/from the new system
> > > catalog pg_conflict_rel.
> > >
> > > 2) Table level configuration (if any) will be given preference over
> > > global ones. The tables not having table-specific resolvers will use
> > > global configured ones.
> > >
> > > 3) If the table is a partition table, then resolvers created for the
> > > parent will be inherited by all child partition tables. Multiple
> > > resolver entries will be created, one for each child partition in the
> > > system catalog (similar to constraints).
> > >
> > > 4) Users can also configure explicit resolvers for child partitions.
> > > In such a case, child's resolvers will override inherited resolvers
> > > (if any).
> > >
> > > 5) Any attempt to RESET (remove) inherited resolvers on the child
> > > partition table *alone* will result in error:  "cannot reset inherited
> > > resolvers" (similar to constraints). But RESET of explicit created
> > > resolvers (non-inherited ones) will be permitted for child partitions.
> > > On RESET, the resolver configuration will not fallback to the
> > > inherited resolver again. Users need to explicitly configure new
> > > resolvers for the child partition tables (after RESET) if needed.
> > >
> >
> > Why so? If we can allow the RESET command to fallback to the inherited
> > resolver it would make the behavior consistent for the child table
> > where we don't have performed SET.
>
> Thought behind not making it fallback is since the user has done
> 'RESET', he may want to remove the resolver completely. We don't know
> if he really wants to go back to the previous one. If he does, it is
> easy to set it again. But if he does not, and we set the inherited
> resolver again during 'RESET', there is no way he can drop that
> inherited resolver alone on the child partition.
>

I see your point, but normally RESET allows us to go back to the
default, which in this case would be the resolver inherited from the
parent table.

--
With Regards,
Amit Kapila.



Re: Conflict Detection and Resolution

From
Nisha Moond
Date:
Please find the attached 'patch0003', which implements conflict
resolutions according to the global resolver settings.

Summary of Conflict Resolutions Implemented in 'patch0003':

INSERT Conflicts:
------------------------
1) Conflict Type: 'insert_exists'

Supported Resolutions:
a) 'remote_apply': Convert the INSERT to an UPDATE and apply.
b) 'keep_local': Ignore the incoming (conflicting) INSERT and retain
the local tuple.
c) 'error': The apply worker will error out and restart.

UPDATE Conflicts:
------------------------
1) Conflict Type: 'update_differ'

Supported Resolutions:
a) 'remote_apply': Apply the remote update.
b) 'keep_local': Skip the remote update and retain the local tuple.
c) 'error': The apply worker will error out and restart.

2) Conflict Type: 'update_missing'

Supported Resolutions:
a) 'apply_or_skip': Try to convert the UPDATE to an INSERT; if
unsuccessful, skip the remote update and continue.
b) 'apply_or_error': Try to convert the UPDATE to an INSERT; if
unsuccessful, error out.
c) 'skip': Skip the remote update and continue.
d) 'error': The apply worker will error out and restart.

DELETE Conflicts:
------------------------
1) Conflict Type: 'delete_missing'

Supported Resolutions:
a) 'skip': Skip the remote delete and continue.
b) 'error': The apply worker will error out and restart.

NOTE: With these basic resolution techniques, the patch does not aim
to ensure consistency across nodes, so data divergence is expected.
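As an illustration of the dispatch summarized above, here is a hedged
Python sketch of the 'update_missing' handling; try_insert stands in
for the UPDATE-to-INSERT conversion attempt, and the names and return
conventions are illustrative assumptions, not taken from the patch:

```python
def resolve_update_missing(resolver, try_insert):
    """Sketch of 'update_missing' resolution. try_insert() attempts to
    convert the remote UPDATE into an INSERT; it returns True on success."""
    if resolver == "apply_or_skip":
        try_insert()                      # on failure, skip and continue
        return "continue"
    if resolver == "apply_or_error":
        if not try_insert():              # on failure, error out
            raise RuntimeError("update_missing: could not convert UPDATE to INSERT")
        return "continue"
    if resolver == "skip":
        return "continue"                 # skip the remote update
    if resolver == "error":
        raise RuntimeError("update_missing conflict")  # apply worker errors out
    raise ValueError(f"unrecognized resolver: {resolver}")
```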

--
Thanks,
Nisha

Attachment

Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Wed, Jun 26, 2024 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jun 25, 2024 at 3:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > >> In the second patch, we can implement simple built-in resolution
> > > > > >> strategies like apply and skip (which can be named as remote_apply and
> > > > > >> keep_local, see [3][4] for details on these strategies) with ERROR or
> > > > > >> LOG being the default strategy. We can allow these strategies to be
> > > > > >> configured at the global and table level.
> > > >
> > > > Before we implement resolvers, we need a way to configure them. Please
> > > > find the patch002 which attempts to implement Global Level Conflict
> > > > Resolvers Configuration.  Note that patch002 is dependent upon
> > > > Conflict-Detection patch001 which is reviewed in another thread [1].
> > > > I have attached patch001 here for convenience and to avoid CFBot
> > > > failures. But please use [1] if you have any comments on patch001.
> > > >
> > > > New DDL commands in patch002 are:
> > > >
> > > > To set global resolver for given conflcit_type:
> > > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'
> > > >
> > > > To reset to default resolver:
> > > > RESET CONFLICT RESOLVER FOR 'conflict_type'
> > > >
> > >
> > > Does setting up resolvers have any meaning without subscriptions? I am
> > > wondering whether we should allow to set up the resolvers at the
> > > subscription level. One benefit is that users don't need to use a
> > > different DDL to set up resolvers. The first patch gives a conflict
> > > detection option at the subscription level, so it would be symmetrical
> > > to provide a resolver at the subscription level. Yet another benefit
> > > could be that it provides users facility to configure different
> > > resolvers for a set of tables belonging to a particular
> > > publication/node.
> >
> > There can be multiple tables included in a publication with varying
> > business use-cases and thus may need different resolvers set, even
> > though they all are part of the same publication.
> >
>
> Agreed but this is the reason we are planning to keep resolvers at the
> table level. Here, I am asking to set resolvers at the subscription
> level rather than at the global level.

Okay, got it. I had misunderstood that we wanted to replace
table-level resolvers with subscription-level ones.
A global configuration has one benefit: if the user has no requirement
to set different resolvers for different subscriptions or tables, they
can always set one global configuration and be done with it. OTOH, I
also agree with the benefits of subscription-level configuration.

thanks
Shveta



Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Thu, Jun 27, 2024 at 8:44 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Please find the attached  'patch0003', which implements conflict
> resolutions according to the global resolver settings.

Thanks for providing the resolver patch.

Please find new patches attached. Changes:

patch002:
--Fixed a CFBot compilation failure where a header file was not
included in meson.build.
--Also, this is the correct version of the patch; the previous email
attached an older version by mistake.

patch004:
This is a WIP patch which attempts to implement configuration of
table-level resolvers. It has the below changes:
--Alter table SET CONFLICT RESOLVER.
--Alter table RESET CONFLICT RESOLVER. <Note that these 2 commands
also take care of resolver inheritance for partitioned tables as
discussed in [1]>.
--Resolver inheritance support during 'Alter table ATTACH PARTITION'.
--Resolver inheritance removal during 'Alter table DETACH PARTITION'.

Pending:
--Resolver inheritance support during 'CREATE TABLE .. PARTITION OF
..'.
--Using the table-level resolver while resolving conflicts. (Resolver
patch003 still relies on global resolvers.)

Please refer [1] for the complete proposal for table-level resolvers.

[1]: https://www.postgresql.org/message-id/CAJpy0uAqegGDbuJk3Z-ku8wYFZyPv7C1KmHCkJ3885O%2Bj5enFg%40mail.gmail.com

thanks
Shveta

Attachment

Re: Conflict Detection and Resolution

From
shveta malik
Date:
On Thu, Jun 27, 2024 at 4:03 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Jun 27, 2024 at 8:44 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > Please find the attached  'patch0003', which implements conflict
> > resolutions according to the global resolver settings.
>
> Thanks for providing the resolver patch.
>
> Please find new patches attached. Changes:
>
> patch002:
> --Fixed CFBot compilation failure where a header file was not included
> in meson.build
> --Also this is the correct version of patch. Previous email has
> attached an older version by mistake.
>
> patch004:
> This is a WIP progress which attempts to implement Configuration of
> table-level resolvers . It has below changes:
> --Alter table SET CONFLICT RESOLVER.
> --Alter table RESET CONFLICT RESOLVER. <Note that these 2 commands
> also take care of resolvers inheritance for partition tables as
> discussed in [1]>.
> --Resolver inheritance support during 'Alter table ATTACH PARTITION'.
> --Resolver inheritance removal during 'Alter table DETACH PARTITION'.
>
> Pending:
> --Resolver Inheritance support during 'CREATE TABLE .. PARTITION OF
> ..'.
> --Using tabel-level resolver while resolving conflicts. (Resolver
> patch003 still relies on global resolvers).
>
> Please refer [1] for the complete proposal for table-level resolvers.
>

Please find v2 attached. The changes are in patch004 only:

--Resolver inheritance support during 'CREATE TABLE .. PARTITION OF'.
--SPLIT and MERGE partition review and testing (this was missed earlier).
--Test cases added for all the above.

thanks
Shveta

Attachment