Thread: Conflict Detection and Resolution
Hello hackers,

Please find the proposal for Conflict Detection and Resolution (CDR)
for Logical replication.
<Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
below details.>

Introduction
================
In case the node is subscribed to multiple providers, or when local
writes happen on a subscriber, conflicts can arise for the incoming
changes. CDR is the mechanism to automatically detect and resolve
these conflicts depending on the application and configurations.
CDR is not applicable for the initial table sync. If conflicting data
already exists locally in the table, the table sync worker will fail.
Please find the details on CDR in the apply worker for INSERT, UPDATE
and DELETE operations:

INSERT
================
To resolve an INSERT conflict on the subscriber, it is important to
find the conflicting row (if any) before we attempt an insertion. The
index search preference for this will be:

First check for a replica identity (RI) index.
 - if not found, check for the primary key (PK) index.
 - if not found, then check for unique indexes (individual ones or
   added by unique constraints)
 - if a unique index is also not found, skip CDR

Note: if no RI index, PK, or unique index is found but
REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
The reason is that even though a row can be identified with
REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
rows. Hence, we should not go for conflict detection in such a case.

In case of replica identity ‘nothing’ and in the absence of any
suitable index (as defined above), CDR will be skipped for INSERT.

Conflict Type:
----------------
insert_exists: A conflict is detected when the table has the same
value for a key column as the new value in the incoming row.

Conflict Resolution
----------------
a) latest_timestamp_wins: The change with the later commit timestamp wins.
b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
c) apply: Always apply the remote change.
d) skip: The remote change is skipped.
e) error: Error out on conflict. Replication is stopped, manual
action is needed.

The change will be converted to 'UPDATE' and applied if the decision
is in favor of applying the remote change.

It is important to have commit timestamp info available on the
subscriber when latest_timestamp_wins or earliest_timestamp_wins is
chosen as the resolution method. Thus ‘track_commit_timestamp’ must be
enabled on the subscriber; in its absence, configuring the said
timestamp-based resolution methods will result in an error.

Note: If the user has chosen latest_timestamp_wins or
earliest_timestamp_wins, and the remote and local timestamps are the
same, then it will go by system identifier. The change with the higher
system identifier will win. This will ensure that the same change is
picked on all the nodes.

UPDATE
================

Conflict Detection Method:
--------------------------------
Origin conflict detection: The ‘origin’ info is used to detect
conflict, which can be obtained from the commit timestamp generated
for the incoming txn at the source node. To compare the remote origin
with the local one, we must have origin information for local txns as
well, which can be obtained from the commit timestamp after enabling
‘track_commit_timestamp’ locally.
The one drawback here is that the ‘origin’ information cannot be
obtained once the row is frozen and the commit timestamp info is
removed by vacuum. For a frozen row, conflicts cannot be raised, and
thus the incoming changes will be applied in all cases.
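For illustration, a minimal sketch (assuming PostgreSQL 16 or later and
a hypothetical table t1) of how the commit timestamp and origin that
this detection relies on can be inspected for a local row:

    -- on the subscriber; requires a server restart to take effect:
    ALTER SYSTEM SET track_commit_timestamp = on;

    -- commit timestamp and replication origin of the txn that last
    -- wrote the row; both become unavailable once the row is frozen:
    SELECT *
    FROM pg_xact_commit_timestamp_origin(
             (SELECT xmin FROM t1 WHERE id = 1));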
Conflict Types:
----------------
a) update_differ: The origin of an incoming update's key row differs
from the local row, i.e., the row has already been updated locally or
by different nodes.
b) update_missing: The row with the same value as that incoming
update's key does not exist. Remote is trying to update a row which
does not exist locally.
c) update_deleted: The row with the same value as that incoming
update's key does not exist. The row is already deleted. This conflict
type is generated only if the deleted row is still detectable, i.e.,
it is not removed by VACUUM yet. If the row has already been removed
by VACUUM, this conflict cannot be detected. It will be detected as
update_missing and will follow the default or configured resolver of
update_missing itself.

Conflict Resolutions:
----------------
a) latest_timestamp_wins: The change with the later commit timestamp
wins. Can be used for ‘update_differ’.
b) earliest_timestamp_wins: The change with the earlier commit
timestamp wins. Can be used for ‘update_differ’.
c) apply: The remote change is always applied. Can be used for
‘update_differ’.
d) apply_or_skip: The remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then the change is skipped. Can be used for
‘update_missing’ or ‘update_deleted’.
e) apply_or_error: The remote change is converted to INSERT and is
applied. If the complete row cannot be constructed from the info
provided by the publisher, then an error is raised. Can be used for
‘update_missing’ or ‘update_deleted’.
f) skip: The remote change is skipped and the local one is retained.
Can be used for any conflict type.
g) error: Error out on conflict. Replication is stopped, manual
action is needed. Can be used for any conflict type.

To support UPDATE CDR, the presence of either a replica identity index
or a primary key is required on the target node. UPDATE CDR will not
be supported in the absence of a replica identity index or primary key
even though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
the "Noteworthy Scenarios" section in [1] for further details.

DELETE
================
Conflict Type:
----------------
delete_missing: An incoming delete is trying to delete a row on a
target node which does not exist.

Conflict Resolutions:
----------------
a) error: Error out on conflict. Replication is stopped, manual
action is needed.
b) skip: The remote change is skipped.

Configuring Conflict Resolution:
------------------------------------------------
There are two parts when it comes to configuring CDR:

a) Enabling/Disabling conflict detection.
b) Configuring conflict resolvers for different conflict types.

Users can sometimes create multiple subscriptions on the same node,
subscribing to different tables to improve replication performance by
starting multiple apply workers. If the tables in one subscription are
less likely to cause conflicts, the user may want conflict detection
disabled for that subscription to avoid detection latency while
enabling it for other subscriptions. This generates a requirement to
make ‘conflict detection’ configurable per subscription, while the
conflict resolver configuration can remain global. All the
subscriptions which opt for ‘conflict detection’ will follow the
global conflict resolver configuration.

To implement the above, subscription commands will be changed to have
one more parameter 'conflict_resolution=on/off'; the default will be
OFF.
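For illustration, a sketch of the proposed per-subscription knob (the
parameter name is taken from the paragraph above; the exact syntax is
not final):

    CREATE SUBSCRIPTION sub1
        CONNECTION 'host=node1 dbname=postgres'
        PUBLICATION pub1
        WITH (conflict_resolution = on);

    ALTER SUBSCRIPTION sub1 SET (conflict_resolution = off);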
To configure global resolvers, a new DDL command will be introduced:

CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>

-------------------------

Apart from the above three main operations and resolver configuration,
there are more conflict types like primary-key updates, multiple
unique constraints etc. and some special scenarios to be considered.
Complete design details can be found in [1].

[1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution

thanks
Shveta
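For illustration, under the proposed DDL a global configuration might
look like this (conflict types and resolvers taken from the lists
above; the syntax is not final):

    CONFLICT RESOLVER ON insert_exists IS latest_timestamp_wins;
    CONFLICT RESOLVER ON update_differ IS latest_timestamp_wins;
    CONFLICT RESOLVER ON update_missing IS apply_or_skip;
    CONFLICT RESOLVER ON delete_missing IS skip;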
On 5/23/24 08:36, shveta malik wrote:
> Hello hackers,
>
> Please find the proposal for Conflict Detection and Resolution (CDR)
> for Logical replication.
> <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> below details.>
>
> Introduction
> ================
> In case the node is subscribed to multiple providers, or when local
> writes happen on a subscriber, conflicts can arise for the incoming
> changes. CDR is the mechanism to automatically detect and resolve
> these conflicts depending on the application and configurations.
> CDR is not applicable for the initial table sync. If conflicting data
> already exists locally in the table, the table sync worker will fail.
> Please find the details on CDR in the apply worker for INSERT, UPDATE
> and DELETE operations:
>

Which architecture are you aiming for? Here you talk about multiple
providers, but the wiki page mentions active-active. I'm not sure how
much this matters, but it might.

Also, what kind of consistency do you expect from this? Because none of
these simple conflict resolution methods can give you the regular
consistency models we're used to, AFAICS.

> INSERT
> ================
> To resolve an INSERT conflict on the subscriber, it is important to
> find the conflicting row (if any) before we attempt an insertion. The
> index search preference for this will be:
> First check for a replica identity (RI) index.
>  - if not found, check for the primary key (PK) index.
>  - if not found, then check for unique indexes (individual ones or
>    added by unique constraints)
>  - if a unique index is also not found, skip CDR
>
> Note: if no RI index, PK, or unique index is found but
> REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> The reason is that even though a row can be identified with
> REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> rows. Hence, we should not go for conflict detection in such a case.
>

It's not clear to me why REPLICA_IDENTITY_FULL would mean the table is
allowed to have duplicate values. It just means the upstream is sending
the whole original row; there can still be a PK/UNIQUE index on both
the publisher and subscriber.

> In case of replica identity ‘nothing’ and in the absence of any
> suitable index (as defined above), CDR will be skipped for INSERT.
>
> Conflict Type:
> ----------------
> insert_exists: A conflict is detected when the table has the same
> value for a key column as the new value in the incoming row.
>
> Conflict Resolution
> ----------------
> a) latest_timestamp_wins: The change with the later commit timestamp wins.
> b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
> c) apply: Always apply the remote change.
> d) skip: The remote change is skipped.
> e) error: Error out on conflict. Replication is stopped, manual
> action is needed.
>

Why not have some support for user-defined conflict resolution methods,
allowing more complex stuff (e.g. merging the rows in some way, perhaps
even with datatype-specific behavior)?

> The change will be converted to 'UPDATE' and applied if the decision
> is in favor of applying the remote change.
>
> It is important to have commit timestamp info available on the
> subscriber when latest_timestamp_wins or earliest_timestamp_wins is
> chosen as the resolution method. Thus ‘track_commit_timestamp’ must be
> enabled on the subscriber; in its absence, configuring the said
> timestamp-based resolution methods will result in an error.
>
> Note: If the user has chosen latest_timestamp_wins or
> earliest_timestamp_wins, and the remote and local timestamps are the
> same, then it will go by system identifier. The change with the higher
> system identifier will win. This will ensure that the same change is
> picked on all the nodes.

How is this going to deal with the fact that commit LSN and timestamps
may not correlate perfectly? That is, commits may happen with LSN1 <
LSN2 but with T1 > T2.

>
> UPDATE
> ================
>
> Conflict Detection Method:
> --------------------------------
> Origin conflict detection: The ‘origin’ info is used to detect
> conflict, which can be obtained from the commit timestamp generated
> for the incoming txn at the source node. To compare the remote origin
> with the local one, we must have origin information for local txns as
> well, which can be obtained from the commit timestamp after enabling
> ‘track_commit_timestamp’ locally.
> The one drawback here is that the ‘origin’ information cannot be
> obtained once the row is frozen and the commit timestamp info is
> removed by vacuum. For a frozen row, conflicts cannot be raised, and
> thus the incoming changes will be applied in all cases.
>
> Conflict Types:
> ----------------
> a) update_differ: The origin of an incoming update's key row differs
> from the local row, i.e., the row has already been updated locally or
> by different nodes.
> b) update_missing: The row with the same value as that incoming
> update's key does not exist. Remote is trying to update a row which
> does not exist locally.
> c) update_deleted: The row with the same value as that incoming
> update's key does not exist. The row is already deleted. This conflict
> type is generated only if the deleted row is still detectable, i.e.,
> it is not removed by VACUUM yet. If the row has already been removed
> by VACUUM, this conflict cannot be detected. It will be detected as
> update_missing and will follow the default or configured resolver of
> update_missing itself.
>

I don't understand why update_missing and update_deleted should be
different, especially considering it's not detected reliably. And also,
even if we happen to find the row, the associated TOAST data may have
already been removed. So why would this matter?

> Conflict Resolutions:
> ----------------
> a) latest_timestamp_wins: The change with the later commit timestamp
> wins. Can be used for ‘update_differ’.
> b) earliest_timestamp_wins: The change with the earlier commit
> timestamp wins. Can be used for ‘update_differ’.
> c) apply: The remote change is always applied. Can be used for
> ‘update_differ’.
> d) apply_or_skip: The remote change is converted to INSERT and is
> applied. If the complete row cannot be constructed from the info
> provided by the publisher, then the change is skipped. Can be used for
> ‘update_missing’ or ‘update_deleted’.
> e) apply_or_error: The remote change is converted to INSERT and is
> applied. If the complete row cannot be constructed from the info
> provided by the publisher, then an error is raised. Can be used for
> ‘update_missing’ or ‘update_deleted’.
> f) skip: The remote change is skipped and the local one is retained.
> Can be used for any conflict type.
> g) error: Error out on conflict. Replication is stopped, manual
> action is needed. Can be used for any conflict type.
>
> To support UPDATE CDR, the presence of either a replica identity index
> or a primary key is required on the target node. UPDATE CDR will not
> be supported in the absence of a replica identity index or primary key
> even though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
> the "Noteworthy Scenarios" section in [1] for further details.
>
> DELETE
> ================
> Conflict Type:
> ----------------
> delete_missing: An incoming delete is trying to delete a row on a
> target node which does not exist.
>
> Conflict Resolutions:
> ----------------
> a) error: Error out on conflict. Replication is stopped, manual
> action is needed.
> b) skip: The remote change is skipped.
>
> Configuring Conflict Resolution:
> ------------------------------------------------
> There are two parts when it comes to configuring CDR:
>
> a) Enabling/Disabling conflict detection.
> b) Configuring conflict resolvers for different conflict types.
>
> Users can sometimes create multiple subscriptions on the same node,
> subscribing to different tables to improve replication performance by
> starting multiple apply workers. If the tables in one subscription are
> less likely to cause conflicts, the user may want conflict detection
> disabled for that subscription to avoid detection latency while
> enabling it for other subscriptions. This generates a requirement to
> make ‘conflict detection’ configurable per subscription, while the
> conflict resolver configuration can remain global. All the
> subscriptions which opt for ‘conflict detection’ will follow the
> global conflict resolver configuration.
>
> To implement the above, subscription commands will be changed to have
> one more parameter 'conflict_resolution=on/off'; the default will be
> OFF.
>
> To configure global resolvers, a new DDL command will be introduced:
>
> CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
>

I very much doubt we want a single global conflict resolver, or even
one resolver per subscription. It seems like a very table-specific
thing.

Also, doesn't this whole design ignore the concurrency between
publishers? Isn't this problematic considering the commit timestamps
may go backwards (for a given publisher), which means the conflict
resolution is not deterministic (as it depends on how exactly it
interleaves)?

> -------------------------
>
> Apart from the above three main operations and resolver configuration,
> there are more conflict types like primary-key updates, multiple
> unique constraints etc. and some special scenarios to be considered.
> Complete design details can be found in [1].
>
> [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution
>

Hmm, not sure it's good to have a "complete" design on the wiki, and
only some subset posted to the mailing list. I haven't compared what
the differences are, though.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/23/24 08:36, shveta malik wrote:
> > Hello hackers,
> >
> > Please find the proposal for Conflict Detection and Resolution (CDR)
> > for Logical replication.
> > <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> > below details.>
> >
> > Introduction
> > ================
> > In case the node is subscribed to multiple providers, or when local
> > writes happen on a subscriber, conflicts can arise for the incoming
> > changes. CDR is the mechanism to automatically detect and resolve
> > these conflicts depending on the application and configurations.
> > CDR is not applicable for the initial table sync. If conflicting data
> > already exists locally in the table, the table sync worker will fail.
> > Please find the details on CDR in the apply worker for INSERT, UPDATE
> > and DELETE operations:
> >
>
> Which architecture are you aiming for? Here you talk about multiple
> providers, but the wiki page mentions active-active. I'm not sure how
> much this matters, but it might.

Currently, we are working on the multi-provider case, but ideally it
should work for active-active as well. During the further discussion
and implementation phase, if we find that there are cases which will
not work in a straightforward way for active-active, then our primary
focus will remain to first implement it for the multiple-providers
architecture.

>
> Also, what kind of consistency do you expect from this? Because none of
> these simple conflict resolution methods can give you the regular
> consistency models we're used to, AFAICS.

Can you please explain a little bit more on this?

>
> > INSERT
> > ================
> > To resolve an INSERT conflict on the subscriber, it is important to
> > find the conflicting row (if any) before we attempt an insertion. The
> > index search preference for this will be:
> > First check for a replica identity (RI) index.
> >  - if not found, check for the primary key (PK) index.
> >  - if not found, then check for unique indexes (individual ones or
> >    added by unique constraints)
> >  - if a unique index is also not found, skip CDR
> >
> > Note: if no RI index, PK, or unique index is found but
> > REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> > The reason is that even though a row can be identified with
> > REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> > rows. Hence, we should not go for conflict detection in such a case.
> >
>
> It's not clear to me why REPLICA_IDENTITY_FULL would mean the table is
> allowed to have duplicate values. It just means the upstream is sending
> the whole original row; there can still be a PK/UNIQUE index on both
> the publisher and subscriber.

Yes, right. Sorry for the confusion. I meant the same, i.e., in the
absence of an RI index, PK, or unique index, tables can have
duplicates. So even in the presence of a replica identity (FULL in
this case) but in the absence of a unique/primary index, CDR will be
skipped for INSERT.

>
> > In case of replica identity ‘nothing’ and in the absence of any
> > suitable index (as defined above), CDR will be skipped for INSERT.
> >
> > Conflict Type:
> > ----------------
> > insert_exists: A conflict is detected when the table has the same
> > value for a key column as the new value in the incoming row.
> >
> > Conflict Resolution
> > ----------------
> > a) latest_timestamp_wins: The change with the later commit timestamp wins.
> > b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
> > c) apply: Always apply the remote change.
> > d) skip: The remote change is skipped.
> > e) error: Error out on conflict. Replication is stopped, manual
> > action is needed.
> >
>
> Why not have some support for user-defined conflict resolution methods,
> allowing more complex stuff (e.g. merging the rows in some way, perhaps
> even with datatype-specific behavior)?

Initially, for the sake of simplicity, we are targeting to support
built-in resolvers. But we have a plan to work on user-defined
resolvers as well. We shall propose that separately.

>
> > The change will be converted to 'UPDATE' and applied if the decision
> > is in favor of applying the remote change.
> >
> > It is important to have commit timestamp info available on the
> > subscriber when latest_timestamp_wins or earliest_timestamp_wins is
> > chosen as the resolution method. Thus ‘track_commit_timestamp’ must be
> > enabled on the subscriber; in its absence, configuring the said
> > timestamp-based resolution methods will result in an error.
> >
> > Note: If the user has chosen latest_timestamp_wins or
> > earliest_timestamp_wins, and the remote and local timestamps are the
> > same, then it will go by system identifier. The change with the higher
> > system identifier will win. This will ensure that the same change is
> > picked on all the nodes.
>
> How is this going to deal with the fact that commit LSN and timestamps
> may not correlate perfectly? That is, commits may happen with LSN1 <
> LSN2 but with T1 > T2.

Are you pointing to the issue where a session/txn has taken its
'xactStopTimestamp' timestamp earlier but is delayed in inserting its
record into the XLOG, while another session/txn which has taken its
timestamp slightly later succeeded in inserting its record into the
XLOG sooner than the first session, making LSNs and timestamps out of
sync? Going by this scenario, the commit timestamp may not be
reflective of actual commits, and thus timestamp-based resolvers may
take wrong decisions. Or do you mean something else?

If this is the problem you are referring to, then I think this needs a
fix on the publisher side. Let me think more about it. Kindly let me
know if you have ideas on how to tackle it.

>
> >
> > UPDATE
> > ================
> >
> > Conflict Detection Method:
> > --------------------------------
> > Origin conflict detection: The ‘origin’ info is used to detect
> > conflict, which can be obtained from the commit timestamp generated
> > for the incoming txn at the source node. To compare the remote origin
> > with the local one, we must have origin information for local txns as
> > well, which can be obtained from the commit timestamp after enabling
> > ‘track_commit_timestamp’ locally.
> > The one drawback here is that the ‘origin’ information cannot be
> > obtained once the row is frozen and the commit timestamp info is
> > removed by vacuum. For a frozen row, conflicts cannot be raised, and
> > thus the incoming changes will be applied in all cases.
> >
> > Conflict Types:
> > ----------------
> > a) update_differ: The origin of an incoming update's key row differs
> > from the local row, i.e., the row has already been updated locally or
> > by different nodes.
> > b) update_missing: The row with the same value as that incoming
> > update's key does not exist. Remote is trying to update a row which
> > does not exist locally.
> > c) update_deleted: The row with the same value as that incoming
> > update's key does not exist. The row is already deleted. This conflict
> > type is generated only if the deleted row is still detectable, i.e.,
> > it is not removed by VACUUM yet. If the row has already been removed
> > by VACUUM, this conflict cannot be detected. It will be detected as
> > update_missing and will follow the default or configured resolver of
> > update_missing itself.
> >
>
> I don't understand why update_missing and update_deleted should be
> different, especially considering it's not detected reliably. And also,
> even if we happen to find the row, the associated TOAST data may have
> already been removed. So why would this matter?

Here, we are trying to tackle the case where the row is 'recently'
deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
may want to opt for a different resolution in such a case as against
the one where the corresponding row was not even present in the first
place. The case where the row was deleted long back may not fall into
this category, as there are higher chances that it has been removed by
vacuum, and it can be considered equivalent to the update_missing
case.

Regarding the "TOAST column" for deleted row cases, we may need to dig
more. Thanks for bringing up this case. Let me analyze more here.

>
> > Conflict Resolutions:
> > ----------------
> > a) latest_timestamp_wins: The change with the later commit timestamp
> > wins. Can be used for ‘update_differ’.
> > b) earliest_timestamp_wins: The change with the earlier commit
> > timestamp wins. Can be used for ‘update_differ’.
> > c) apply: The remote change is always applied. Can be used for
> > ‘update_differ’.
> > d) apply_or_skip: The remote change is converted to INSERT and is
> > applied. If the complete row cannot be constructed from the info
> > provided by the publisher, then the change is skipped. Can be used for
> > ‘update_missing’ or ‘update_deleted’.
> > e) apply_or_error: The remote change is converted to INSERT and is
> > applied. If the complete row cannot be constructed from the info
> > provided by the publisher, then an error is raised. Can be used for
> > ‘update_missing’ or ‘update_deleted’.
> > f) skip: The remote change is skipped and the local one is retained.
> > Can be used for any conflict type.
> > g) error: Error out on conflict. Replication is stopped, manual
> > action is needed. Can be used for any conflict type.
> >
> > To support UPDATE CDR, the presence of either a replica identity index
> > or a primary key is required on the target node. UPDATE CDR will not
> > be supported in the absence of a replica identity index or primary key
> > even though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in
> > the "Noteworthy Scenarios" section in [1] for further details.
> >
> > DELETE
> > ================
> > Conflict Type:
> > ----------------
> > delete_missing: An incoming delete is trying to delete a row on a
> > target node which does not exist.
> >
> > Conflict Resolutions:
> > ----------------
> > a) error: Error out on conflict. Replication is stopped, manual
> > action is needed.
> > b) skip: The remote change is skipped.
> >
> > Configuring Conflict Resolution:
> > ------------------------------------------------
> > There are two parts when it comes to configuring CDR:
> >
> > a) Enabling/Disabling conflict detection.
> > b) Configuring conflict resolvers for different conflict types.
> >
> > Users can sometimes create multiple subscriptions on the same node,
> > subscribing to different tables to improve replication performance by
> > starting multiple apply workers. If the tables in one subscription are
> > less likely to cause conflicts, the user may want conflict detection
> > disabled for that subscription to avoid detection latency while
> > enabling it for other subscriptions. This generates a requirement to
> > make ‘conflict detection’ configurable per subscription, while the
> > conflict resolver configuration can remain global. All the
> > subscriptions which opt for ‘conflict detection’ will follow the
> > global conflict resolver configuration.
> >
> > To implement the above, subscription commands will be changed to have
> > one more parameter 'conflict_resolution=on/off'; the default will be
> > OFF.
> >
> > To configure global resolvers, a new DDL command will be introduced:
> >
> > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> >
>
> I very much doubt we want a single global conflict resolver, or even
> one resolver per subscription. It seems like a very table-specific
> thing.

We thought about this as well. We feel that even if we go for a
table-based or subscription-based resolver configuration, there may be
use-case scenarios where the user is not interested in configuring
resolvers for each table and thus may want to give global ones. Thus,
we should provide a way for users to do global configuration, and that
is why we started with the global one. I have noted your point here
and would also like to know the opinion of others. We are open to
discussion. We can either opt for any of these two options (global or
table) or we can opt for both the global and the table/sub-based one.

>
> Also, doesn't this whole design ignore the concurrency between
> publishers? Isn't this problematic considering the commit timestamps
> may go backwards (for a given publisher), which means the conflict
> resolution is not deterministic (as it depends on how exactly it
> interleaves)?
>
> > -------------------------
> >
> > Apart from the above three main operations and resolver configuration,
> > there are more conflict types like primary-key updates, multiple
> > unique constraints etc. and some special scenarios to be considered.
> > Complete design details can be found in [1].
> >
> > [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution
> >
>
> Hmm, not sure it's good to have a "complete" design on the wiki, and
> only some subset posted to the mailing list. I haven't compared what
> the differences are, though.

It would have been difficult to mention all the details in the email
(including examples and corner scenarios), and thus we thought that it
would be better to document everything on the wiki page for the time
being. We can keep on discussing the design and all the scenarios on a
need basis (before the implementation phase of that part), and thus
eventually everything will come to hackers by email. With our first
patch, we plan to provide everything in a README as well.

thanks
Shveta
On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > On 5/23/24 08:36, shveta malik wrote:
> > > Hello hackers,
> > >
> > > Please find the proposal for Conflict Detection and Resolution (CDR)
> > > for Logical replication.
> > > <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the
> > > below details.>
> > >
> > > Introduction
> > > ================
> > > In case the node is subscribed to multiple providers, or when local
> > > writes happen on a subscriber, conflicts can arise for the incoming
> > > changes. CDR is the mechanism to automatically detect and resolve
> > > these conflicts depending on the application and configurations.
> > > CDR is not applicable for the initial table sync. If conflicting data
> > > already exists locally in the table, the table sync worker will fail.
> > > Please find the details on CDR in the apply worker for INSERT, UPDATE
> > > and DELETE operations:
> > >
> >
> > Which architecture are you aiming for? Here you talk about multiple
> > providers, but the wiki page mentions active-active. I'm not sure how
> > much this matters, but it might.
>
> Currently, we are working on the multi-provider case, but ideally it
> should work for active-active as well. During the further discussion
> and implementation phase, if we find that there are cases which will
> not work in a straightforward way for active-active, then our primary
> focus will remain to first implement it for the multiple-providers
> architecture.
>
> >
> > Also, what kind of consistency do you expect from this? Because none of
> > these simple conflict resolution methods can give you the regular
> > consistency models we're used to, AFAICS.
>
> Can you please explain a little bit more on this?
>
> >
> > > INSERT
> > > ================
> > > To resolve an INSERT conflict on the subscriber, it is important to
> > > find the conflicting row (if any) before we attempt an insertion. The
> > > index search preference for this will be:
> > > First check for a replica identity (RI) index.
> > >  - if not found, check for the primary key (PK) index.
> > >  - if not found, then check for unique indexes (individual ones or
> > >    added by unique constraints)
> > >  - if a unique index is also not found, skip CDR
> > >
> > > Note: if no RI index, PK, or unique index is found but
> > > REPLICA_IDENTITY_FULL is defined, CDR will still be skipped.
> > > The reason is that even though a row can be identified with
> > > REPLICA_IDENTITY_FULL, such tables are allowed to have duplicate
> > > rows. Hence, we should not go for conflict detection in such a case.
> > >
> >
> > It's not clear to me why REPLICA_IDENTITY_FULL would mean the table is
> > allowed to have duplicate values. It just means the upstream is sending
> > the whole original row; there can still be a PK/UNIQUE index on both
> > the publisher and subscriber.
>
> Yes, right. Sorry for the confusion. I meant the same, i.e., in the
> absence of an RI index, PK, or unique index, tables can have
> duplicates. So even in the presence of a replica identity (FULL in
> this case) but in the absence of a unique/primary index, CDR will be
> skipped for INSERT.
>
> >
> > > In case of replica identity ‘nothing’ and in the absence of any
> > > suitable index (as defined above), CDR will be skipped for INSERT.
> > >
> > > Conflict Type:
> > > ----------------
> > > insert_exists: A conflict is detected when the table has the same
> > > value for a key column as the new value in the incoming row.
> > >
> > > Conflict Resolution
> > > ----------------
> > > a) latest_timestamp_wins: The change with the later commit timestamp wins.
> > > b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
> > > c) apply: Always apply the remote change.
> > > d) skip: The remote change is skipped.
> > > e) error: Error out on conflict. Replication is stopped, manual
> > > action is needed.
> > >
> >
> > Why not have some support for user-defined conflict resolution methods,
> > allowing more complex stuff (e.g. merging the rows in some way, perhaps
> > even with datatype-specific behavior)?
>
> Initially, for the sake of simplicity, we are targeting to support
> built-in resolvers. But we have a plan to work on user-defined
> resolvers as well. We shall propose that separately.
>
> >
> > > The change will be converted to 'UPDATE' and applied if the decision
> > > is in favor of applying the remote change.
> > >
> > > It is important to have commit timestamp info available on the
> > > subscriber when latest_timestamp_wins or earliest_timestamp_wins is
> > > chosen as the resolution method. Thus ‘track_commit_timestamp’ must be
> > > enabled on the subscriber; in its absence, configuring the said
> > > timestamp-based resolution methods will result in an error.
> > >
> > > Note: If the user has chosen latest_timestamp_wins or
> > > earliest_timestamp_wins, and the remote and local timestamps are the
> > > same, then it will go by system identifier. The change with the higher
> > > system identifier will win. This will ensure that the same change is
> > > picked on all the nodes.
> >
> > How is this going to deal with the fact that commit LSN and timestamps
> > may not correlate perfectly? That is, commits may happen with LSN1 <
> > LSN2 but with T1 > T2.
>
> Are you pointing to the issue where a session/txn has taken its
> 'xactStopTimestamp' timestamp earlier but is delayed in inserting its
> record into the XLOG, while another session/txn which has taken its
> timestamp slightly later succeeded in inserting its record into the
> XLOG sooner than the first session, making LSNs and timestamps out of
> sync? Going by this scenario, the commit timestamp may not be
> reflective of actual commits, and thus timestamp-based resolvers may
> take wrong decisions. Or do you mean something else?
>
> If this is the problem you are referring to, then I think this needs a
> fix on the publisher side. Let me think more about it. Kindly let me
> know if you have ideas on how to tackle it.
>
> >
> > >
> > > UPDATE
> > > ================
> > >
> > > Conflict Detection Method:
> > > --------------------------------
> > > Origin conflict detection: The ‘origin’ info is used to detect
> > > conflict, which can be obtained from the commit timestamp generated
> > > for the incoming txn at the source node. To compare the remote origin
> > > with the local one, we must have origin information for local txns as
> > > well, which can be obtained from the commit timestamp after enabling
> > > ‘track_commit_timestamp’ locally.
> > > The one drawback here is that the ‘origin’ information cannot be
> > > obtained once the row is frozen and the commit timestamp info is
> > > removed by vacuum. For a frozen row, conflicts cannot be raised, and
> > > thus the incoming changes will be applied in all cases.
> > >
> > > Conflict Types:
> > > ----------------
> > > a) update_differ: The origin of an incoming update's key row differs
> > > from the local row, i.e., the row has already been updated locally or
> > > by different nodes.
> > > b) update_missing: The row with the same value as that incoming
> > > update's key does not exist. Remote is trying to update a row which
> > > does not exist locally.
> > > c) update_deleted: The row with the same value as that incoming
> > > update's key does not exist. The row is already deleted. This conflict
> > > type is generated only if the deleted row is still detectable, i.e.,
> > > it is not removed by VACUUM yet. If the row has already been removed
> > > by VACUUM, this conflict cannot be detected. It will be detected as
> > > update_missing and will follow the default or configured resolver of
> > > update_missing itself.
> > >
> >
> > I don't understand why update_missing and update_deleted should be
> > different, especially considering it's not detected reliably. And also,
> > even if we happen to find the row, the associated TOAST data may have
> > already been removed. So why would this matter?
>
> Here, we are trying to tackle the case where the row is 'recently'
> deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
> may want to opt for a different resolution in such a case as against
> the one where the corresponding row was not even present in the first
> place. The case where the row was deleted long back may not fall into
> this category, as there are higher chances that it has been removed by
> vacuum, and it can be considered equivalent to the update_missing
> case.
>
> Regarding the "TOAST column" for deleted row cases, we may need to dig
> more. Thanks for bringing up this case. Let me analyze more here.
>

I tested a simple case with a table with one TOAST column and found
that when a tuple with a TOAST column is deleted, both the tuple and
the corresponding pg_toast entries are marked as 'deleted' (dead) but
not removed immediately. The main tuple and the respective pg_toast
entries are permanently deleted only during vacuum. First, the main
table's dead tuples are vacuumed, followed by the secondary TOAST
relation ones (if available).

Please let us know if you have a specific scenario in mind where the
TOAST column data is deleted immediately upon the 'delete' operation,
rather than during vacuum, which we are missing.

Thanks,
Nisha
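For illustration, a rough way to observe the behavior Nisha describes
(table and column names are hypothetical; n_dead_tup comes from the
statistics system, so counts are approximate and may lag slightly):

    CREATE TABLE t (id int PRIMARY KEY, big text);
    -- force out-of-line, uncompressed TOAST storage for the column:
    ALTER TABLE t ALTER COLUMN big SET STORAGE EXTERNAL;
    INSERT INTO t VALUES (1, repeat('x', 100000));
    DELETE FROM t WHERE id = 1;

    -- before VACUUM, both the heap tuple and its TOAST chunks remain
    -- as dead rows:
    SELECT relname, n_dead_tup
    FROM pg_stat_all_tables
    WHERE relid IN ('t'::regclass,
                    (SELECT reltoastrelid FROM pg_class
                      WHERE oid = 't'::regclass));

    VACUUM t;  -- removes the dead heap tuples, then the TOAST entries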
On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 5/23/24 08:36, shveta malik wrote:
> >
> > Conflict Resolution
> > ----------------
> > a) latest_timestamp_wins: The change with the later commit timestamp wins.
> > b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
> > c) apply: Always apply the remote change.
> > d) skip: The remote change is skipped.
> > e) error: Error out on conflict. Replication is stopped, manual
> > action is needed.
>
> Why not have some support for user-defined conflict resolution methods,
> allowing more complex stuff (e.g. merging the rows in some way, perhaps
> even with datatype-specific behavior)?
>
> > The change will be converted to 'UPDATE' and applied if the decision
> > is in favor of applying the remote change.
> >
> > It is important to have commit timestamp info available on the
> > subscriber when latest_timestamp_wins or earliest_timestamp_wins is
> > chosen as the resolution method. Thus ‘track_commit_timestamp’ must be
> > enabled on the subscriber; in its absence, configuring the said
> > timestamp-based resolution methods will result in an error.
> >
> > Note: If the user has chosen latest_timestamp_wins or
> > earliest_timestamp_wins, and the remote and local timestamps are the
> > same, then it will go by system identifier. The change with the higher
> > system identifier will win. This will ensure that the same change is
> > picked on all the nodes.
>
> How is this going to deal with the fact that commit LSN and timestamps
> may not correlate perfectly? That is, commits may happen with LSN1 <
> LSN2 but with T1 > T2.
>

One of the possible scenarios discussed at pgconf.dev with Tomas for
this was as follows:

Say there are two publisher nodes PN1, PN2, and subscriber node SN3.
The logical replication is configured such that a subscription on SN3
has publications from both PN1 and PN2. For example,
SN3 (sub) -> PN1, PN2 (p1, p2)

Now, on PN1, we have the following operations that update the same row:

T1 Update-1 on table t1 at LSN1 (1000) on time (200)
T2 Update-2 on table t1 at LSN2 (2000) on time (100)

Then in parallel, we have the following operation on node PN2 that
updates the same row as Update-1 and Update-2 on node PN1:

T3 Update-3 on table t1 at LSN (1500) on time (150)

In theory, we can have a different state on subscribers depending on
the order of updates arriving at SN3, which shouldn't happen. Say the
order in which they reach SN3 is: Update-1, Update-2, Update-3; then
the final row we have is by Update-3, considering we have configured
last_update_wins as the conflict resolution method. Now, consider the
other order: Update-1, Update-3, Update-2. In this case, the final row
will be by Update-2, because when we try to apply Update-3, it will
generate a conflict, and as per the resolution method
(last_update_wins) we need to retain Update-1.

On further thinking, the operations on node PN1 as defined above seem
impossible, because one of the updates needs to wait for the other to
write a commit record. So commits may happen with LSN1 < LSN2 but with
T1 > T2, yet they can't be on the same row due to locks. So, the order
of apply should still be consistent. Am I missing something?

--
With Regards,
Amit Kapila.
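For illustration, a compact recap of the two interleavings described
above (LSNs and timestamps as given in the scenario):

    PN1: Update-1 on t1 at LSN 1000, time 200
    PN1: Update-2 on t1 at LSN 2000, time 100
    PN2: Update-3 on t1 at LSN 1500, time 150

    Arrival order at SN3: Update-1, Update-2, Update-3
      -> final row is Update-3's (it conflicts with Update-2 and wins,
         time 150 > 100)
    Arrival order at SN3: Update-1, Update-3, Update-2
      -> final row is Update-2's (Update-3 loses to Update-1, time
         150 < 200; Update-2 then applies without conflict, as it comes
         from the same origin as the local row)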
On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > > Conflict Resolution
> > > ----------------
> > > a) latest_timestamp_wins: The change with the later commit timestamp wins.
> > > b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.

Can you share the use case of the "earliest_timestamp_wins" resolution
method? It seems that after the initial update on the local node, it
will never allow a remote update to succeed, which sounds a bit odd.
Jan has shared this and similar concerns about this resolution method,
so I have added him to the email as well.

> > >
> > > Conflict Types:
> > > ----------------
> > > a) update_differ: The origin of an incoming update's key row differs
> > > from the local row, i.e., the row has already been updated locally or
> > > by different nodes.
> > > b) update_missing: The row with the same value as that incoming
> > > update's key does not exist. Remote is trying to update a row which
> > > does not exist locally.
> > > c) update_deleted: The row with the same value as that incoming
> > > update's key does not exist. The row is already deleted. This conflict
> > > type is generated only if the deleted row is still detectable, i.e.,
> > > it is not removed by VACUUM yet. If the row has already been removed
> > > by VACUUM, this conflict cannot be detected. It will be detected as
> > > update_missing and will follow the default or configured resolver of
> > > update_missing itself.
> > >
> >
> > I don't understand why update_missing and update_deleted should be
> > different, especially considering it's not detected reliably. And also,
> > even if we happen to find the row, the associated TOAST data may have
> > already been removed. So why would this matter?
>
> Here, we are trying to tackle the case where the row is 'recently'
> deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
> may want to opt for a different resolution in such a case as against
> the one where the corresponding row was not even present in the first
> place. The case where the row was deleted long back may not fall into
> this category, as there are higher chances that it has been removed by
> vacuum, and it can be considered equivalent to the update_missing
> case.
>

I think to make 'update_deleted' work, we need another scan with a
different snapshot type to find the recently deleted row. I don't know
if it is a good idea to scan the index twice with different snapshots,
so for the sake of simplicity, can we consider 'update_deleted' the
same as 'update_missing'? If we think it is an important case to
consider, then we can try to accomplish this once we finalize the
design/implementation of the other resolution methods.

> > >
> > > To implement the above, subscription commands will be changed to have
> > > one more parameter 'conflict_resolution=on/off'; the default will be
> > > OFF.
> > >
> > > To configure global resolvers, a new DDL command will be introduced:
> > >
> > > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> > >
> >
> > I very much doubt we want a single global conflict resolver, or even
> > one resolver per subscription. It seems like a very table-specific
> > thing.
>

+1 to making it a table-level configuration, but we probably need
something at the global level as well, such that, by default, if users
don't define anything at the table level, the global-level
configuration will be used.

>
> >
> > Also, doesn't this whole design ignore the concurrency between
> > publishers? Isn't this problematic considering the commit timestamps
> > may go backwards (for a given publisher), which means the conflict
> > resolution is not deterministic (as it depends on how exactly it
> > interleaves)?
>

I am not able to imagine the cases you are worried about. Can you
please be specific? Is it similar to the case I described in
yesterday's email [1]?

[1] - https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > > Conflict Resolution
> > > > ----------------
> > > > a) latest_timestamp_wins: The change with the later commit timestamp wins.
> > > > b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
>
> Can you share the use case of the "earliest_timestamp_wins" resolution
> method? It seems that after the initial update on the local node, it
> will never allow a remote update to succeed, which sounds a bit odd.
> Jan has shared this and similar concerns about this resolution method,
> so I have added him to the email as well.

I do not have the exact scenario for this. But I feel that if two
nodes are concurrently inserting different data against a primary key,
some users may prefer to retain the row which was inserted earlier. It
is no different from latest_timestamp_wins. It totally depends upon
what kind of application and requirements the user may have, based on
which he may discard the later-coming rows (especially for the INSERT
case).

> > > > Conflict Types:
> > > > ----------------
> > > > a) update_differ: The origin of an incoming update's key row differs
> > > > from the local row, i.e., the row has already been updated locally or
> > > > by different nodes.
> > > > b) update_missing: The row with the same value as that incoming
> > > > update's key does not exist. Remote is trying to update a row which
> > > > does not exist locally.
> > > > c) update_deleted: The row with the same value as that incoming
> > > > update's key does not exist. The row is already deleted. This conflict
> > > > type is generated only if the deleted row is still detectable, i.e.,
> > > > it is not removed by VACUUM yet. If the row has already been removed
> > > > by VACUUM, this conflict cannot be detected. It will be detected as
> > > > update_missing and will follow the default or configured resolver of
> > > > update_missing itself.
> > > >
> > >
> > > I don't understand why update_missing and update_deleted should be
> > > different, especially considering it's not detected reliably. And also,
> > > even if we happen to find the row, the associated TOAST data may have
> > > already been removed. So why would this matter?
> >
> > Here, we are trying to tackle the case where the row is 'recently'
> > deleted, i.e., a concurrent UPDATE and DELETE on pub and sub. The user
> > may want to opt for a different resolution in such a case as against
> > the one where the corresponding row was not even present in the first
> > place. The case where the row was deleted long back may not fall into
> > this category, as there are higher chances that it has been removed by
> > vacuum, and it can be considered equivalent to the update_missing
> > case.
> >
>
> I think to make 'update_deleted' work, we need another scan with a
> different snapshot type to find the recently deleted row. I don't know
> if it is a good idea to scan the index twice with different snapshots,
> so for the sake of simplicity, can we consider 'update_deleted' the
> same as 'update_missing'? If we think it is an important case to
> consider, then we can try to accomplish this once we finalize the
> design/implementation of the other resolution methods.

I think it is important for scenarios when data is being updated and
deleted concurrently. But yes, I agree that the implementation may
have some performance hit for this case. We can tackle this scenario
at a later stage.

> > > >
> > > > To implement the above, subscription commands will be changed to have
> > > > one more parameter 'conflict_resolution=on/off'; the default will be
> > > > OFF.
> > > >
> > > > To configure global resolvers, a new DDL command will be introduced:
> > > >
> > > > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver>
> > > >
> > >
> > > I very much doubt we want a single global conflict resolver, or even
> > > one resolver per subscription. It seems like a very table-specific
> > > thing.
> >
>
> +1 to making it a table-level configuration, but we probably need
> something at the global level as well, such that, by default, if users
> don't define anything at the table level, the global-level
> configuration will be used.
>
> >
> > > Also, doesn't this whole design ignore the concurrency between
> > > publishers? Isn't this problematic considering the commit timestamps
> > > may go backwards (for a given publisher), which means the conflict
> > > resolution is not deterministic (as it depends on how exactly it
> > > interleaves)?
> >
>
> I am not able to imagine the cases you are worried about. Can you
> please be specific? Is it similar to the case I described in
> yesterday's email [1]?
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com
>

thanks
Shveta
Hi,

This time at PGConf.dev [1], we had some discussions regarding this
project. The proposed approach is to split the work into two main
components.

The first part focuses on conflict detection, which aims to identify
and report conflicts in logical replication. This feature will enable
users to monitor the unexpected conflicts that may occur.

The second part involves the actual conflict resolution. Here, we will
provide built-in resolutions for each conflict and allow the user to
choose which resolution will be used for which conflict (as described
in the initial email of this thread).

Of course, we are open to alternative ideas and suggestions, and the
strategy above can be changed based on ongoing discussions and
feedback received.

Here is the patch for the first part of the work, which adds a new
parameter detect_conflict for the CREATE and ALTER SUBSCRIPTION
commands. This new parameter will decide whether the subscription will
perform conflict detection. By default, conflict detection will be off
for a subscription.

When conflict detection is enabled, additional logging is triggered in
the following conflict scenarios:

* Updating a row that was previously modified by another origin.
* The tuple to be updated is not found.
* The tuple to be deleted is not found.

While there exist other conflict types in logical replication, such as
an incoming insert conflicting with an existing row due to a primary
key or unique index, these cases already result in constraint
violation errors. Therefore, additional conflict detection for these
cases is currently omitted to minimize potential overhead. However,
the pre-detection of conflicts in these error cases is still essential
to support automatic conflict resolution in the future.

[1] https://2024.pgconf.dev/

Best Regards,
Hou zj
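For illustration, a minimal sketch of the parameter added by the patch
(the parameter name is from the description above; the exact syntax
and behavior are those of the posted patch and may change):

    CREATE SUBSCRIPTION sub_conf
        CONNECTION 'host=pub_host dbname=postgres'
        PUBLICATION pub_all
        WITH (detect_conflict = true);

    ALTER SUBSCRIPTION sub_conf SET (detect_conflict = false);

With detect_conflict enabled, the apply worker additionally logs the
three conflict cases listed above; with it disabled (the default), the
current apply behavior is unchanged.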
On Wed, Jun 5, 2024 at 9:12 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > > Conflict Resolution
> > > > > ----------------
> > > > > a) latest_timestamp_wins: The change with the later commit timestamp wins.
> > > > > b) earliest_timestamp_wins: The change with the earlier commit timestamp wins.
> >
> > Can you share the use case of the "earliest_timestamp_wins" resolution
> > method? It seems that after the initial update on the local node, it
> > will never allow a remote update to succeed, which sounds a bit odd.
> > Jan has shared this and similar concerns about this resolution method,
> > so I have added him to the email as well.
>
> I do not have the exact scenario for this. But I feel that if two
> nodes are concurrently inserting different data against a primary key,
> some users may prefer to retain the row which was inserted earlier. It
> is no different from latest_timestamp_wins. It totally depends upon
> what kind of application and requirements the user may have, based on
> which he may discard the later-coming rows (especially for the INSERT
> case).

I haven't read the complete design yet, but have we discussed how we
plan to deal with clock drift if we use timestamp-based conflict
resolution? For example, a user might insert something conflicting on
node1 first and then on node2. However, due to clock drift, the
timestamp from node2 might appear earlier. In this case, if we choose
"earliest timestamp wins", we would keep the changes from node2.

I haven't fully considered if this would cause any problems, but users
might notice this issue. For instance, a client machine might send a
change to node1 first and then, upon confirmation, send it to node2.
If the clocks on node1 and node2 are not synchronized, the changes
might appear in a different order. Does this seem like a potential
problem?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
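For illustration, a concrete instance of the skew described above (all
timestamps are hypothetical):

    Real order of events: the client writes to node1 first, then node2.

      node1 clock is 5s fast:  its insert gets commit timestamp 10:00:05
      node2 clock is correct:  its (later) insert gets timestamp 10:00:02

    With earliest_timestamp_wins, node2's row is kept on conflict, even
    though it was written last in real time.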
On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Can you share the use case of the "earliest_timestamp_wins" resolution
> method? It seems that after the initial update on the local node, it
> will never allow a remote update to succeed, which sounds a bit odd.
> Jan has shared this and similar concerns about this resolution method,
> so I have added him to the email as well.
>

I cannot think of a use case exactly in this context, but it's very
common to have such a use case while designing a distributed
application with multiple clients. For example, when we do git push
concurrently from multiple clients, it is expected that the earliest
commit wins.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Can you share the use case of the "earliest_timestamp_wins" resolution
> > method? It seems that after the initial update on the local node, it
> > will never allow a remote update to succeed, which sounds a bit odd.
> > Jan has shared this and similar concerns about this resolution method,
> > so I have added him to the email as well.
> >
> I cannot think of a use case exactly in this context, but it's very
> common to have such a use case while designing a distributed
> application with multiple clients. For example, when we do git push
> concurrently from multiple clients, it is expected that the earliest
> commit wins.
>

Okay, I think it mostly boils down to something like what Shveta
mentioned, where inserts for a primary key can use the
"earliest_timestamp_wins" resolution method [1]. So, it seems useful
to support this method as well.

[1] - https://www.postgresql.org/message-id/CAJpy0uC4riK8e6hQt8jcU%2BnXYmRRjnbFEapYNbmxVYjENxTw2g%40mail.gmail.com

--
With Regards,
Amit Kapila.
On Thu, Jun 6, 2024 at 3:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Can you share the use case of "earliest_timestamp_wins" resolution > > > method? It seems after the initial update on the local node, it will > > > never allow remote update to succeed which sounds a bit odd. Jan has > > > shared this and similar concerns about this resolution method, so I > > > have added him to the email as well. > > > > > I can not think of a use case exactly in this context but it's very > > common to have such a use case while designing a distributed > > application with multiple clients. For example, when we are doing git > > push concurrently from multiple clients it is expected that the > > earliest commit wins. > > > > Okay, I think it mostly boils down to something like what Shveta > mentioned where Inserts for a primary key can use > "earliest_timestamp_wins" resolution method [1]. So, it seems useful > to support this method as well. Correct, but we still need to think about how to make it work correctly in the presence of a clock skew as I mentioned in one of my previous emails. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 5, 2024 at 7:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Can you share the use case of "earliest_timestamp_wins" resolution > > method? It seems after the initial update on the local node, it will > > never allow remote update to succeed which sounds a bit odd. Jan has > > shared this and similar concerns about this resolution method, so I > > have added him to the email as well. > > > I can not think of a use case exactly in this context but it's very > common to have such a use case while designing a distributed > application with multiple clients. For example, when we are doing git > push concurrently from multiple clients it is expected that the > earliest commit wins. > Here are more use cases of the "earliest_timestamp_wins" resolution method: 1) Applications where the record of first occurrence of an event is important. For example, sensor based applications like earthquake detection systems, capturing the first seismic wave's time is crucial. 2) Scheduling systems, like appointment booking, prioritize the earliest request when handling concurrent ones. 3) In contexts where maintaining chronological order is important - a) Social media platforms display comments ensuring that the earliest ones are visible first. b) Finance transaction processing systems rely on timestamps to prioritize the processing of transactions, ensuring that the earliest transaction is handled first -- Thanks, Nisha
On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Here are more use cases of the "earliest_timestamp_wins" resolution method:
> 1) Applications where the record of first occurrence of an event is
> important. For example, sensor based applications like earthquake
> detection systems, capturing the first seismic wave's time is crucial.
> 2) Scheduling systems, like appointment booking, prioritize the
> earliest request when handling concurrent ones.
> 3) In contexts where maintaining chronological order is important -
> a) Social media platforms display comments ensuring that the
> earliest ones are visible first.
> b) Finance transaction processing systems rely on timestamps to
> prioritize the processing of transactions, ensuring that the earliest
> transaction is handled first
Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated the row committed/became visible), and not a DML on a column of type timestamp. Some implementations use a hidden timestamp column, but that's different from a user column which captures the timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database, which may be different from the value of the column itself.
If we use the transaction commit timestamp as the basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3 on separate origins, with timestamps t1, t2, and t3 respectively, changed rows (r1, r2), (r2, r3), and (r1, r4) respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with T3's version and r2 will end up with T1's version after applying all three transactions. Would that introduce an inconsistency between r1 and r2?
Best Wishes,
Ashutosh Bapat
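Spelling the scenario out under latest_timestamp_wins, with apply order T2, T3, T1 and t2 < t1 < t3 (a sketch of the per-row outcomes only):

  r1: changed by T1 and T3 -> T3's version retained (t3 is latest)
  r2: changed by T1 and T2 -> T1's version retained (t1 is latest)
  r3: changed only by T2   -> T2's version
  r4: changed only by T3   -> T3's version

Under earliest_timestamp_wins the conflicting rows flip: r1 keeps T1's version (t1 < t3) and r2 keeps T2's version (t2 < t1), which matches the analysis in the reply below.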
On 5/27/24 07:48, shveta malik wrote: > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 5/23/24 08:36, shveta malik wrote: >>> Hello hackers, >>> >>> Please find the proposal for Conflict Detection and Resolution (CDR) >>> for Logical replication. >>> <Thanks to Nisha, Hou-San, and Amit who helped in figuring out the >>> below details.> >>> >>> Introduction >>> ================ >>> In case the node is subscribed to multiple providers, or when local >>> writes happen on a subscriber, conflicts can arise for the incoming >>> changes. CDR is the mechanism to automatically detect and resolve >>> these conflicts depending on the application and configurations. >>> CDR is not applicable for the initial table sync. If locally, there >>> exists conflicting data on the table, the table sync worker will fail. >>> Please find the details on CDR in apply worker for INSERT, UPDATE and >>> DELETE operations: >>> >> >> Which architecture are you aiming for? Here you talk about multiple >> providers, but the wiki page mentions active-active. I'm not sure how >> much this matters, but it might. > > Currently, we are working for multi providers case but ideally it > should work for active-active also. During further discussion and > implementation phase, if we find that, there are cases which will not > work in straight-forward way for active-active, then our primary focus > will remain to first implement it for multiple providers architecture. > >> >> Also, what kind of consistency you expect from this? Because none of >> these simple conflict resolution methods can give you the regular >> consistency models we're used to, AFAICS. > > Can you please explain a little bit more on this. > I was referring to the well established consistency models / isolation levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what guarantees the application developer can expect, what anomalies can happen, etc. I don't think any such isolation level can be implemented with simple conflict resolution methods like last-update-wins etc. For example, consider an active-active where both nodes do UPDATE accounts SET balance=balance+1000 WHERE id=1 This will inevitably lead to a conflict, and while the last-update-wins resolves this "consistently" on both nodes (e.g. ending with the same result), it's essentially a lost update. This is a very simplistic example of course, I recall there are various more complex examples involving foreign keys, multi-table transactions, constraints, etc. But in principle it's a manifestation of the same inherent limitation of conflict detection and resolution etc. Similarly, I believe this affects not just active-active, but also the case where one node aggregates data from multiple publishers. Maybe not to the same extent / it might be fine for that use case, but you said the end goal is to use this for active-active. So I'm wondering what's the plan, there. If I'm writing an application for active-active using this conflict handling, what assumptions can I make? Can I just do stuff as if on a single node, or do I need to be super conscious about the zillion ways things can misbehave in a distributed system? My personal opinion is that the closer this will be to the regular consistency levels, the better. If past experience taught me anything, it's very hard to predict how distributed systems with eventual consistency behave, and even harder to actually test the application in such an environment.
In any case, if there are any differences compared to the usual behavior, it needs to be very clearly explained in the docs. >> >>> INSERT >>> ================ >>> To resolve INSERT conflict on subscriber, it is important to find out >>> the conflicting row (if any) before we attempt an insertion. The >>> indexes or search preference for the same will be: >>> First check for replica identity (RI) index. >>> - if not found, check for the primary key (PK) index. >>> - if not found, then check for unique indexes (individual ones or >>> added by unique constraints) >>> - if unique index also not found, skip CDR >>> >>> Note: if no RI index, PK, or unique index is found but >>> REPLICA_IDENTITY_FULL is defined, CDR will still be skipped. >>> The reason being that even though a row can be identified with >>> REPLICAT_IDENTITY_FULL, such tables are allowed to have duplicate >>> rows. Hence, we should not go for conflict detection in such a case. >>> >> >> It's not clear to me why would REPLICA_IDENTITY_FULL mean the table is >> allowed to have duplicate values? It just means the upstream is sending >> the whole original row, there can still be a PK/UNIQUE index on both the >> publisher and subscriber. > > Yes, right. Sorry for confusion. I meant the same i.e. in absence of > 'RI index, PK, or unique index', tables can have duplicates. So even > in presence of Replica-identity (FULL in this case) but in absence of > unique/primary index, CDR will be skipped for INSERT. > >> >>> In case of replica identity ‘nothing’ and in absence of any suitable >>> index (as defined above), CDR will be skipped for INSERT. >>> >>> Conflict Type: >>> ---------------- >>> insert_exists: A conflict is detected when the table has the same >>> value for a key column as the new value in the incoming row. >>> >>> Conflict Resolution >>> ---------------- >>> a) latest_timestamp_wins: The change with later commit timestamp wins. >>> b) earliest_timestamp_wins: The change with earlier commit timestamp wins. >>> c) apply: Always apply the remote change. >>> d) skip: Remote change is skipped. >>> e) error: Error out on conflict. Replication is stopped, manual >>> action is needed. >>> >> >> Why not to have some support for user-defined conflict resolution >> methods, allowing to do more complex stuff (e.g. merging the rows in >> some way, perhaps even with datatype-specific behavior)? > > Initially, for the sake of simplicity, we are targeting to support > built-in resolvers. But we have a plan to work on user-defined > resolvers as well. We shall propose that separately. > >> >>> The change will be converted to 'UPDATE' and applied if the decision >>> is in favor of applying remote change. >>> >>> It is important to have commit timestamp info available on subscriber >>> when latest_timestamp_wins or earliest_timestamp_wins method is chosen >>> as resolution method. Thus ‘track_commit_timestamp’ must be enabled >>> on subscriber, in absence of which, configuring the said >>> timestamp-based resolution methods will result in error. >>> >>> Note: If the user has chosen the latest or earliest_timestamp_wins, >>> and the remote and local timestamps are the same, then it will go by >>> system identifier. The change with a higher system identifier will >>> win. This will ensure that the same change is picked on all the nodes. >> >> How is this going to deal with the fact that commit LSN and timestamps >> may not correlate perfectly? That is, commits may happen with LSN1 < >> LSN2 but with T1 > T2. 
> > Are you pointing to the issue where a session/txn has taken > 'xactStopTimestamp' timestamp earlier but is delayed to insert record > in XLOG, while another session/txn which has taken timestamp slightly > later succeeded to insert the record IN XLOG sooner than the session1, > making LSN and Timestamps out of sync? Going by this scenario, the > commit-timestamp may not be reflective of actual commits and thus > timestamp-based resolvers may take wrong decisions. Or do you mean > something else? > > If this is the problem you are referring to, then I think this needs a > fix at the publisher side. Let me think more about it. Kindly let me > know if you have ideas on how to tackle it. > Yes, this is the issue I'm talking about. We're acquiring the timestamp when not holding the lock to reserve space in WAL, so the LSN and the commit LSN may not actually correlate. Consider this example I discussed with Amit last week: node A: XACT1: UPDATE t SET v = 1; LSN1 / T1 XACT2: UPDATE t SET v = 2; LSN2 / T2 node B XACT3: UPDATE t SET v = 3; LSN3 / T3 And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion), and T2 < T3 < T1. Now consider that the messages may arrive in different orders, due to async replication. Unfortunately, this would lead to different results of the conflict resolution: XACT1 - XACT2 - XACT3 => v=3 (T3 wins) XACT3 - XACT1 - XACT2 => v=2 (T2 wins) Now, I realize there's a flaw in this example - the (T1 > T2) inversion can't actually happen, because these transactions have a dependency, and thus won't commit concurrently. XACT1 will complete the commit before XACT2 starts to commit. And with a monotonic clock (which is a requirement for any timestamp-based resolution), that should guarantee (T1 < T2). However, I doubt this is sufficient to declare victory. It's more likely that there still are problems, but the examples are likely more complex (changes to multiple tables, etc.). I vaguely remember there were more issues with timestamp inversion, but those might have been related to parallel apply etc.
It will detect it as update_missing >>> and will follow the default or configured resolver of update_missing >>> itself. >>> >> >> I don't understand the why should update_missing or update_deleted be >> different, especially considering it's not detected reliably. And also >> that even if we happen to find the row the associated TOAST data may >> have already been removed. So why would this matter? > > Here, we are trying to tackle the case where the row is 'recently' > deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may > want to opt for a different resolution in such a case as against the > one where the corresponding row was not even present in the first > place. The case where the row was deleted long back may not fall into > this category as there are higher chances that they have been removed > by vacuum and can be considered equivalent to the update_missing > case. > My point is that if we can't detect the difference reliably, it's not very useful. Consider this example: Node A: T1: INSERT INTO t (id, value) VALUES (1,1); T2: DELETE FROM t WHERE id = 1; Node B: T3: UPDATE t SET value = 2 WHERE id = 1; The "correct" order of received messages on a third node is T1-T3-T2. But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues and so on. For T1-T2-T3 the right decision is to discard the update, while for T3-T1-T2 it's to wait for the INSERT to arrive. But if we misdetect the situation, we either end up with a row that shouldn't be there, or losing an update. > Regarding "TOAST column" for deleted row cases, we may need to dig > more. Thanks for bringing this case. Let me analyze more here. > >> >>> Conflict Resolutions: >>> ---------------- >>> a) latest_timestamp_wins: The change with later commit timestamp >>> wins. Can be used for ‘update_differ’. >>> b) earliest_timestamp_wins: The change with earlier commit >>> timestamp wins. Can be used for ‘update_differ’. >>> c) apply: The remote change is always applied. Can be used for >>> ‘update_differ’. >>> d) apply_or_skip: Remote change is converted to INSERT and is >>> applied. If the complete row cannot be constructed from the info >>> provided by the publisher, then the change is skipped. Can be used for >>> ‘update_missing’ or ‘update_deleted’. >>> e) apply_or_error: Remote change is converted to INSERT and is >>> applied. If the complete row cannot be constructed from the info >>> provided by the publisher, then error is raised. Can be used for >>> ‘update_missing’ or ‘update_deleted’. >>> f) skip: Remote change is skipped and local one is retained. Can be >>> used for any conflict type. >>> g) error: Error out on conflict. Replication is stopped, manual >>> action is needed. Can be used for any conflict type. >>> >>> To support UPDATE CDR, the presence of either replica identity Index >>> or primary key is required on target node. Update CDR will not be >>> supported in absence of replica identity index or primary key even >>> though REPLICA IDENTITY FULL is set. Please refer to "UPDATE" in >>> "Noteworthy Scenarios" section in [1] for further details. >>> >>> DELETE >>> ================ >>> Conflict Type: >>> ---------------- >>> delete_missing: An incoming delete is trying to delete a row on a >>> target node which does not exist. >>> >>> Conflict Resolutions: >>> ---------------- >>> a) error : Error out on conflict. Replication is stopped, manual >>> action is needed. >>> b) skip : The remote change is skipped.
>>> >>> Configuring Conflict Resolution: >>> ------------------------------------------------ >>> There are two parts when it comes to configuring CDR: >>> >>> a) Enabling/Disabling conflict detection. >>> b) Configuring conflict resolvers for different conflict types. >>> >>> Users can sometimes create multiple subscriptions on the same node, >>> subscribing to different tables to improve replication performance by >>> starting multiple apply workers. If the tables in one subscription are >>> less likely to cause conflict, then it is possible that user may want >>> conflict detection disabled for that subscription to avoid detection >>> latency while enabling it for other subscriptions. This generates a >>> requirement to make ‘conflict detection’ configurable per >>> subscription. While the conflict resolver configuration can remain >>> global. All the subscriptions which opt for ‘conflict detection’ will >>> follow global conflict resolver configuration. >>> >>> To implement the above, subscription commands will be changed to have >>> one more parameter 'conflict_resolution=on/off', default will be OFF. >>> >>> To configure global resolvers, new DDL command will be introduced: >>> >>> CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver> >>> >> >> I very much doubt we want a single global conflict resolver, or even one >> resolver per subscription. It seems like a very table-specific thing. > > Even we thought about this. We feel that even if we go for table based > or subscription based resolvers configuration, there may be use case > scenarios where the user is not interested in configuring resolvers > for each table and thus may want to give global ones. Thus, we should > provide a way for users to do global configuration. Thus we started > with global one. I have noted your point here and would also like to > know the opinion of others. We are open to discussion. We can either > opt for any of these 2 options (global or table) or we can opt for > both global and table/sub based one. > I have no problem with a default / global conflict handler, as long as there's a way to override this per table. This is especially important for cases with custom conflict handler at table / column level. >> >> Also, doesn't all this whole design ignore the concurrency between >> publishers? Isn't this problematic considering the commit timestamps may >> go backwards (for a given publisher), which means the conflict >> resolution is not deterministic (as it depends on how exactly it >> interleaves)? >> >> >>> ------------------------- >>> >>> Apart from the above three main operations and resolver configuration, >>> there are more conflict types like primary-key updates, multiple >>> unique constraints etc and some special scenarios to be considered. >>> Complete design details can be found in [1]. >>> >>> [1]: https://wiki.postgresql.org/wiki/Conflict_Detection_and_Resolution >>> >> >> Hmmm, not sure it's good to have a "complete" design on wiki, and only >> some subset posted to the mailing list. I haven't compared what the >> differences are, though. > > It would have been difficult to mention all the details in email > (including examples and corner scenarios) and thus we thought that it > will be better to document everything in wiki page for the time being. > We can keep on discussing the design and all the scenarios on need > basis (before implementation phase of that part) and thus eventually > everything will come in email on hackers. 
With our first patch, we > plan to provide everything in a README as well. > The challenge with having this on wiki is that it's unlikely people will notice any changes made to the wiki. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
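For concreteness, a hypothetical usage sketch of the configuration interface proposed above (the syntax is not final; the connection string and names are made up):

  -- enable conflict detection per subscription; default is off
  CREATE SUBSCRIPTION sub1 CONNECTION 'host=node1 dbname=postgres'
      PUBLICATION pub1 WITH (conflict_resolution = on);

  -- global resolvers, one per conflict type
  CONFLICT RESOLVER ON insert_exists IS latest_timestamp_wins;
  CONFLICT RESOLVER ON update_missing IS apply_or_skip;
  CONFLICT RESOLVER ON delete_missing IS skip;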
On 5/28/24 11:17, Nisha Moond wrote: > On Mon, May 27, 2024 at 11:19 AM shveta malik <shveta.malik@gmail.com> wrote: >> >> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra >> <tomas.vondra@enterprisedb.com> wrote: >>> >>> ... >>> >>> I don't understand the why should update_missing or update_deleted be >>> different, especially considering it's not detected reliably. And also >>> that even if we happen to find the row the associated TOAST data may >>> have already been removed. So why would this matter? >> >> Here, we are trying to tackle the case where the row is 'recently' >> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may >> want to opt for a different resolution in such a case as against the >> one where the corresponding row was not even present in the first >> place. The case where the row was deleted long back may not fall into >> this category as there are higher chances that they have been removed >> by vacuum and can be considered equivalent to the update_ missing >> case. >> >> Regarding "TOAST column" for deleted row cases, we may need to dig >> more. Thanks for bringing this case. Let me analyze more here. >> > I tested a simple case with a table with one TOAST column and found > that when a tuple with a TOAST column is deleted, both the tuple and > corresponding pg_toast entries are marked as ‘deleted’ (dead) but not > removed immediately. The main tuple and respective pg_toast entry are > permanently deleted only during vacuum. First, the main table’s dead > tuples are vacuumed, followed by the secondary TOAST relation ones (if > available). > Please let us know if you have a specific scenario in mind where the > TOAST column data is deleted immediately upon ‘delete’ operation, > rather than during vacuum, which we are missing. > I'm pretty sure you can vacuum the TOAST table directly, which means you'll end up with a deleted tuple with TOAST pointers, but with the TOAST entries already gone. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
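The situation described here can be reproduced by vacuuming the TOAST relation by hand; a sketch (the table t and the pg_toast relation name are illustrative; the latter can be looked up via pg_class.reltoastrelid):

  DELETE FROM t WHERE id = 1;
  SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 't';
  VACUUM pg_toast.pg_toast_16384;  -- vacuums only the TOAST relation
  -- the dead tuple in t remains, but its TOAST pointers now dangle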
On 6/3/24 09:30, Amit Kapila wrote: > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 5/23/24 08:36, shveta malik wrote: >>> >>> Conflict Resolution >>> ---------------- >>> a) latest_timestamp_wins: The change with later commit timestamp wins. >>> b) earliest_timestamp_wins: The change with earlier commit timestamp wins. >>> c) apply: Always apply the remote change. >>> d) skip: Remote change is skipped. >>> e) error: Error out on conflict. Replication is stopped, manual >>> action is needed. >>> >> >> Why not to have some support for user-defined conflict resolution >> methods, allowing to do more complex stuff (e.g. merging the rows in >> some way, perhaps even with datatype-specific behavior)? >> >>> The change will be converted to 'UPDATE' and applied if the decision >>> is in favor of applying remote change. >>> >>> It is important to have commit timestamp info available on subscriber >>> when latest_timestamp_wins or earliest_timestamp_wins method is chosen >>> as resolution method. Thus ‘track_commit_timestamp’ must be enabled >>> on subscriber, in absence of which, configuring the said >>> timestamp-based resolution methods will result in error. >>> >>> Note: If the user has chosen the latest or earliest_timestamp_wins, >>> and the remote and local timestamps are the same, then it will go by >>> system identifier. The change with a higher system identifier will >>> win. This will ensure that the same change is picked on all the nodes. >> >> How is this going to deal with the fact that commit LSN and timestamps >> may not correlate perfectly? That is, commits may happen with LSN1 < >> LSN2 but with T1 > T2. >> > > One of the possible scenarios discussed at pgconf.dev with Tomas for > this was as follows: > > Say there are two publisher nodes PN1, PN2, and subscriber node SN3. > The logical replication is configured such that a subscription on SN3 > has publications from both PN1 and PN2. For example, SN3 (sub) -> PN1, > PN2 (p1, p2) > > Now, on PN1, we have the following operations that update the same row: > > T1 > Update-1 on table t1 at LSN1 (1000) on time (200) > > T2 > Update-2 on table t1 at LSN2 (2000) on time (100) > > Then in parallel, we have the following operation on node PN2 that > updates the same row as Update-1, and Update-2 on node PN1. > > T3 > Update-3 on table t1 at LSN(1500) on time (150) > > By theory, we can have a different state on subscribers depending on > the order of updates arriving at SN3 which shouldn't happen. Say, the > order in which they reach SN3 is: Update-1, Update-2, Update-3 then > the final row we have is by Update-3 considering we have configured > last_update_wins as a conflict resolution method. Now, consider the > other order: Update-1, Update-3, Update-2, in this case, the final > row will be by Update-2 because when we try to apply Update-3, it will > generate a conflict and as per the resolution method > (last_update_wins) we need to retain Update-1. > > On further thinking, the operations on node PN-1 as defined above > seem impossible because one of the Updates needs to wait for the other > to write a commit record. So the commits may happen with LSN1 < LSN2 > but with T1 > T2, but they can't be on the same row due to locks. So, > the order of apply should still be consistent. Am I missing > something? > Sorry, I should have read your message before responding a couple minutes ago. I think you're right this exact example can't happen, due to the dependency between transactions.
But as I wrote, I'm not quite convinced this means there are not other issues with this way of resolving conflicts. It's more likely a more complex scenario is required. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote: >> >> > >> >> Here are more use cases of the "earliest_timestamp_wins" resolution method: >> 1) Applications where the record of first occurrence of an event is >> important. For example, sensor based applications like earthquake >> detection systems, capturing the first seismic wave's time is crucial. >> 2) Scheduling systems, like appointment booking, prioritize the >> earliest request when handling concurrent ones. >> 3) In contexts where maintaining chronological order is important - >> a) Social media platforms display comments ensuring that the >> earliest ones are visible first. >> b) Finance transaction processing systems rely on timestamps to >> prioritize the processing of transactions, ensuring that the earliest >> transaction is handled first > > > Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated the row committed/became visible) and not a DML on a column of type timestamp. Some implementations use a hidden timestamp column but that's different from a user column which captures the timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database which may be different from the value of the column itself. > It depends on how these operations are performed. For example, the appointment booking system could be prioritized via a transaction updating a row with columns emp_name, emp_id, reserved, time_slot. Now, if two employees at different geographical locations try to book the calendar, the earlier transaction will win. > If we use the transaction commit timestamp as the basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3 on separate origins, with timestamps t1, t2, and t3 respectively, changed rows (r1, r2), (r2, r3), and (r1, r4) respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with T3's version and r2 will end up with T1's version after applying all three transactions. > Are you describing the results based on latest_timestamp_wins? If so, then it is correct. OTOH, if the user has configured "earliest_timestamp_wins" resolution method, then we should end up with a version of r1 from T1 because t1 < t3. Also, due to the same reason, we should have the version of r2 from T2. > Would that introduce an inconsistency between r1 and r2? > As per my understanding, this shouldn't be an inconsistency. Won't it be true even when the transactions are performed on a single node with the same timing? -- With Regards, Amit Kapila.
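A sketch of the booking example above, with an illustrative schema built from the columns mentioned (assuming the time slot itself is the contended key):

  CREATE TABLE calendar (
      time_slot timestamptz PRIMARY KEY,
      emp_name  text,
      emp_id    int,
      reserved  boolean
  );
  -- run concurrently on two nodes for the same slot:
  UPDATE calendar SET reserved = true, emp_id = 10, emp_name = 'emp_a'
   WHERE time_slot = '2024-06-10 10:00';
  -- with earliest_timestamp_wins, every node retains the booking whose
  -- transaction committed first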
On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 5/27/24 07:48, shveta malik wrote: > > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >> Which architecture are you aiming for? Here you talk about multiple > >> providers, but the wiki page mentions active-active. I'm not sure how > >> much this matters, but it might. > > > > Currently, we are working for multi providers case but ideally it > > should work for active-active also. During further discussion and > > implementation phase, if we find that, there are cases which will not > > work in straight-forward way for active-active, then our primary focus > > will remain to first implement it for multiple providers architecture. > > > >> > >> Also, what kind of consistency you expect from this? Because none of > >> these simple conflict resolution methods can give you the regular > >> consistency models we're used to, AFAICS. > > > > Can you please explain a little bit more on this. > > > > I was referring to the well established consistency models / isolation > levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what > guarantees the application developer can expect, what anomalies can > happen, etc. > > I don't think any such isolation level can be implemented with a simple > conflict resolution methods like last-update-wins etc. For example, > consider an active-active where both nodes do > > UPDATE accounts SET balance=balance+1000 WHERE id=1 > > This will inevitably lead to a conflict, and while the last-update-wins > resolves this "consistently" on both nodes (e.g. ending with the same > result), it's essentially a lost update. > The idea to solve such conflicts is using the delta apply technique where the delta from both sides will be applied to the respective columns. We do plan to target this as a separate patch. Now, if the basic conflict resolution and delta apply both can't go in one release, we shall document such cases clearly to avoid misuse of the feature. > This is a very simplistic example of course, I recall there are various > more complex examples involving foreign keys, multi-table transactions, > constraints, etc. But in principle it's a manifestation of the same > inherent limitation of conflict detection and resolution etc. > > Similarly, I believe this affects not just active-active, but also the > case where one node aggregates data from multiple publishers. Maybe not > to the same extent / it might be fine for that use case, > I am not sure how much it is a problem for general logical replication solution but we do intend to work on solving such problems in step-wise manner. Trying to attempt everything in one patch doesn't seem advisable to me. > but you said > the end goal is to use this for active-active. So I'm wondering what's > the plan, there. > I think at this stage we are not ready for active-active because leaving aside this feature we need many other features like replication of all commands/objects (DDL replication, replicate large objects, etc.), Global sequences, some sort of global two_phase transaction management for data consistency, etc. So, it would be better to consider logical replication cases intending to extend it for active-active when we have other required pieces. > If I'm writing an application for active-active using this conflict > handling, what assumptions can I make? 
Will Can I just do stuff as if on > a single node, or do I need to be super conscious about the zillion ways > things can misbehave in a distributed system? > > My personal opinion is that the closer this will be to the regular > consistency levels, the better. If past experience taught me anything, > it's very hard to predict how distributed systems with eventual > consistency behave, and even harder to actually test the application in > such environment. > I don't think in any way this can enable users to start writing applications for active-active workloads. For something like what you are saying, we probably need a global transaction manager (or a global two_pc) as well to allow transactions to behave as they are on single-node or achieve similar consistency levels. With such transaction management, we can allow transactions to commit on a node only when it doesn't lead to a conflict on the peer node. > In any case, if there are any differences compared to the usual > behavior, it needs to be very clearly explained in the docs. > I agree that docs should be clear about the cases that this can and can't support. > >> > >> How is this going to deal with the fact that commit LSN and timestamps > >> may not correlate perfectly? That is, commits may happen with LSN1 < > >> LSN2 but with T1 > T2. > > > > Are you pointing to the issue where a session/txn has taken > > 'xactStopTimestamp' timestamp earlier but is delayed to insert record > > in XLOG, while another session/txn which has taken timestamp slightly > > later succeeded to insert the record IN XLOG sooner than the session1, > > making LSN and Timestamps out of sync? Going by this scenario, the > > commit-timestamp may not be reflective of actual commits and thus > > timestamp-based resolvers may take wrong decisions. Or do you mean > > something else? > > > > If this is the problem you are referring to, then I think this needs a > > fix at the publisher side. Let me think more about it . Kindly let me > > know if you have ideas on how to tackle it. > > > > Yes, this is the issue I'm talking about. We're acquiring the timestamp > when not holding the lock to reserve space in WAL, so the LSN and the > commit LSN may not actually correlate. > > Consider this example I discussed with Amit last week: > > node A: > > XACT1: UPDATE t SET v = 1; LSN1 / T1 > > XACT2: UPDATE t SET v = 2; LSN2 / T2 > > node B > > XACT3: UPDATE t SET v = 3; LSN3 / T3 > > And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion), > and T2 < T3 < T1. Now consider that the messages may arrive in different > orders, due to async replication. Unfortunately, this would lead to > different results of the conflict resolution: > > XACT1 - XACT2 - XACT3 => v=3 (T3 wins) > > XACT3 - XACT1 - XACT2 => v=2 (T2 wins) > > Now, I realize there's a flaw in this example - the (T1 > T2) inversion > can't actually happen, because these transactions have a dependency, and > thus won't commit concurrently. XACT1 will complete the commit, because > XACT2 starts to commit. And with monotonic clock (which is a requirement > for any timestamp-based resolution), that should guarantee (T1 < T2). > > However, I doubt this is sufficient to declare victory. It's more likely > that there still are problems, but the examples are likely more complex > (changes to multiple tables, etc.). > Fair enough, I think we need to analyze this more to find actual problems or in some way try to prove that there is no problem. 
> I vaguely remember there were more issues with timestamp inversion, but > those might have been related to parallel apply etc. > Okay, so considering there are problems due to timestamp inversion, I think the solution to that problem would probably be somehow generating commit LSN and timestamp in order. I don't have a solution at this stage but will think more both on the actual problem and solution. In the meantime, if you get a chance to refer to the place where you have seen such a problem please try to share the same with us. It would be helpful. -- With Regards, Amit Kapila.
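For reference, the delta apply technique mentioned above would resolve the earlier balance example roughly as follows; a conceptual sketch only, assuming the resolver can see both the remote old and new values:

  -- local:  balance went 1000 -> 1500 (local txn added 500)
  -- remote: balance went 1000 -> 2000 (remote txn added 1000)
  -- instead of overwriting, apply the remote delta to the local value:
  UPDATE accounts SET balance = balance + (2000 - 1000) WHERE id = 1;
  -- both nodes converge on 2500 and neither update is lost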
On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > >>> > >>> UPDATE > >>> ================ > >>> > >>> Conflict Detection Method: > >>> -------------------------------- > >>> Origin conflict detection: The ‘origin’ info is used to detect > >>> conflict which can be obtained from commit-timestamp generated for > >>> incoming txn at the source node. To compare remote’s origin with the > >>> local’s origin, we must have origin information for local txns as well > >>> which can be obtained from commit-timestamp after enabling > >>> ‘track_commit_timestamp’ locally. > >>> The one drawback here is the ‘origin’ information cannot be obtained > >>> once the row is frozen and the commit-timestamp info is removed by > >>> vacuum. For a frozen row, conflicts cannot be raised, and thus the > >>> incoming changes will be applied in all the cases. > >>> > >>> Conflict Types: > >>> ---------------- > >>> a) update_differ: The origin of an incoming update's key row differs > >>> from the local row i.e.; the row has already been updated locally or > >>> by different nodes. > >>> b) update_missing: The row with the same value as that incoming > >>> update's key does not exist. Remote is trying to update a row which > >>> does not exist locally. > >>> c) update_deleted: The row with the same value as that incoming > >>> update's key does not exist. The row is already deleted. This conflict > >>> type is generated only if the deleted row is still detectable i.e., it > >>> is not removed by VACUUM yet. If the row is removed by VACUUM already, > >>> it cannot detect this conflict. It will detect it as update_missing > >>> and will follow the default or configured resolver of update_missing > >>> itself. > >>> > >> > >> I don't understand the why should update_missing or update_deleted be > >> different, especially considering it's not detected reliably. And also > >> that even if we happen to find the row the associated TOAST data may > >> have already been removed. So why would this matter? > > > > Here, we are trying to tackle the case where the row is 'recently' > > deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may > > want to opt for a different resolution in such a case as against the > > one where the corresponding row was not even present in the first > > place. The case where the row was deleted long back may not fall into > > this category as there are higher chances that they have been removed > > by vacuum and can be considered equivalent to the update_ missing > > case. > > > > My point is that if we can't detect the difference reliably, it's not > very useful. Consider this example: > > Node A: > > T1: INSERT INTO t (id, value) VALUES (1,1); > > T2: DELETE FROM t WHERE id = 1; > > Node B: > > T3: UPDATE t SET value = 2 WHERE id = 1; > > The "correct" order of received messages on a third node is T1-T3-T2. > But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues > and so on. For T1-T2-T3 the right decision is to discard the update, > while for T3-T1-T2 it's to either wait for the INSERT or wait for the > insert to arrive. > > But if we misdetect the situation, we either end up with a row that > shouldn't be there, or losing an update. Doesn't the above example indicate that 'update_deleted' should also be considered a necessary conflict type? Please see the possibilities of conflicts in all three cases: The "correct" order of receiving messages on node C (as suggested above) is T1-T3-T2 (case1) ---------- T1 will insert the row. 
T3 will have update_differ conflict; latest_timestamp_wins or apply will apply it, earliest_timestamp_wins or skip will skip it. T2 will delete the row (irrespective of whether the update happened or not). End Result: No Data. T1-T2-T3 ---------- T1 will insert the row. T2 will delete the row. T3 will have conflict update_deleted. If it is 'update_deleted', the chances are that the resolver set here is to 'skip' (default is also 'skip' in this case). If vacuum has deleted that row (or if we don't support 'update_deleted' conflict), it will be 'update_missing' conflict. In that case, the user may end up inserting the row if the resolver chosen is in favor of apply (which seems an obvious choice for 'update_missing' conflict; default is also 'apply_or_skip'). End result: Row inserted with 'update_missing'. Row correctly skipped with 'update_deleted' (assuming the obvious choice seems to be 'skip' for the update_deleted case). So it seems that with 'update_deleted' conflict, there are higher chances of opting for the right decision here (which is to discard the update), as 'update_deleted' conveys correct info to the user. The 'update_missing' OTOH does not convey correct info and the user may end up inserting the data by choosing apply-favoring resolvers for 'update_missing'. Again, we get the benefit of 'update_deleted' for *recently* deleted rows only. T3-T1-T2 ---------- T3 may end up inserting the record if the resolver is in favor of 'apply' and all the columns are received from remote. T1 will have 'insert_exists' conflict and thus may either overwrite 'updated' values or may leave the data as is (based on whether the resolver is in favor of apply or not). T2 will end up deleting it. End Result: No Data. I feel for the second case (and similar cases), 'update_deleted' serves as a better conflict type. thanks Shveta
On Fri, Jun 7, 2024 at 6:10 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > >>> I don't understand the why should update_missing or update_deleted be > >>> different, especially considering it's not detected reliably. And also > >>> that even if we happen to find the row the associated TOAST data may > >>> have already been removed. So why would this matter? > >> > >> Here, we are trying to tackle the case where the row is 'recently' > >> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may > >> want to opt for a different resolution in such a case as against the > >> one where the corresponding row was not even present in the first > >> place. The case where the row was deleted long back may not fall into > >> this category as there are higher chances that they have been removed > >> by vacuum and can be considered equivalent to the update_ missing > >> case. > >> > >> Regarding "TOAST column" for deleted row cases, we may need to dig > >> more. Thanks for bringing this case. Let me analyze more here. > >> > > I tested a simple case with a table with one TOAST column and found > > that when a tuple with a TOAST column is deleted, both the tuple and > > corresponding pg_toast entries are marked as ‘deleted’ (dead) but not > > removed immediately. The main tuple and respective pg_toast entry are > > permanently deleted only during vacuum. First, the main table’s dead > > tuples are vacuumed, followed by the secondary TOAST relation ones (if > > available). > > Please let us know if you have a specific scenario in mind where the > > TOAST column data is deleted immediately upon ‘delete’ operation, > > rather than during vacuum, which we are missing. > > > > I'm pretty sure you can vacuum the TOAST table directly, which means > you'll end up with a deleted tuple with TOAST pointers, but with the > TOAST entries already gone. > It is true that for a deleted row, its toast entries can be vacuumed earlier than the original/parent row, but we do not need to be concerned about that to raise 'update_deleted'. To raise an 'update_deleted' conflict, it is sufficient to know that the row has been deleted and not yet vacuumed, regardless of the presence or absence of its toast entries. Once this is determined, we need to build the tuple from remote data and apply it (provided resolver is such that). If the tuple cannot be fully constructed from the remote data, the apply operation will either be skipped or an error will be raised, depending on whether the user has chosen the apply_or_skip or apply_or_error option. In cases where the table has toast columns but the remote data does not include the toast-column entry (when the toast column is unmodified and not part of the replica identity), the resolution for 'update_deleted' will be no worse than for 'update_missing'. That is, for both the cases, we can not construct full tuple and thus the operation either needs to be skipped or error needs to be raised. thanks Shveta
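An illustrative case of the unmodified-TOAST-column situation described above (schema made up for the example):

  CREATE TABLE t (id int PRIMARY KEY, counter int, payload text);
  -- assume payload holds a large, TOASTed value
  -- publisher runs: UPDATE t SET counter = counter + 1 WHERE id = 1;
  -- payload is unchanged and not part of the replica identity, so the
  -- remote new tuple carries no value for it; the full row cannot be
  -- built, and apply_or_skip skips while apply_or_error errors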
On 6/10/24 10:54, Amit Kapila wrote: > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> On 5/27/24 07:48, shveta malik wrote: >>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra >>> <tomas.vondra@enterprisedb.com> wrote: >>>> >>>> Which architecture are you aiming for? Here you talk about multiple >>>> providers, but the wiki page mentions active-active. I'm not sure how >>>> much this matters, but it might. >>> >>> Currently, we are working for multi providers case but ideally it >>> should work for active-active also. During further discussion and >>> implementation phase, if we find that, there are cases which will not >>> work in straight-forward way for active-active, then our primary focus >>> will remain to first implement it for multiple providers architecture. >>> >>>> >>>> Also, what kind of consistency you expect from this? Because none of >>>> these simple conflict resolution methods can give you the regular >>>> consistency models we're used to, AFAICS. >>> >>> Can you please explain a little bit more on this. >>> >> >> I was referring to the well established consistency models / isolation >> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what >> guarantees the application developer can expect, what anomalies can >> happen, etc. >> >> I don't think any such isolation level can be implemented with a simple >> conflict resolution methods like last-update-wins etc. For example, >> consider an active-active where both nodes do >> >> UPDATE accounts SET balance=balance+1000 WHERE id=1 >> >> This will inevitably lead to a conflict, and while the last-update-wins >> resolves this "consistently" on both nodes (e.g. ending with the same >> result), it's essentially a lost update. >> > > The idea to solve such conflicts is using the delta apply technique > where the delta from both sides will be applied to the respective > columns. We do plan to target this as a separate patch. Now, if the > basic conflict resolution and delta apply both can't go in one > release, we shall document such cases clearly to avoid misuse of the > feature. > Perhaps, but it's not like having delta conflict resolution (or even CRDT as a more generic variant) would lead to a regular consistency model in a distributed system. At least I don't think it can achieve that, because of the asynchronicity. Consider a table with "CHECK (amount < 1000)" constraint, and an update that sets (amount = amount + 900) on two nodes. AFAIK there's no way to reconcile this using delta (or any other) conflict resolution. Which does not mean we should not have some form of conflict resolution, as long as we know what the goal is. I simply don't want to spend time working on this, add a lot of complex code, and then realize it doesn't give us a consistency model that makes sense. Which leads me back to my original question - what is the consistency model you expect to get from this (possibly when combined with some other pieces?)? >> This is a very simplistic example of course, I recall there are various >> more complex examples involving foreign keys, multi-table transactions, >> constraints, etc. But in principle it's a manifestation of the same >> inherent limitation of conflict detection and resolution etc. >> >> Similarly, I believe this affects not just active-active, but also the >> case where one node aggregates data from multiple publishers.
Maybe not >> to the same extent / it might be fine for that use case, >> > > I am not sure how much it is a problem for general logical replication > solution but we do intend to work on solving such problems in > step-wise manner. Trying to attempt everything in one patch doesn't > seem advisable to me. > I didn't say it needs to be done in one patch. I asked for someone to explain what is the goal - consistency model observed by the users. >> > but you said >> the end goal is to use this for active-active. So I'm wondering what's >> the plan, there. >> > > I think at this stage we are not ready for active-active because > leaving aside this feature we need many other features like > replication of all commands/objects (DDL replication, replicate large > objects, etc.), Global sequences, some sort of global two_phase > transaction management for data consistency, etc. So, it would be > better to consider logical replication cases intending to extend it > for active-active when we have other required pieces. > We're not ready for active-active, sure. And I'm not saying a conflict resolution would make us ready. The question is what consistency model we'd like to get from the active-active, and whether conflict resolution can get us there ... As for the other missing bits (DDL replication, large objects, global sequences), I think those are somewhat independent of the question I'm asking. And some of the stuff is also somewhat optional - for example I think it'd be fine to not support large objects or global sequences. >> If I'm writing an application for active-active using this conflict >> handling, what assumptions can I make? Will Can I just do stuff as if on >> a single node, or do I need to be super conscious about the zillion ways >> things can misbehave in a distributed system? >> >> My personal opinion is that the closer this will be to the regular >> consistency levels, the better. If past experience taught me anything, >> it's very hard to predict how distributed systems with eventual >> consistency behave, and even harder to actually test the application in >> such environment. >> > > I don't think in any way this can enable users to start writing > applications for active-active workloads. For something like what you > are saying, we probably need a global transaction manager (or a global > two_pc) as well to allow transactions to behave as they are on > single-node or achieve similar consistency levels. With such > transaction management, we can allow transactions to commit on a node > only when it doesn't lead to a conflict on the peer node. > But the wiki linked in the first message says: CDR is an important and necessary feature for active-active replication. But if I understand your response, you're saying active-active should probably use global transaction manager etc. which would prevent conflicts - but seems to make CDR unnecessary. Or do I understand it wrong? FWIW I don't think we'd need global components, there are ways to do distributed snapshots using timestamps (for example), which would give us snapshot isolation. >> In any case, if there are any differences compared to the usual >> behavior, it needs to be very clearly explained in the docs. >> > > I agree that docs should be clear about the cases that this can and > can't support. > >>>> >>>> How is this going to deal with the fact that commit LSN and timestamps >>>> may not correlate perfectly? That is, commits may happen with LSN1 < >>>> LSN2 but with T1 > T2. 
>>> >>> Are you pointing to the issue where a session/txn has taken >>> 'xactStopTimestamp' timestamp earlier but is delayed to insert record >>> in XLOG, while another session/txn which has taken timestamp slightly >>> later succeeded to insert the record IN XLOG sooner than the session1, >>> making LSN and Timestamps out of sync? Going by this scenario, the >>> commit-timestamp may not be reflective of actual commits and thus >>> timestamp-based resolvers may take wrong decisions. Or do you mean >>> something else? >>> >>> If this is the problem you are referring to, then I think this needs a >>> fix at the publisher side. Let me think more about it . Kindly let me >>> know if you have ideas on how to tackle it. >>> >> >> Yes, this is the issue I'm talking about. We're acquiring the timestamp >> when not holding the lock to reserve space in WAL, so the LSN and the >> commit LSN may not actually correlate. >> >> Consider this example I discussed with Amit last week: >> >> node A: >> >> XACT1: UPDATE t SET v = 1; LSN1 / T1 >> >> XACT2: UPDATE t SET v = 2; LSN2 / T2 >> >> node B >> >> XACT3: UPDATE t SET v = 3; LSN3 / T3 >> >> And assume LSN1 < LSN2, T1 > T2 (i.e. the commit timestamp inversion), >> and T2 < T3 < T1. Now consider that the messages may arrive in different >> orders, due to async replication. Unfortunately, this would lead to >> different results of the conflict resolution: >> >> XACT1 - XACT2 - XACT3 => v=3 (T3 wins) >> >> XACT3 - XACT1 - XACT2 => v=2 (T2 wins) >> >> Now, I realize there's a flaw in this example - the (T1 > T2) inversion >> can't actually happen, because these transactions have a dependency, and >> thus won't commit concurrently. XACT1 will complete the commit, because >> XACT2 starts to commit. And with monotonic clock (which is a requirement >> for any timestamp-based resolution), that should guarantee (T1 < T2). >> >> However, I doubt this is sufficient to declare victory. It's more likely >> that there still are problems, but the examples are likely more complex >> (changes to multiple tables, etc.). >> > > Fair enough, I think we need to analyze this more to find actual > problems or in some way try to prove that there is no problem. > >> I vaguely remember there were more issues with timestamp inversion, but >> those might have been related to parallel apply etc. >> > > Okay, so considering there are problems due to timestamp inversion, I > think the solution to that problem would probably be somehow > generating commit LSN and timestamp in order. I don't have a solution > at this stage but will think more both on the actual problem and > solution. In the meantime, if you get a chance to refer to the place > where you have seen such a problem please try to share the same with > us. It would be helpful. > I think the solution to this would be to acquire the timestamp while reserving the space (because that happens in LSN order). The clock would need to be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. AFAIK this is the main problem why it's being done outside the critical section, because gettimeofday() may be quite expensive. There's a concept of hybrid clock, combining "time" and logical counter, which I think might be useful independently of CDR ... -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
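As a rough illustration of the hybrid clock idea (wall-clock time combined with a logical counter), sketched here in SQL purely for readability; a real implementation would live in C on the WAL insertion path:

  CREATE SEQUENCE hlc_counter;
  -- pack milliseconds since epoch into the high bits and a logical
  -- counter into the low 20 bits
  SELECT ((extract(epoch FROM clock_timestamp()) * 1000)::bigint << 20)
         | (nextval('hlc_counter') % (1 << 20)) AS hybrid_ts;
  -- the counter breaks ties within a millisecond, so the value keeps
  -- increasing on a node even when the clock reading does not advance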
On 6/10/24 12:56, shveta malik wrote: > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >>>>> >>>>> UPDATE >>>>> ================ >>>>> >>>>> Conflict Detection Method: >>>>> -------------------------------- >>>>> Origin conflict detection: The ‘origin’ info is used to detect >>>>> conflict which can be obtained from commit-timestamp generated for >>>>> incoming txn at the source node. To compare remote’s origin with the >>>>> local’s origin, we must have origin information for local txns as well >>>>> which can be obtained from commit-timestamp after enabling >>>>> ‘track_commit_timestamp’ locally. >>>>> The one drawback here is the ‘origin’ information cannot be obtained >>>>> once the row is frozen and the commit-timestamp info is removed by >>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the >>>>> incoming changes will be applied in all the cases. >>>>> >>>>> Conflict Types: >>>>> ---------------- >>>>> a) update_differ: The origin of an incoming update's key row differs >>>>> from the local row i.e.; the row has already been updated locally or >>>>> by different nodes. >>>>> b) update_missing: The row with the same value as that incoming >>>>> update's key does not exist. Remote is trying to update a row which >>>>> does not exist locally. >>>>> c) update_deleted: The row with the same value as that incoming >>>>> update's key does not exist. The row is already deleted. This conflict >>>>> type is generated only if the deleted row is still detectable i.e., it >>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already, >>>>> it cannot detect this conflict. It will detect it as update_missing >>>>> and will follow the default or configured resolver of update_missing >>>>> itself. >>>>> >>>> >>>> I don't understand the why should update_missing or update_deleted be >>>> different, especially considering it's not detected reliably. And also >>>> that even if we happen to find the row the associated TOAST data may >>>> have already been removed. So why would this matter? >>> >>> Here, we are trying to tackle the case where the row is 'recently' >>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may >>> want to opt for a different resolution in such a case as against the >>> one where the corresponding row was not even present in the first >>> place. The case where the row was deleted long back may not fall into >>> this category as there are higher chances that they have been removed >>> by vacuum and can be considered equivalent to the update_ missing >>> case. >>> >> >> My point is that if we can't detect the difference reliably, it's not >> very useful. Consider this example: >> >> Node A: >> >> T1: INSERT INTO t (id, value) VALUES (1,1); >> >> T2: DELETE FROM t WHERE id = 1; >> >> Node B: >> >> T3: UPDATE t SET value = 2 WHERE id = 1; >> >> The "correct" order of received messages on a third node is T1-T3-T2. >> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues >> and so on. For T1-T2-T3 the right decision is to discard the update, >> while for T3-T1-T2 it's to either wait for the INSERT or wait for the >> insert to arrive. >> >> But if we misdetect the situation, we either end up with a row that >> shouldn't be there, or losing an update. > > Doesn't the above example indicate that 'update_deleted' should also > be considered a necessary conflict type? 
Please see the possibilities > of conflicts in all three cases: > > > The "correct" order of receiving messages on node C (as suggested > above) is T1-T3-T2 (case1) > ---------- > T1 will insert the row. > T3 will have update_differ conflict; latest_timestamp_wins or apply > will apply it. earliest_timestamp_wins or skip will skip it. > T2 will delete the row (irrespective of whether the update happened or not). > End Result: No Data. > > T1-T2-T3 > ---------- > T1 will insert the row. > T2 will delete the row. > T3 will have conflict update_deleted. If it is 'update_deleted', the > chances are that the resolver set here is to 'skip' (default is also > 'skip' in this case). > > If vacuum has deleted that row (or if we don't support > 'update_deleted' conflict), it will be 'update_missing' conflict. In > that case, the user may end up inserting the row if the resolver chosen is > in favor of apply (which seems an obvious choice for 'update_missing' > conflict; default is also 'apply_or_skip'). > > End result: > Row inserted with 'update_missing'. > Row correctly skipped with 'update_deleted' (assuming the obvious > choice seems to be 'skip' for the update_deleted case). > > So it seems that with 'update_deleted' conflict, there are higher > chances of opting for the right decision here (which is to discard the > update), as 'update_deleted' conveys correct info to the user. The > 'update_missing' OTOH does not convey correct info and the user may end up > inserting the data by choosing apply-favoring resolvers for > 'update_missing'. Again, we get the benefit of 'update_deleted' for > *recently* deleted rows only. > > T3-T1-T2 > ---------- > T3 may end up inserting the record if the resolver is in favor of > 'apply' and all the columns are received from remote. > T1 will have an 'insert_exists' conflict and thus may either overwrite > 'updated' values or may leave the data as is (based on whether the > resolver is in favor of apply or not) > T2 will end up deleting it. > End Result: No Data. > > I feel for the second case (and similar cases), 'update_deleted' serves as a > better conflict type. > True, but this is pretty much just a restatement of the example, right? The point I was trying to make is that this hinges on the ability to detect the correct conflict type. And if vacuum can swoop in and remove the recently deleted tuples (which I believe can happen at any time, right?), then that's not guaranteed, because we won't see the deleted tuple anymore. Or am I missing something? Also, can the resolver even convert the UPDATE into INSERT and proceed? Maybe with REPLICA IDENTITY FULL? Otherwise the row might be incomplete, missing required columns etc. In which case it'd have to wait for the actual INSERT to arrive - which would work for actual update_missing, where the row may be delayed due to network issues. But if that's a mistake due to vacuum removing the deleted tuple, it'll wait forever. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > On 6/10/24 12:56, shveta malik wrote: > > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >>>>> > >>>>> UPDATE > >>>>> ================ > >>>>> > >>>>> Conflict Detection Method: > >>>>> -------------------------------- > >>>>> Origin conflict detection: The ‘origin’ info is used to detect > >>>>> conflict which can be obtained from commit-timestamp generated for > >>>>> incoming txn at the source node. To compare remote’s origin with the > >>>>> local’s origin, we must have origin information for local txns as well > >>>>> which can be obtained from commit-timestamp after enabling > >>>>> ‘track_commit_timestamp’ locally. > >>>>> The one drawback here is the ‘origin’ information cannot be obtained > >>>>> once the row is frozen and the commit-timestamp info is removed by > >>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the > >>>>> incoming changes will be applied in all the cases. > >>>>> > >>>>> Conflict Types: > >>>>> ---------------- > >>>>> a) update_differ: The origin of an incoming update's key row differs > >>>>> from the local row i.e.; the row has already been updated locally or > >>>>> by different nodes. > >>>>> b) update_missing: The row with the same value as that incoming > >>>>> update's key does not exist. Remote is trying to update a row which > >>>>> does not exist locally. > >>>>> c) update_deleted: The row with the same value as that incoming > >>>>> update's key does not exist. The row is already deleted. This conflict > >>>>> type is generated only if the deleted row is still detectable i.e., it > >>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already, > >>>>> it cannot detect this conflict. It will detect it as update_missing > >>>>> and will follow the default or configured resolver of update_missing > >>>>> itself. > >>>>> > >>>> > >>>> I don't understand the why should update_missing or update_deleted be > >>>> different, especially considering it's not detected reliably. And also > >>>> that even if we happen to find the row the associated TOAST data may > >>>> have already been removed. So why would this matter? > >>> > >>> Here, we are trying to tackle the case where the row is 'recently' > >>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may > >>> want to opt for a different resolution in such a case as against the > >>> one where the corresponding row was not even present in the first > >>> place. The case where the row was deleted long back may not fall into > >>> this category as there are higher chances that they have been removed > >>> by vacuum and can be considered equivalent to the update_ missing > >>> case. > >>> > >> > >> My point is that if we can't detect the difference reliably, it's not > >> very useful. Consider this example: > >> > >> Node A: > >> > >> T1: INSERT INTO t (id, value) VALUES (1,1); > >> > >> T2: DELETE FROM t WHERE id = 1; > >> > >> Node B: > >> > >> T3: UPDATE t SET value = 2 WHERE id = 1; > >> > >> The "correct" order of received messages on a third node is T1-T3-T2. > >> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues > >> and so on. For T1-T2-T3 the right decision is to discard the update, > >> while for T3-T1-T2 it's to either wait for the INSERT or wait for the > >> insert to arrive. > >> > >> But if we misdetect the situation, we either end up with a row that > >> shouldn't be there, or losing an update. 
> > > > Doesn't the above example indicate that 'update_deleted' should also > > be considered a necessary conflict type? Please see the possibilities > > of conflicts in all three cases: > > > > > > The "correct" order of receiving messages on node C (as suggested > > above) is T1-T3-T2 (case1) > > ---------- > > T1 will insert the row. > > T3 will have update_differ conflict; latest_timestamp wins or apply > > will apply it. earliest_timestamp_wins or skip will skip it. > > T2 will delete the row (irrespective of whether the update happened or not). > > End Result: No Data. > > > > T1-T2-T3 > > ---------- > > T1 will insert the row. > > T2 will delete the row. > > T3 will have conflict update_deleted. If it is 'update_deleted', the > > chances are that the resolver set here is to 'skip' (default is also > > 'skip' in this case). > > > > If vacuum has deleted that row (or if we don't support > > 'update_deleted' conflict), it will be 'update_missing' conflict. In > > that case, the user may end up inserting the row if resolver chosen is > > in favor of apply (which seems an obvious choice for 'update_missing' > > conflict; default is also 'apply_or_skip'). > > > > End result: > > Row inserted with 'update_missing'. > > Row correctly skipped with 'update_deleted' (assuming the obvious > > choice seems to be 'skip' for update_deleted case). > > > > So it seems that with 'update_deleted' conflict, there are higher > > chances of opting for right decision here (which is to discard the > > update), as 'update_deleted' conveys correct info to the user. The > > 'update_missing' OTOH does not convey correct info and user may end up > > inserting the data by choosing apply favoring resolvers for > > 'update_missing'. Again, we get benefit of 'update_deleted' for > > *recently* deleted rows only. > > > > T3-T1-T2 > > ---------- > > T3 may end up inserting the record if the resolver is in favor of > > 'apply' and all the columns are received from remote. > > T1 will have' insert_exists' conflict and thus may either overwrite > > 'updated' values or may leave the data as is (based on whether > > resolver is in favor of apply or not) > > T2 will end up deleting it. > > End Result: No Data. > > > > I feel for second case (and similar cases), 'update_deleted' serves a > > better conflict type. > > > > True, but this is pretty much just a restatement of the example, right? > > The point I was trying to make is that this hinges on the ability to > detect the correct conflict type. And if vacuum can swoop in and remove > the recently deleted tuples (which I believe can happen at any time, > right?), then that's not guaranteed, because we won't see the deleted > tuple anymore. Yes, that's correct. However, many cases could benefit from the update_deleted conflict type if it can be implemented reliably. That's why we wanted to give it a try. But if we can't achieve predictable results with it, I'm fine to drop this approach and conflict_type. We can consider a better design in the future that doesn't depend on non-vacuumed entries and provides a more robust method for identifying deleted rows. > Also, can the resolver even convert the UPDATE into INSERT and proceed? > Maybe with REPLICA IDENTITY FULL? Yes, it can, as long as the row doesn't contain toasted data. Without toasted data, the new tuple is fully logged. However, if the row does contain toasted data, the new tuple won't log it completely. 
In such a case, REPLICA IDENTITY FULL becomes a requirement to ensure we have all the data necessary to create the row on the target side. In the absence of RI full, if the received row is missing its toasted data, the operation will be skipped or an error will be raised. > Otherwise the row might be incomplete, > missing required columns etc. In which case it'd have to wait for the > actual INSERT to arrive - which would work for actual update_missing, > where the row may be delayed due to network issues. But if that's a > mistake due to vacuum removing the deleted tuple, it'll wait forever. Even in case of 'update_missing', we do not intend to wait for the 'actual insert' to arrive, as it is not guaranteed that the 'insert' will arrive at all. And thus we plan to skip or error out (based on the user's configuration) if a complete row cannot be created for insertion. thanks Shveta
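Putting the rule described above into code may help; this is only a sketch of the behavior as explained here, with invented names (nothing below is actual patch code):

    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { RESOLVER_APPLY_OR_SKIP, RESOLVER_APPLY_OR_ERROR } Resolver;

    typedef struct RemoteUpdate
    {
        bool    has_toasted_cols;       /* row contains toasted columns */
        bool    replica_identity_full;  /* publisher logs the full old row */
    } RemoteUpdate;

    /* Without toasted data the new tuple is fully logged; with toasted
     * data we need REPLICA IDENTITY FULL to recover the missing values. */
    static bool
    can_build_complete_row(const RemoteUpdate *up)
    {
        return !up->has_toasted_cols || up->replica_identity_full;
    }

    static void
    resolve_update_missing(const RemoteUpdate *up, Resolver resolver)
    {
        if (can_build_complete_row(up))
            printf("convert the UPDATE into an INSERT and apply it\n");
        else if (resolver == RESOLVER_APPLY_OR_SKIP)
            printf("skip the change and log the conflict\n");
        else
            printf("error out; replication stops until resolved\n");
    }

    int
    main(void)
    {
        RemoteUpdate up = {.has_toasted_cols = true,
                           .replica_identity_full = false};

        resolve_update_missing(&up, RESOLVER_APPLY_OR_SKIP);
        return 0;
    }

Note there is deliberately no "wait for the INSERT" branch, matching the reply above: waiting is avoided because the INSERT may never arrive.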
On Sat, Jun 8, 2024 at 3:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>>
>> >
>>
>> Here are more use cases of the "earliest_timestamp_wins" resolution method:
>> 1) Applications where the record of the first occurrence of an event is
>> important. For example, in sensor-based applications like earthquake
>> detection systems, capturing the first seismic wave's time is crucial.
>> 2) Scheduling systems, like appointment booking, prioritize the
>> earliest request when handling concurrent ones.
>> 3) In contexts where maintaining chronological order is important -
>> a) Social media platforms display comments ensuring that the
>> earliest ones are visible first.
>> b) Financial transaction processing systems rely on timestamps to
>> prioritize the processing of transactions, ensuring that the earliest
>> transaction is handled first.
>
>
> Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated the row committed/became visible) and not a DML on a column which is of type timestamp. Some implementations use a hidden timestamp column, but that's different from a user column which captures the timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database, which may be different from the value of the column itself.
>
It depends on how these operations are performed. For example, the
appointment booking system could be prioritized via a transaction
updating a row with columns emp_name, emp_id, reserved, time_slot.
Now, if two employees at different geographical locations try to book
the calendar, the earlier transaction will win.
I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, it will mostly be handled outside the database.
> If we use the transaction commit timestamp as the basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2, and T3 on separate origins, with timestamps t1, t2, and t3 respectively, change rows {r1, r2}, {r2, r3}, and {r1, r4} respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with T3's version and r2 will end up with T1's version after applying all three transactions.
>
Are you describing the results based on latest_timestamp_wins? If so,
then it is correct. OTOH, if the user has configured the
"earliest_timestamp_wins" resolution method, then we should end up
with the version of r1 from T1 because t1 < t3. For the same
reason, we should have the version of r2 from T2, because t2 < t1.
>
Would that introduce an inconsistency between r1 and r2?
>
As per my understanding, this shouldn't be an inconsistency. Won't it
be true even when the transactions are performed on a single node with
the same timing?
The inconsistency will arise irrespective of the conflict resolution method. On a single system, the effects of whichever transaction runs last will be visible entirely. But in the example above, on the node where T1, T2, and T3 (from *different* origins) are applied, we might end up with a situation where some changes from T1 are applied whereas some changes from T3 are applied.
Best Wishes,
Ashutosh Bapat
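The mixed outcome Ashutosh describes is easy to reproduce in a toy simulation. The sketch below (standalone C, no PostgreSQL code involved) applies T2, T3, T1 in that arrival order with row-wise latest_timestamp_wins and t2 < t1 < t3; it prints r1 ending up with T3's version but r2 with T1's, i.e. T1 becomes visible only partially:

    #include <stdio.h>

    #define NROWS 4

    static int row_version[NROWS];  /* which txn last won each row; 0 = none */
    static int row_ts[NROWS];       /* commit timestamp of that version */

    static void
    apply_txn(int txn, int ts, const int *rows, int nrows)
    {
        for (int i = 0; i < nrows; i++)
        {
            int r = rows[i];

            /* latest_timestamp_wins, resolved per row, not per transaction */
            if (row_version[r] == 0 || ts > row_ts[r])
            {
                row_version[r] = txn;
                row_ts[r] = ts;
            }
        }
    }

    int
    main(void)
    {
        int t1 = 2, t2 = 1, t3 = 3;         /* t2 < t1 < t3 */
        int rows_t1[] = {0, 1};             /* T1 changes r1, r2 */
        int rows_t2[] = {1, 2};             /* T2 changes r2, r3 */
        int rows_t3[] = {0, 3};             /* T3 changes r1, r4 */

        apply_txn(2, t2, rows_t2, 2);       /* arrival order: T2, T3, T1 */
        apply_txn(3, t3, rows_t3, 2);
        apply_txn(1, t1, rows_t1, 2);

        for (int r = 0; r < NROWS; r++)
            printf("r%d: version from T%d\n", r + 1, row_version[r]);
        /* prints r1 from T3 but r2 from T1: T1 is applied only partially */
        return 0;
    }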
On 6/11/24 10:35, shveta malik wrote: > On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >> >> >> On 6/10/24 12:56, shveta malik wrote: >>> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra >>> <tomas.vondra@enterprisedb.com> wrote: >>>> >>>>>>> >>>>>>> UPDATE >>>>>>> ================ >>>>>>> >>>>>>> Conflict Detection Method: >>>>>>> -------------------------------- >>>>>>> Origin conflict detection: The ‘origin’ info is used to detect >>>>>>> conflict which can be obtained from commit-timestamp generated for >>>>>>> incoming txn at the source node. To compare remote’s origin with the >>>>>>> local’s origin, we must have origin information for local txns as well >>>>>>> which can be obtained from commit-timestamp after enabling >>>>>>> ‘track_commit_timestamp’ locally. >>>>>>> The one drawback here is the ‘origin’ information cannot be obtained >>>>>>> once the row is frozen and the commit-timestamp info is removed by >>>>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the >>>>>>> incoming changes will be applied in all the cases. >>>>>>> >>>>>>> Conflict Types: >>>>>>> ---------------- >>>>>>> a) update_differ: The origin of an incoming update's key row differs >>>>>>> from the local row i.e.; the row has already been updated locally or >>>>>>> by different nodes. >>>>>>> b) update_missing: The row with the same value as that incoming >>>>>>> update's key does not exist. Remote is trying to update a row which >>>>>>> does not exist locally. >>>>>>> c) update_deleted: The row with the same value as that incoming >>>>>>> update's key does not exist. The row is already deleted. This conflict >>>>>>> type is generated only if the deleted row is still detectable i.e., it >>>>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already, >>>>>>> it cannot detect this conflict. It will detect it as update_missing >>>>>>> and will follow the default or configured resolver of update_missing >>>>>>> itself. >>>>>>> >>>>>> >>>>>> I don't understand the why should update_missing or update_deleted be >>>>>> different, especially considering it's not detected reliably. And also >>>>>> that even if we happen to find the row the associated TOAST data may >>>>>> have already been removed. So why would this matter? >>>>> >>>>> Here, we are trying to tackle the case where the row is 'recently' >>>>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may >>>>> want to opt for a different resolution in such a case as against the >>>>> one where the corresponding row was not even present in the first >>>>> place. The case where the row was deleted long back may not fall into >>>>> this category as there are higher chances that they have been removed >>>>> by vacuum and can be considered equivalent to the update_ missing >>>>> case. >>>>> >>>> >>>> My point is that if we can't detect the difference reliably, it's not >>>> very useful. Consider this example: >>>> >>>> Node A: >>>> >>>> T1: INSERT INTO t (id, value) VALUES (1,1); >>>> >>>> T2: DELETE FROM t WHERE id = 1; >>>> >>>> Node B: >>>> >>>> T3: UPDATE t SET value = 2 WHERE id = 1; >>>> >>>> The "correct" order of received messages on a third node is T1-T3-T2. >>>> But we may also see T1-T2-T3 and T3-T1-T2, e.g. due to network issues >>>> and so on. For T1-T2-T3 the right decision is to discard the update, >>>> while for T3-T1-T2 it's to either wait for the INSERT or wait for the >>>> insert to arrive. 
>>>> >>>> But if we misdetect the situation, we either end up with a row that >>>> shouldn't be there, or losing an update. >>> >>> Doesn't the above example indicate that 'update_deleted' should also >>> be considered a necessary conflict type? Please see the possibilities >>> of conflicts in all three cases: >>> >>> >>> The "correct" order of receiving messages on node C (as suggested >>> above) is T1-T3-T2 (case1) >>> ---------- >>> T1 will insert the row. >>> T3 will have update_differ conflict; latest_timestamp wins or apply >>> will apply it. earliest_timestamp_wins or skip will skip it. >>> T2 will delete the row (irrespective of whether the update happened or not). >>> End Result: No Data. >>> >>> T1-T2-T3 >>> ---------- >>> T1 will insert the row. >>> T2 will delete the row. >>> T3 will have conflict update_deleted. If it is 'update_deleted', the >>> chances are that the resolver set here is to 'skip' (default is also >>> 'skip' in this case). >>> >>> If vacuum has deleted that row (or if we don't support >>> 'update_deleted' conflict), it will be 'update_missing' conflict. In >>> that case, the user may end up inserting the row if resolver chosen is >>> in favor of apply (which seems an obvious choice for 'update_missing' >>> conflict; default is also 'apply_or_skip'). >>> >>> End result: >>> Row inserted with 'update_missing'. >>> Row correctly skipped with 'update_deleted' (assuming the obvious >>> choice seems to be 'skip' for update_deleted case). >>> >>> So it seems that with 'update_deleted' conflict, there are higher >>> chances of opting for right decision here (which is to discard the >>> update), as 'update_deleted' conveys correct info to the user. The >>> 'update_missing' OTOH does not convey correct info and user may end up >>> inserting the data by choosing apply favoring resolvers for >>> 'update_missing'. Again, we get benefit of 'update_deleted' for >>> *recently* deleted rows only. >>> >>> T3-T1-T2 >>> ---------- >>> T3 may end up inserting the record if the resolver is in favor of >>> 'apply' and all the columns are received from remote. >>> T1 will have' insert_exists' conflict and thus may either overwrite >>> 'updated' values or may leave the data as is (based on whether >>> resolver is in favor of apply or not) >>> T2 will end up deleting it. >>> End Result: No Data. >>> >>> I feel for second case (and similar cases), 'update_deleted' serves a >>> better conflict type. >>> >> >> True, but this is pretty much just a restatement of the example, right? >> >> The point I was trying to make is that this hinges on the ability to >> detect the correct conflict type. And if vacuum can swoop in and remove >> the recently deleted tuples (which I believe can happen at any time, >> right?), then that's not guaranteed, because we won't see the deleted >> tuple anymore. > > Yes, that's correct. However, many cases could benefit from the > update_deleted conflict type if it can be implemented reliably. That's > why we wanted to give it a try. But if we can't achieve predictable > results with it, I'm fine to drop this approach and conflict_type. We > can consider a better design in the future that doesn't depend on > non-vacuumed entries and provides a more robust method for identifying > deleted rows. > I agree having a separate update_deleted conflict would be beneficial, I'm not arguing against that - my point is actually that I think this conflict type is required, and that it needs to be detected reliably. 
I'm not sure dropping update_deleted entirely would be a good idea, though. That pretty much guarantees making the wrong decision at least sometimes. But at least it's predictable, and users are more likely to notice that (compared to update_deleted working on well-behaving systems, and then failing when a node starts lagging or something). That's my opinion, though, and I don't intend to stay in the way. But I think the solution is not that difficult - something needs to prevent cleanup of recently dead tuples (until the "relevant" changes are received and applied from other nodes). I don't know if that could be done based on information we have for subscriptions, or if we need something new. >> Also, can the resolver even convert the UPDATE into INSERT and proceed? >> Maybe with REPLICA IDENTITY FULL? > > Yes, it can, as long as the row doesn't contain toasted data. Without > toasted data, the new tuple is fully logged. However, if the row does > contain toasted data, the new tuple won't log it completely. In such a > case, REPLICA IDENTITY FULL becomes a requirement to ensure we have > all the data necessary to create the row on the target side. In the > absence of RI full, if the received row is missing its toasted data, > the operation will be skipped or an error will be raised. > >> Otherwise the row might be incomplete, >> missing required columns etc. In which case it'd have to wait for the >> actual INSERT to arrive - which would work for actual update_missing, >> where the row may be delayed due to network issues. But if that's a >> mistake due to vacuum removing the deleted tuple, it'll wait forever. > > Even in case of 'update_missing', we do not intend to wait for the 'actual > insert' to arrive, as it is not guaranteed that the 'insert' will arrive > at all. And thus we plan to skip or error out (based on the user's > configuration) if a complete row cannot be created for insertion. > If the UPDATE contains all the columns and can be turned into an INSERT, then that seems reasonable. But I don't see how skipping it could work in general (except for some very simple / specific use cases). I'm not sure if you suggest to skip just the one UPDATE or the transaction as a whole, but it seems to me either of those options could easily lead to all kinds of inconsistencies and user confusion. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
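For the "prevent cleanup of recently dead tuples" idea, one could imagine vacuum's cleanup horizon being clamped by per-subscription apply progress. The sketch below is purely speculative - these structures and names do not exist, real horizon computation in PostgreSQL works quite differently, and xid wraparound is ignored - but it shows the shape of the idea:

    typedef unsigned int TransactionIdSketch;

    typedef struct SubscriptionProgress
    {
        /* oldest local xid whose deletions may still be needed to detect
         * update_deleted for remote changes not yet applied here */
        TransactionIdSketch keep_deleted_since;
    } SubscriptionProgress;

    /* Clamp the normal cleanup horizon so recently deleted tuples stay
     * around until every subscription has caught up past them. */
    TransactionIdSketch
    clamp_cleanup_horizon(TransactionIdSketch normal_horizon,
                          const SubscriptionProgress *subs, int nsubs)
    {
        TransactionIdSketch horizon = normal_horizon;

        for (int i = 0; i < nsubs; i++)
            if (subs[i].keep_deleted_since < horizon)
                horizon = subs[i].keep_deleted_since;

        return horizon;     /* vacuum may remove only older deletions */
    }

The obvious open problem, raised later in the thread, applies directly to this sketch: if a remote node stops sending changes, such a horizon simply stops advancing and the table bloats indefinitely unless it is capped somehow.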
On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Yes, that's correct. However, many cases could benefit from the > > update_deleted conflict type if it can be implemented reliably. That's > > why we wanted to give it a try. But if we can't achieve predictable > > results with it, I'm fine to drop this approach and conflict_type. We > > can consider a better design in the future that doesn't depend on > > non-vacuumed entries and provides a more robust method for identifying > > deleted rows. > > > > I agree having a separate update_deleted conflict would be beneficial, > I'm not arguing against that - my point is actually that I think this > conflict type is required, and that it needs to be detected reliably. > When working with a distributed system, we must accept some form of eventual consistency model. However, it's essential to design a predictable and acceptable behavior. For example, if a change is a result of a previous operation (such as an update on node B triggered after observing an operation on node A), we can say that the operation on node A happened before the operation on node B. Conversely, if operations on nodes A and B are independent, we consider them concurrent. In distributed systems, clock skew is a known issue. To establish a consistency model, we need to ensure it guarantees the "happens-before" relationship. Consider a scenario with three nodes: NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and subsequently NodeB makes changes, and then both NodeA's and NodeB's changes are sent to NodeC, the clock skew might make NodeB's changes appear to have occurred before NodeA's changes. However, we should maintain data that indicates NodeB's changes were triggered after NodeA's changes arrived at NodeB. This implies that logically, NodeB's changes happened after NodeA's changes, despite what the timestamps suggest. A common method to handle such cases is using vector clocks for conflict resolution. "Vector clocks" allow us to track the causal relationships between changes across nodes, ensuring that we can correctly order events and resolve conflicts in a manner that respects the "happens-before" relationship. This method helps maintain consistency and predictability in the system despite issues like clock skew. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
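For readers unfamiliar with the mechanism Dilip describes, here is a minimal self-contained vector clock in C. It is illustrative only (the fixed node count is an assumption) and not part of any proposal:

    #include <stdbool.h>

    #define NNODES 3

    typedef struct VectorClock
    {
        unsigned long c[NNODES];        /* one counter per node */
    } VectorClock;

    /* Local event on node 'me': bump our own slot. */
    void
    vc_tick(VectorClock *vc, int me)
    {
        vc->c[me]++;
    }

    /* Receiving a change stamped 'remote': merge, then bump our slot,
     * so our next event is ordered after everything we have seen. */
    void
    vc_observe(VectorClock *vc, const VectorClock *remote, int me)
    {
        for (int i = 0; i < NNODES; i++)
            if (remote->c[i] > vc->c[i])
                vc->c[i] = remote->c[i];
        vc->c[me]++;
    }

    /* a happens-before b iff a <= b element-wise and a != b. */
    bool
    vc_happens_before(const VectorClock *a, const VectorClock *b)
    {
        bool strictly_less = false;

        for (int i = 0; i < NNODES; i++)
        {
            if (a->c[i] > b->c[i])
                return false;
            if (a->c[i] < b->c[i])
                strictly_less = true;
        }
        return strictly_less;
    }

The interesting property for CDR is the last function: when neither change happens-before the other, the changes are genuinely concurrent, and only then is a tie-breaking rule (timestamp, node id, etc.) actually justified.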
On Mon, Jun 10, 2024 at 5:12 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 6/10/24 10:54, Amit Kapila wrote: > > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >> On 5/27/24 07:48, shveta malik wrote: > >>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra > >>> <tomas.vondra@enterprisedb.com> wrote: > >>>> > >>>> Which architecture are you aiming for? Here you talk about multiple > >>>> providers, but the wiki page mentions active-active. I'm not sure how > >>>> much this matters, but it might. > >>> > >>> Currently, we are working for multi providers case but ideally it > >>> should work for active-active also. During further discussion and > >>> implementation phase, if we find that, there are cases which will not > >>> work in straight-forward way for active-active, then our primary focus > >>> will remain to first implement it for multiple providers architecture. > >>> > >>>> > >>>> Also, what kind of consistency you expect from this? Because none of > >>>> these simple conflict resolution methods can give you the regular > >>>> consistency models we're used to, AFAICS. > >>> > >>> Can you please explain a little bit more on this. > >>> > >> > >> I was referring to the well established consistency models / isolation > >> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what > >> guarantees the application developer can expect, what anomalies can > >> happen, etc. > >> > >> I don't think any such isolation level can be implemented with a simple > >> conflict resolution methods like last-update-wins etc. For example, > >> consider an active-active where both nodes do > >> > >> UPDATE accounts SET balance=balance+1000 WHERE id=1 > >> > >> This will inevitably lead to a conflict, and while the last-update-wins > >> resolves this "consistently" on both nodes (e.g. ending with the same > >> result), it's essentially a lost update. > >> > > > > The idea to solve such conflicts is using the delta apply technique > > where the delta from both sides will be applied to the respective > > columns. We do plan to target this as a separate patch. Now, if the > > basic conflict resolution and delta apply both can't go in one > > release, we shall document such cases clearly to avoid misuse of the > > feature. > > > > Perhaps, but it's not like having delta conflict resolution (or even > CRDT as a more generic variant) would lead to a regular consistency > model in a distributed system. At least I don't think it can achieve > that, because of the asynchronicity. > > Consider a table with "CHECK (amount < 1000)" constraint, and an update > that sets (amount = amount + 900) on two nodes. AFAIK there's no way to > reconcile this using delta (or any other other) conflict resolution. > Right, in such a case an error will be generated and I agree that we can't always reconcile the updates on different nodes and some data loss is unavoidable with or without conflict resolution. > Which does not mean we should not have some form of conflict resolution, > as long as we know what the goal is. I simply don't want to spend time > working on this, add a lot of complex code, and then realize it doesn't > give us a consistency model that makes sense. > > Which leads me back to my original question - what is the consistency > model this you expect to get from this (possibly when combined with some > other pieces?)? 
> I don't think this feature per se (or some additional features like delta apply) can help with improving/changing the consistency model our current logical replication module provides (which as per my understanding is an eventual consistency model). This feature will help with reducing the number of cases where manual intervention is required, with a configurable way to resolve conflicts. For example, for primary key violation ERRORs, or when we intentionally overwrite the data even when there is conflicting data present from a different origin, or for cases where we simply skip the remote data when there is a conflict in the local node. To achieve consistent reads on all nodes we either need a distributed transaction using a two-phase commit with some sort of quorum protocol, or a sharded database with multiple primaries each responsible for a unique partition of the data, or some other way. The current proposal doesn't intend to implement any of those. -- With Regards, Amit Kapila.
On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > > On 6/11/24 10:35, shveta malik wrote: > > On Mon, Jun 10, 2024 at 5:24 PM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> > >> > >> > >> On 6/10/24 12:56, shveta malik wrote: > >>> On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra > >>> <tomas.vondra@enterprisedb.com> wrote: > >>>> > >>>>>>> > >>>>>>> UPDATE > >>>>>>> ================ > >>>>>>> > >>>>>>> Conflict Detection Method: > >>>>>>> -------------------------------- > >>>>>>> Origin conflict detection: The ‘origin’ info is used to detect > >>>>>>> conflict which can be obtained from commit-timestamp generated for > >>>>>>> incoming txn at the source node. To compare remote’s origin with the > >>>>>>> local’s origin, we must have origin information for local txns as well > >>>>>>> which can be obtained from commit-timestamp after enabling > >>>>>>> ‘track_commit_timestamp’ locally. > >>>>>>> The one drawback here is the ‘origin’ information cannot be obtained > >>>>>>> once the row is frozen and the commit-timestamp info is removed by > >>>>>>> vacuum. For a frozen row, conflicts cannot be raised, and thus the > >>>>>>> incoming changes will be applied in all the cases. > >>>>>>> > >>>>>>> Conflict Types: > >>>>>>> ---------------- > >>>>>>> a) update_differ: The origin of an incoming update's key row differs > >>>>>>> from the local row i.e.; the row has already been updated locally or > >>>>>>> by different nodes. > >>>>>>> b) update_missing: The row with the same value as that incoming > >>>>>>> update's key does not exist. Remote is trying to update a row which > >>>>>>> does not exist locally. > >>>>>>> c) update_deleted: The row with the same value as that incoming > >>>>>>> update's key does not exist. The row is already deleted. This conflict > >>>>>>> type is generated only if the deleted row is still detectable i.e., it > >>>>>>> is not removed by VACUUM yet. If the row is removed by VACUUM already, > >>>>>>> it cannot detect this conflict. It will detect it as update_missing > >>>>>>> and will follow the default or configured resolver of update_missing > >>>>>>> itself. > >>>>>>> > >>>>>> > >>>>>> I don't understand the why should update_missing or update_deleted be > >>>>>> different, especially considering it's not detected reliably. And also > >>>>>> that even if we happen to find the row the associated TOAST data may > >>>>>> have already been removed. So why would this matter? > >>>>> > >>>>> Here, we are trying to tackle the case where the row is 'recently' > >>>>> deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may > >>>>> want to opt for a different resolution in such a case as against the > >>>>> one where the corresponding row was not even present in the first > >>>>> place. The case where the row was deleted long back may not fall into > >>>>> this category as there are higher chances that they have been removed > >>>>> by vacuum and can be considered equivalent to the update_ missing > >>>>> case. > >>>>> > >>>> > >>>> My point is that if we can't detect the difference reliably, it's not > >>>> very useful. Consider this example: > >>>> > >>>> Node A: > >>>> > >>>> T1: INSERT INTO t (id, value) VALUES (1,1); > >>>> > >>>> T2: DELETE FROM t WHERE id = 1; > >>>> > >>>> Node B: > >>>> > >>>> T3: UPDATE t SET value = 2 WHERE id = 1; > >>>> > >>>> The "correct" order of received messages on a third node is T1-T3-T2. > >>>> But we may also see T1-T2-T3 and T3-T1-T2, e.g. 
due to network issues > >>>> and so on. For T1-T2-T3 the right decision is to discard the update, > >>>> while for T3-T1-T2 it's to either wait for the INSERT or wait for the > >>>> insert to arrive. > >>>> > >>>> But if we misdetect the situation, we either end up with a row that > >>>> shouldn't be there, or losing an update. > >>> > >>> Doesn't the above example indicate that 'update_deleted' should also > >>> be considered a necessary conflict type? Please see the possibilities > >>> of conflicts in all three cases: > >>> > >>> > >>> The "correct" order of receiving messages on node C (as suggested > >>> above) is T1-T3-T2 (case1) > >>> ---------- > >>> T1 will insert the row. > >>> T3 will have update_differ conflict; latest_timestamp wins or apply > >>> will apply it. earliest_timestamp_wins or skip will skip it. > >>> T2 will delete the row (irrespective of whether the update happened or not). > >>> End Result: No Data. > >>> > >>> T1-T2-T3 > >>> ---------- > >>> T1 will insert the row. > >>> T2 will delete the row. > >>> T3 will have conflict update_deleted. If it is 'update_deleted', the > >>> chances are that the resolver set here is to 'skip' (default is also > >>> 'skip' in this case). > >>> > >>> If vacuum has deleted that row (or if we don't support > >>> 'update_deleted' conflict), it will be 'update_missing' conflict. In > >>> that case, the user may end up inserting the row if resolver chosen is > >>> in favor of apply (which seems an obvious choice for 'update_missing' > >>> conflict; default is also 'apply_or_skip'). > >>> > >>> End result: > >>> Row inserted with 'update_missing'. > >>> Row correctly skipped with 'update_deleted' (assuming the obvious > >>> choice seems to be 'skip' for update_deleted case). > >>> > >>> So it seems that with 'update_deleted' conflict, there are higher > >>> chances of opting for right decision here (which is to discard the > >>> update), as 'update_deleted' conveys correct info to the user. The > >>> 'update_missing' OTOH does not convey correct info and user may end up > >>> inserting the data by choosing apply favoring resolvers for > >>> 'update_missing'. Again, we get benefit of 'update_deleted' for > >>> *recently* deleted rows only. > >>> > >>> T3-T1-T2 > >>> ---------- > >>> T3 may end up inserting the record if the resolver is in favor of > >>> 'apply' and all the columns are received from remote. > >>> T1 will have' insert_exists' conflict and thus may either overwrite > >>> 'updated' values or may leave the data as is (based on whether > >>> resolver is in favor of apply or not) > >>> T2 will end up deleting it. > >>> End Result: No Data. > >>> > >>> I feel for second case (and similar cases), 'update_deleted' serves a > >>> better conflict type. > >>> > >> > >> True, but this is pretty much just a restatement of the example, right? > >> > >> The point I was trying to make is that this hinges on the ability to > >> detect the correct conflict type. And if vacuum can swoop in and remove > >> the recently deleted tuples (which I believe can happen at any time, > >> right?), then that's not guaranteed, because we won't see the deleted > >> tuple anymore. > > > > Yes, that's correct. However, many cases could benefit from the > > update_deleted conflict type if it can be implemented reliably. That's > > why we wanted to give it a try. But if we can't achieve predictable > > results with it, I'm fine to drop this approach and conflict_type. 
We > > can consider a better design in the future that doesn't depend on > > non-vacuumed entries and provides a more robust method for identifying > > deleted rows. > > > > I agree having a separate update_deleted conflict would be beneficial, > I'm not arguing against that - my point is actually that I think this > conflict type is required, and that it needs to be detected reliably. > > I'm not sure dropping update_deleted entirely would be a good idea, > though. That pretty much guarantees making the wrong decision at least > sometimes. But at least it's predictable, and users are more likely to > notice that (compared to update_deleted working on well-behaving systems, > and then failing when a node starts lagging or something). > > That's my opinion, though, and I don't intend to stay in the way. But I > think the solution is not that difficult - something needs to prevent > cleanup of recently dead tuples (until the "relevant" changes are > received and applied from other nodes). I don't know if that could be > done based on information we have for subscriptions, or if we need > something new. I agree that without update_deleted, there are higher chances of making incorrect decisions in some cases. But I am not sure if relying on delaying vacuum from removing such rows is a foolproof plan. We cannot predict if or when "relevant" changes will occur, so how long should we delay the vacuum? To address this problem, we may need a completely different approach. One solution could be to store deleted rows in a separate table (dead-rows-table) so we can consult that table for any deleted entries at any time. Additionally, we would need methods to purge older data from the dead-rows-table to prevent it from growing too large. This would be a substantial project on its own, so we can aim to implement some initial and simple conflict resolution methods first before tackling this more complex solution. > >> Also, can the resolver even convert the UPDATE into INSERT and proceed? > >> Maybe with REPLICA IDENTITY FULL? > > > > Yes, it can, as long as the row doesn't contain toasted data. Without > > toasted data, the new tuple is fully logged. However, if the row does > > contain toasted data, the new tuple won't log it completely. In such a > > case, REPLICA IDENTITY FULL becomes a requirement to ensure we have > > all the data necessary to create the row on the target side. In the > > absence of RI full, if the received row is missing its toasted data, > > the operation will be skipped or an error will be raised. > > > >> Otherwise the row might be incomplete, > >> missing required columns etc. In which case it'd have to wait for the > >> actual INSERT to arrive - which would work for actual update_missing, > >> where the row may be delayed due to network issues. But if that's a > >> mistake due to vacuum removing the deleted tuple, it'll wait forever. > > > > Even in case of 'update_missing', we do not intend to wait for the 'actual > > insert' to arrive, as it is not guaranteed that the 'insert' will arrive > > at all. And thus we plan to skip or error out (based on the user's > > configuration) if a complete row cannot be created for insertion. > > > > If the UPDATE contains all the columns and can be turned into an INSERT, > then that seems reasonable. But I don't see how skipping it could work > in general (except for some very simple / specific use cases).
I'm not > sure if you suggest to skip just the one UPDATE or the transaction as a > whole, but it seems to me either of those options could easily lead to > all kinds of inconsistencies and user confusion. Conflict resolution is row-based, meaning that whatever action we choose (error or skip) applies to the specific change rather than the entire transaction. I'm not sure if waiting indefinitely for an INSERT to arrive is a good idea, as the node that triggered the INSERT might be down for an extended period. At best, we could provide a configuration parameter with which the apply worker waits for a specified time period for the INSERT to arrive before either skipping or throwing an error. That said, even if we error out or skip and log without waiting for the INSERT, we won't introduce any new inconsistencies. This is the current behavior on pg-HEAD. But with options like apply_or_skip and apply_or_error, we have a better chance of resolving conflicts by constructing the complete row internally, without the user's intervention. There will still be some cases where we can't fully reconstruct the row, but in those instances, the behavior won't be any worse than the current pg-HEAD. thanks Shveta
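The bounded-wait variant mentioned above could look roughly like this. Everything here is hypothetical - the setting name and the helpers are invented - and only the control flow is of interest:

    #include <stdbool.h>

    /* Hypothetical knob: how long to wait for the INSERT, in ms. */
    static int missing_row_wait_ms = 5000;

    typedef enum { FALLBACK_SKIP, FALLBACK_ERROR } Fallback;

    /* Stubs standing in for real apply-worker machinery. */
    static bool row_now_exists(void) { return false; }
    static void sleep_ms(int ms) { (void) ms; }
    static void log_conflict_and_skip(void) { }
    static void raise_conflict_error(void) { }

    /* Returns true if the UPDATE can now be applied normally. */
    static bool
    wait_for_insert_then_resolve(Fallback fallback)
    {
        for (int waited = 0; waited < missing_row_wait_ms; waited += 100)
        {
            if (row_now_exists())
                return true;            /* the INSERT arrived in time */
            sleep_ms(100);
        }

        if (fallback == FALLBACK_SKIP)
            log_conflict_and_skip();    /* give up quietly, but log it */
        else
            raise_conflict_error();     /* stop apply; manual action */
        return false;
    }

The timeout bounds the damage when the INSERT never arrives, but as noted above it cannot remove the underlying ambiguity between a delayed row and a row that was never going to come.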
On 6/12/24 06:32, Dilip Kumar wrote: > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: > >>> Yes, that's correct. However, many cases could benefit from the >>> update_deleted conflict type if it can be implemented reliably. That's >>> why we wanted to give it a try. But if we can't achieve predictable >>> results with it, I'm fine to drop this approach and conflict_type. We >>> can consider a better design in the future that doesn't depend on >>> non-vacuumed entries and provides a more robust method for identifying >>> deleted rows. >>> >> >> I agree having a separate update_deleted conflict would be beneficial, >> I'm not arguing against that - my point is actually that I think this >> conflict type is required, and that it needs to be detected reliably. >> > > When working with a distributed system, we must accept some form of > eventual consistency model. I'm not sure this is necessarily true. There are distributed databases implementing (or aiming to implement) regular consistency models, without eventual consistency. I'm not saying it's easy, but it shows eventual consistency is not the only option. > However, it's essential to design a > predictable and acceptable behavior. For example, if a change is a > result of a previous operation (such as an update on node B triggered > after observing an operation on node A), we can say that the operation > on node A happened before the operation on node B. Conversely, if > operations on nodes A and B are independent, we consider them > concurrent. > Right. And this is precisely the focus of my questions - understanding what behavior we aim for / expect in the end. Or said differently, what anomalies / weird behavior would be considered expected. Because that's important both for discussions about feasibility etc., and also for evaluation / reviews of the patch. > In distributed systems, clock skew is a known issue. To establish a > consistency model, we need to ensure it guarantees the > "happens-before" relationship. Consider a scenario with three nodes: > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and > subsequently NodeB makes changes, and then both NodeA's and NodeB's > changes are sent to NodeC, the clock skew might make NodeB's changes > appear to have occurred before NodeA's changes. However, we should > maintain data that indicates NodeB's changes were triggered after > NodeA's changes arrived at NodeB. This implies that logically, NodeB's > changes happened after NodeA's changes, despite what the timestamps > suggest. > > A common method to handle such cases is using vector clocks for > conflict resolution. "Vector clocks" allow us to track the causal > relationships between changes across nodes, ensuring that we can > correctly order events and resolve conflicts in a manner that respects > the "happens-before" relationship. This method helps maintain > consistency and predictability in the system despite issues like clock > skew. > I'm familiar with the concept of a vector clock (or a logical clock in general), but it's not clear to me how you plan to use this in the context of conflict handling. Can you elaborate/explain? The way I see it, conflict handling is pretty tightly coupled with regular commit timestamps and MVCC in general. How would you use a vector clock to change that? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > >> I agree having a separate update_deleted conflict would be beneficial, > >> I'm not arguing against that - my point is actually that I think this > >> conflict type is required, and that it needs to be detected reliably. > >> > > > > When working with a distributed system, we must accept some form of > > eventual consistency model. > > I'm not sure this is necessarily true. There are distributed databases > implementing (or aiming to) regular consistency models, without eventual > consistency. I'm not saying it's easy, but it shows eventual consistency > is not the only option. Right, that statement might not be completely accurate. Based on the CAP theorem, when a network partition is unavoidable and availability is expected, we often choose an eventual consistency model. However, claiming that a fully consistent model is impossible in any distributed system is incorrect, as it can be achieved using mechanisms like Two-Phase Commit. We must also accept that our PostgreSQL replication mechanism does not guarantee a fully consistent model. Even with synchronous commit, it only waits for the WAL to be replayed on the standby but does not change the commit decision based on other nodes. This means, at most, we can only guarantee "Read Your Write" consistency. > > However, it's essential to design a > > predictable and acceptable behavior. For example, if a change is a > > result of a previous operation (such as an update on node B triggered > > after observing an operation on node A), we can say that the operation > > on node A happened before the operation on node B. Conversely, if > > operations on nodes A and B are independent, we consider them > > concurrent. > > > > Right. And this is precisely the focus or my questions - understanding > what behavior we aim for / expect in the end. Or said differently, what > anomalies / weird behavior would be considered expected. > Because that's important both for discussions about feasibility, etc. > And also for evaluation / reviews of the patch. +1 > > In distributed systems, clock skew is a known issue. To establish a > > consistency model, we need to ensure it guarantees the > > "happens-before" relationship. Consider a scenario with three nodes: > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and > > subsequently NodeB makes changes, and then both NodeA's and NodeB's > > changes are sent to NodeC, the clock skew might make NodeB's changes > > appear to have occurred before NodeA's changes. However, we should > > maintain data that indicates NodeB's changes were triggered after > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's > > changes happened after NodeA's changes, despite what the timestamps > > suggest. > > > > A common method to handle such cases is using vector clocks for > > conflict resolution. "Vector clocks" allow us to track the causal > > relationships between changes across nodes, ensuring that we can > > correctly order events and resolve conflicts in a manner that respects > > the "happens-before" relationship. This method helps maintain > > consistency and predictability in the system despite issues like clock > > skew. > > > > I'm familiar with the concept of vector clock (or logical clock in > general), but it's not clear to me how you plan to use this in the > context of the conflict handling. Can you elaborate/explain? 
> > The way I see it, conflict handling is pretty tightly coupled with > regular commit timestamps and MVCC in general. How would you use vector > clock to change that? The issue with using commit timestamps is that, when multiple nodes are involved, the commit timestamp won't accurately represent the actual order of operations. There's no reliable way to determine the perfect order of each operation happening on different nodes roughly simultaneously unless we use some globally synchronized counter. Generally, that order might not cause real issues unless one operation is triggered by a previous operation, and relying solely on physical timestamps would not detect that correctly. We need some sort of logical counter, such as a vector clock, which might be an independent counter on each node but can perfectly track the causal order. For example, if NodeA observes an operation from NodeB with a counter value of X, NodeA will adjust its counter to X+1. This ensures that if NodeA has seen an operation from NodeB, its next operation will appear to have occurred after NodeB's operation. I admit that I haven't fully thought through how we could design such version tracking in our logical replication protocol or how it would fit into our system. However, my point is that we need to consider something beyond commit timestamps to achieve reliable ordering. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
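The adjust-to-X+1 rule described here is, in its single-counter form, a Lamport clock - the simpler cousin of a vector clock. A sketch (illustrative only):

    typedef unsigned long LamportCounter;

    /* Local event: advance our own counter. */
    static unsigned long
    lamport_tick(LamportCounter *clk)
    {
        return ++(*clk);
    }

    /* Observing a remote event stamped 'remote': jump past it, so our
     * subsequent events are ordered after it (NodeA sees X, moves to X+1). */
    static void
    lamport_observe(LamportCounter *clk, unsigned long remote)
    {
        if (remote >= *clk)
            *clk = remote + 1;
        else
            (*clk)++;
    }

Note the trade-off: a Lamport clock yields a total order consistent with happens-before, but unlike a vector clock it cannot distinguish causally ordered changes from genuinely concurrent ones, which is precisely the distinction conflict detection would care about.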
On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > Hi, > > This time at PGconf.dev[1], we had some discussions regarding this > project. The proposed approach is to split the work into two main > components. The first part focuses on conflict detection, which aims to > identify and report conflicts in logical replication. This feature will > enable users to monitor the unexpected conflicts that may occur. The > second part involves the actual conflict resolution. Here, we will provide > built-in resolutions for each conflict and allow the user to choose which > resolution will be used for which conflict (as described in the initial > email of this thread). I agree with this direction that we focus on conflict detection (and logging) first and then develop conflict resolution on top of that. > > Of course, we are open to alternative ideas and suggestions, and the > strategy above can be changed based on ongoing discussions and feedback > received. > > Here is the patch of the first part work, which adds a new parameter > detect_conflict for CREATE and ALTER subscription commands. This new > parameter will decide if the subscription will go for conflict detection. By > default, conflict detection will be off for a subscription. > > When conflict detection is enabled, additional logging is triggered in the > following conflict scenarios: > > * updating a row that was previously modified by another origin. > * The tuple to be updated is not found. > * The tuple to be deleted is not found. > > While there exist other conflict types in logical replication, such as an > incoming insert conflicting with an existing row due to a primary key or > unique index, these cases already result in constraint violation errors. What does detect_conflict being true actually mean to users? I understand that detect_conflict being true could introduce some overhead to detect conflicts. But in terms of conflict detection, even if detect_conflict is false, we detect some conflicts such as concurrent inserts with the same key. Once we introduce the complete conflict detection feature, I'm not sure there is a case where a user wants to detect only some particular types of conflict. > Therefore, additional conflict detection for these cases is currently > omitted to minimize potential overhead. However, the pre-detection for > conflict in these error cases is still essential to support automatic > conflict resolution in the future. I feel that we should log all types of conflict in a uniform way. For example, with detect_conflict being true, the update_differ conflict is reported as "conflict %s detected on relation "%s"", whereas concurrent inserts with the same key are reported as "duplicate key value violates unique constraint "%s"", which could confuse users. Ideally, I think we should log such conflict detection details (table name, column name, conflict type, etc.) somewhere (e.g., a table or the server logs) so that users can resolve them manually. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
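For the uniform message shape Sawada-san suggests, a shared reporting helper could look roughly like the sketch below. ereport/errmsg/errdetail are real elog.h facilities, but the helper, the enum, and the exact message wording are invented here for illustration:

    #include "postgres.h"

    typedef enum ConflictType
    {
        CT_INSERT_EXISTS,
        CT_UPDATE_DIFFER,
        CT_UPDATE_MISSING,
        CT_DELETE_MISSING
    } ConflictType;

    static const char *const conflict_names[] = {
        "insert_exists", "update_differ", "update_missing", "delete_missing"
    };

    /* One message shape for every conflict type, with the details in
     * errdetail so the log line itself stays uniform and greppable. */
    static void
    report_conflict(ConflictType type, const char *relname,
                    const char *origin, const char *resolution)
    {
        ereport(LOG,
                (errmsg("conflict %s detected on relation \"%s\"",
                        conflict_names[type], relname),
                 errdetail("Remote change from origin \"%s\" was resolved by \"%s\".",
                           origin, resolution)));
    }

Routing the primary-key case through the same helper, instead of letting it surface as a plain unique-violation error, is exactly what would make the two kinds of conflict look alike to users.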
On Thu, Jun 13, 2024 at 11:41 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Hi, > > > > This time at PGconf.dev[1], we had some discussions regarding this > > project. The proposed approach is to split the work into two main > > components. The first part focuses on conflict detection, which aims to > > identify and report conflicts in logical replication. This feature will > > enable users to monitor the unexpected conflicts that may occur. The > > second part involves the actual conflict resolution. Here, we will provide > > built-in resolutions for each conflict and allow user to choose which > > resolution will be used for which conflict(as described in the initial > > email of this thread). > > I agree with this direction that we focus on conflict detection (and > logging) first and then develop conflict resolution on top of that. > > > > > > Of course, we are open to alternative ideas and suggestions, and the > > strategy above can be changed based on ongoing discussions and feedback > > received. > > > > Here is the patch of the first part work, which adds a new parameter > > detect_conflict for CREATE and ALTER subscription commands. This new > > parameter will decide if subscription will go for conflict detection. By > > default, conflict detection will be off for a subscription. > > > > When conflict detection is enabled, additional logging is triggered in the > > following conflict scenarios: > > > > * updating a row that was previously modified by another origin. > > * The tuple to be updated is not found. > > * The tuple to be deleted is not found. > > > > While there exist other conflict types in logical replication, such as an > > incoming insert conflicting with an existing row due to a primary key or > > unique index, these cases already result in constraint violation errors. > > What does detect_conflict being true actually mean to users? I > understand that detect_conflict being true could introduce some > overhead to detect conflicts. But in terms of conflict detection, even > if detect_conflict is false, we detect some conflicts such as > concurrent inserts with the same key. Once we introduce the complete > conflict detection feature, I'm not sure there is a case where a user > wants to detect only some particular types of conflict. > You are right that users would wish to detect the conflicts, and probably the extra effort would only be in the 'update_differ' case, where we need to consult the committs module, and that we will only do when 'track_commit_timestamp' is true. BTW, I think for Inserts with a primary/unique key violation, we should catch the ERROR and log it. If we want to log the conflicts in a separate table, then do we want to do that in the catch block after getting the pk violation, or do an extra scan before the 'INSERT' to find the conflict? I think logging would incur extra cost, especially if we want to LOG it in some table as you are suggesting below; that may need some option. > > Therefore, additional conflict detection for these cases is currently > > omitted to minimize potential overhead. However, the pre-detection for > > conflict in these error cases is still essential to support automatic > > conflict resolution in the future. > > I feel that we should log all types of conflict in a uniform way.
For > example, with detect_conflict being true, the update_differ conflict > is reported as "conflict %s detected on relation "%s"", whereas > concurrent inserts with the same key are reported as "duplicate key > value violates unique constraint "%s"", which could confuse users. > Ideally, I think that we should log such conflict detection details (table > name, column name, conflict type, etc.) somewhere (e.g. a table or > server logs) so that the users can resolve them manually. > It is good to think about whether there is value in providing a pg_conflicts_history kind of table which would have details of the conflicts that occurred, and which we could then extend to record resolutions. I feel we can anyway LOG the conflicts by default. Whether updating a separate table with conflicts should be done by default or behind a knob is a point to consider. -- With Regards, Amit Kapila.
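Purely for illustration, one row of such a (hypothetical) pg_conflicts_history table might carry fields along these lines; nothing here exists in PostgreSQL today:

  #include <stdint.h>
  #include <time.h>

  typedef struct ConflictHistoryEntry
  {
      uint32_t relid;              /* table on which the conflict occurred */
      char     conflict_type[32];  /* "insert_exists", "update_differ", ... */
      uint32_t local_origin;       /* origin of the existing local row */
      uint32_t remote_origin;      /* origin of the incoming change */
      time_t   local_commit_ts;    /* commit timestamp of the local row */
      time_t   remote_commit_ts;   /* commit timestamp of the remote change */
      char     resolution[32];     /* how it was resolved, once CDR exists */
  } ConflictHistoryEntry;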
On 23.05.24 08:36, shveta malik wrote: > Conflict Resolution > ---------------- > a) latest_timestamp_wins: The change with later commit timestamp wins. > b) earliest_timestamp_wins: The change with earlier commit timestamp wins. > c) apply: Always apply the remote change. > d) skip: Remote change is skipped. > e) error: Error out on conflict. Replication is stopped, manual > action is needed. You might be aware of pglogical, which has similar conflict resolution modes, but they appear to be spelled a bit differently. It might be worth reviewing this, so that we don't unnecessarily introduce differences. https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts There might also be other inspiration to be found related to this in the pglogical documentation or code.
On 2024-Jun-07, Tomas Vondra wrote: > On 6/3/24 09:30, Amit Kapila wrote: > > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > >> How is this going to deal with the fact that commit LSN and timestamps > >> may not correlate perfectly? That is, commits may happen with LSN1 < > >> LSN2 but with T1 > T2. > But as I wrote, I'm not quite convinced this means there are not other > issues with this way of resolving conflicts. It's more likely a more > complex scenario is required. Jan Wieck approached me during pgconf.dev to raise this problem. He also said he had some code to fix up the commit TS afterwards somehow, to make the sequence monotonically increasing. Perhaps we should consider that, to avoid any problems caused by the difference between LSN order and TS order. It might be quite nightmarish to try to make the system work correctly without reasonable constraints of that nature. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
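A toy sketch of the kind of fix-up being referred to, i.e. clamping commit timestamps so that timestamp order matches LSN order (hypothetical; this is not Jan's actual code):

  #include <stdint.h>

  typedef struct CommitRec
  {
      uint64_t lsn;        /* commit LSN */
      uint64_t commit_ts;  /* commit timestamp, e.g. in microseconds */
  } CommitRec;

  /* recs[] is sorted by LSN; bump any timestamp that would otherwise go
   * backwards, making the timestamp sequence monotonically increasing. */
  static void
  fixup_commit_ts(CommitRec *recs, int n)
  {
      for (int i = 1; i < n; i++)
      {
          if (recs[i].commit_ts <= recs[i - 1].commit_ts)
              recs[i].commit_ts = recs[i - 1].commit_ts + 1;
      }
  }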
On 6/13/24 7:28 AM, Amit Kapila wrote: > You are right that users would wish to detect the conflicts, and > probably the extra effort would only be in the 'update_differ' case, > where we need to consult the committs module, and that we will only do > when 'track_commit_timestamp' is true. BTW, I think for Inserts with a > primary/unique key violation, we should catch the ERROR and log it. If > we want to log the conflicts in a separate table, then do we want to do > that in the catch block after getting the pk violation, or do an extra > scan before the 'INSERT' to find the conflict? I think logging would > incur extra cost, especially if we want to LOG it in some table as you > are suggesting below; that may need some option. > >>> Therefore, additional conflict detection for these cases is currently >>> omitted to minimize potential overhead. However, the pre-detection for >>> conflict in these error cases is still essential to support automatic >>> conflict resolution in the future. >> >> I feel that we should log all types of conflict in a uniform way. For >> example, with detect_conflict being true, the update_differ conflict >> is reported as "conflict %s detected on relation "%s"", whereas >> concurrent inserts with the same key are reported as "duplicate key >> value violates unique constraint "%s"", which could confuse users. >> Ideally, I think that we should log such conflict detection details (table >> name, column name, conflict type, etc.) somewhere (e.g. a table or >> server logs) so that the users can resolve them manually. >> > > It is good to think about whether there is value in providing a > pg_conflicts_history kind of table which would have details of the > conflicts that occurred, and which we could then extend to record > resolutions. I feel we can anyway LOG the conflicts by default. Whether > updating a separate table with conflicts should be done by default or > behind a knob is a point to consider. +1 for logging conflicts uniformly, but I would +100 to exposing the log in a way that's easy for the user to query (whether it's a system view or a stat table). Arguably, I'd say that would be the most important feature to come out of this effort. Regardless of how conflicts are resolved, users want to know exactly what row had a conflict, and users from other database systems that have dealt with these issues will have tooling to be able to review and analyze if conflicts occur. This data is typically stored in a queryable table, with data retained for N days. When you add in automatic conflict resolution, users then want to have a record of how the conflict was resolved, in case they need to manually update it. Having this data in a table also gives the user the opportunity to understand conflict stats (e.g. conflict rates) and potentially identify portions of the application and other parts of the system to optimize. It also makes it easier to import to downstream systems that may perform further analysis on conflict resolution, or alarm if a conflict rate exceeds a certain threshold. Thanks, Jonathan
On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote: > c) update_deleted: The row with the same value as that incoming > update's key does not exist. The row is already deleted. This conflict > type is generated only if the deleted row is still detectable i.e., it > is not removed by VACUUM yet. If the row is removed by VACUUM already, > it cannot detect this conflict. It will detect it as update_missing > and will follow the default or configured resolver of update_missing > itself. I think this design is categorically unacceptable. It amounts to designing a feature that works except when it doesn't. I'm not exactly sure how the proposal should be changed to avoid depending on the timing of VACUUM, but I think it's absolutely not OK to depend on the timing of VACUUM -- or, really, this is going to depend on the timing of HOT-pruning, which will often happen almost instantly. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Jun 13, 2024 at 7:00 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Jun-07, Tomas Vondra wrote: > > > On 6/3/24 09:30, Amit Kapila wrote: > > > On Sat, May 25, 2024 at 2:39 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > >> How is this going to deal with the fact that commit LSN and timestamps > > >> may not correlate perfectly? That is, commits may happen with LSN1 < > > >> LSN2 but with T1 > T2. > > > But as I wrote, I'm not quite convinced this means there are not other > > issues with this way of resolving conflicts. It's more likely a more > > complex scenario is required. > > Jan Wieck approached me during pgconf.dev to raise this problem. He > also said he had some code to fix up the commit TS > afterwards somehow, to make the sequence monotonically increasing. > Perhaps we should consider that, to avoid any problems caused by the > difference between LSN order and TS order. It might be quite > nightmarish to try to make the system work correctly without > reasonable constraints of that nature. > I agree with this, but the problem Jan was worried about was not directly reproducible in what PostgreSQL provides, at least that is what I understood then. We are also unable to think of a concrete scenario where this is a problem, but we are planning to spend more time deriving a test to reproduce the problem. -- With Regards, Amit Kapila.
On Thu, Jun 13, 2024 at 11:18 PM Jonathan S. Katz <jkatz@postgresql.org> wrote: > > On 6/13/24 7:28 AM, Amit Kapila wrote: > >> > >> I feel that we should log all types of conflict in a uniform way. For > >> example, with detect_conflict being true, the update_differ conflict > >> is reported as "conflict %s detected on relation "%s"", whereas > >> concurrent inserts with the same key are reported as "duplicate key > >> value violates unique constraint "%s"", which could confuse users. > >> Ideally, I think that we should log such conflict detection details (table > >> name, column name, conflict type, etc.) somewhere (e.g. a table or > >> server logs) so that the users can resolve them manually. > >> > > > > It is good to think about whether there is value in providing a > > pg_conflicts_history kind of table which would have details of the > > conflicts that occurred, and which we could then extend to record > > resolutions. I feel we can anyway LOG the conflicts by default. Whether > > updating a separate table with conflicts should be done by default or > > behind a knob is a point to consider. > > +1 for logging conflicts uniformly, but I would +100 to exposing the log > in a way that's easy for the user to query (whether it's a system view > or a stat table). Arguably, I'd say that would be the most important > feature to come out of this effort. > We can have both the system view and a stats table. The system view could have some sort of cumulative stats data, like how many times a particular conflict has occurred, and the table would provide detailed information about the conflict. The one challenge I see in providing a table is its cleanup mechanism. We could provide a partitioned table such that users can truncate/drop the partitions that are no longer needed, or a non-partitioned table where users can delete the old data, in which case they generate work for autovacuum. > Regardless of how conflicts are resolved, users want to know exactly what row > had a conflict, and users from other database systems that have dealt > with these issues will have tooling to be able to review and analyze if > conflicts occur. This data is typically stored in a queryable table, > with data retained for N days. When you add in automatic conflict > resolution, users then want to have a record of how the conflict was > resolved, in case they need to manually update it. > > Having this data in a table also gives the user the opportunity to > understand conflict stats (e.g. conflict rates) and potentially identify > portions of the application and other parts of the system to optimize. > It also makes it easier to import to downstream systems that may perform > further analysis on conflict resolution, or alarm if a conflict rate > exceeds a certain threshold. > Agreed, those are good use cases for storing conflict history. -- With Regards, Amit Kapila.
On Fri, Jun 14, 2024 at 12:10 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote: > > c) update_deleted: The row with the same value as that incoming > > update's key does not exist. The row is already deleted. This conflict > > type is generated only if the deleted row is still detectable i.e., it > > is not removed by VACUUM yet. If the row is removed by VACUUM already, > > it cannot detect this conflict. It will detect it as update_missing > > and will follow the default or configured resolver of update_missing > > itself. > > I think this design is categorically unacceptable. It amounts to > designing a feature that works except when it doesn't. I'm not exactly > sure how the proposal should be changed to avoid depending on the > timing of VACUUM, but I think it's absolutely not OK to depend on the > timing of VACUUM -- or, really, this is going to depend on the timing > of HOT-pruning, which will often happen almost instantly. > Agreed. Above, Tomas has speculated about a way to prevent vacuum from cleaning dead tuples until the required changes are received and applied. Shveta also mentioned another way, to have a dead store (say, a table where deleted rows are stored for resolution) [1], which is similar to a technique used by some other databases. There is an agreement not to rely on Vacuum to detect such a conflict, but the alternative is not clear. Currently, we are thinking of treating such a conflict as update_missing (The row with the same value as that incoming update's key does not exist.). This is how the current HEAD code behaves and LOGs the information (logical replication did not find row to be updated ..). [1] - https://www.postgresql.org/message-id/CAJpy0uCov4JfZJeOvY0O21_gk9bcgNUDp4jf8%2BBbMp%2BEAv8cVQ%40mail.gmail.com -- With Regards, Amit Kapila.
On Thursday, June 13, 2024 8:46 PM Peter Eisentraut <peter@eisentraut.org> wrote: > > On 23.05.24 08:36, shveta malik wrote: > > Conflict Resolution > > ---------------- > > a) latest_timestamp_wins: The change with later commit timestamp > wins. > > b) earliest_timestamp_wins: The change with earlier commit timestamp > wins. > > c) apply: Always apply the remote change. > > d) skip: Remote change is skipped. > > e) error: Error out on conflict. Replication is stopped, manual > > action is needed. > > You might be aware of pglogical, which has similar conflict resolution modes, > but they appear to be spelled a bit differently. It might be worth reviewing this, > so that we don't unnecessarily introduce differences. Right. Some of the proposed resolution names are different from pglogical's, while the functionalities are the same. The following is the comparison with pglogical: latest_timestamp_wins(proposal) - last_update_wins(pglogical) earliest_timestamp_wins(proposal) - first_update_wins(pglogical) apply(proposal) - apply_remote(pglogical) skip(proposal) - keep_local(pglogical) I personally think pglogical's names read more naturally, but others may have different opinions on this. > > https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts > > There might also be other inspiration to be found related to this in pglogical > documentation or code. Another difference is that we allow users to specify different resolutions for different conflicts, while pglogical allows specifying one resolution for all conflicts. I think the proposed approach offers more flexibility to users, which seems more favorable to me. Best Regards, Hou zj
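For reference, the mapping above as a small lookup table (illustrative only, not code from either project):

  #include <stddef.h>

  static const char *resolver_names[][2] = {
      /* proposal                  pglogical          */
      {"latest_timestamp_wins",   "last_update_wins"},
      {"earliest_timestamp_wins", "first_update_wins"},
      {"apply",                   "apply_remote"},
      {"skip",                    "keep_local"},
      {"error",                   NULL},  /* no direct pglogical counterpart */
  };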
On 6/14/24 13:29, Amit Kapila wrote: > On Fri, Jun 14, 2024 at 12:10 AM Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote: >>> c) update_deleted: The row with the same value as that incoming >>> update's key does not exist. The row is already deleted. This conflict >>> type is generated only if the deleted row is still detectable i.e., it >>> is not removed by VACUUM yet. If the row is removed by VACUUM already, >>> it cannot detect this conflict. It will detect it as update_missing >>> and will follow the default or configured resolver of update_missing >>> itself. >> >> I think this design is categorically unacceptable. It amounts to >> designing a feature that works except when it doesn't. I'm not exactly >> sure how the proposal should be changed to avoid depending on the >> timing of VACUUM, but I think it's absolutely not OK to depend on the >> timing of VACUUM -- or, really, this is going to depend on the timing >> of HOT-pruning, which will often happen almost instantly. >> > > Agreed. Above, Tomas has speculated about a way to prevent vacuum from > cleaning dead tuples until the required changes are received and > applied. Shveta also mentioned another way, to have a dead store (say, a > table where deleted rows are stored for resolution) [1], which is > similar to a technique used by some other databases. There is an > agreement not to rely on Vacuum to detect such a conflict, but the > alternative is not clear. I'm not sure I'd say I "speculated" about it - it's not like we don't have ways to hold off cleanup for a while for various reasons (long-running query, replication slot, hot-standby feedback, ...). How exactly that would be implemented I don't know, but it seems like a far simpler approach than inventing a new "dead store". It'd need logic to let vacuum clean up the stuff no longer needed, but so would the dead store, I think. > Currently, we are thinking of treating such > a conflict as update_missing (The row with the same value as that > incoming update's key does not exist.). This is how the current HEAD > code behaves and LOGs the information (logical replication did not > find row to be updated ..). > I thought the agreement was that we need both conflict types to get sensible behavior, so proceeding with just update_missing (mostly because we don't know how to detect these conflicts reliably) seems like it may not be the right direction ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 6/13/24 06:52, Dilip Kumar wrote: > On Wed, Jun 12, 2024 at 5:26 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> >>>> I agree having a separate update_deleted conflict would be beneficial, >>>> I'm not arguing against that - my point is actually that I think this >>>> conflict type is required, and that it needs to be detected reliably. >>>> >>> >>> When working with a distributed system, we must accept some form of >>> eventual consistency model. >> >> I'm not sure this is necessarily true. There are distributed databases >> implementing (or aiming to) regular consistency models, without eventual >> consistency. I'm not saying it's easy, but it shows eventual consistency >> is not the only option. > > Right, that statement might not be completely accurate. Based on the > CAP theorem, when a network partition is unavoidable and availability > is expected, we often choose an eventual consistency model. However, > claiming that a fully consistent model is impossible in any > distributed system is incorrect, as it can be achieved using > mechanisms like Two-Phase Commit. > > We must also accept that our PostgreSQL replication mechanism does not > guarantee a fully consistent model. Even with synchronous commit, it > only waits for the WAL to be replayed on the standby but does not > change the commit decision based on other nodes. This means, at most, > we can only guarantee "Read Your Write" consistency. > Perhaps, but even accepting eventual consistency does not absolve us from actually defining what that means, ensuring it's sensible enough to be practical/usable, and that it actually converges to a consistent state (that's essentially the problem of the update conflict types, because misdetecting it results in diverging results). >>> However, it's essential to design a >>> predictable and acceptable behavior. For example, if a change is a >>> result of a previous operation (such as an update on node B triggered >>> after observing an operation on node A), we can say that the operation >>> on node A happened before the operation on node B. Conversely, if >>> operations on nodes A and B are independent, we consider them >>> concurrent. >>> >> >> Right. And this is precisely the focus or my questions - understanding >> what behavior we aim for / expect in the end. Or said differently, what >> anomalies / weird behavior would be considered expected. > >> Because that's important both for discussions about feasibility, etc. >> And also for evaluation / reviews of the patch. > > +1 > >>> In distributed systems, clock skew is a known issue. To establish a >>> consistency model, we need to ensure it guarantees the >>> "happens-before" relationship. Consider a scenario with three nodes: >>> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and >>> subsequently NodeB makes changes, and then both NodeA's and NodeB's >>> changes are sent to NodeC, the clock skew might make NodeB's changes >>> appear to have occurred before NodeA's changes. However, we should >>> maintain data that indicates NodeB's changes were triggered after >>> NodeA's changes arrived at NodeB. This implies that logically, NodeB's >>> changes happened after NodeA's changes, despite what the timestamps >>> suggest. >>> >>> A common method to handle such cases is using vector clocks for >>> conflict resolution. 
"Vector clocks" allow us to track the causal >>> relationships between changes across nodes, ensuring that we can >>> correctly order events and resolve conflicts in a manner that respects >>> the "happens-before" relationship. This method helps maintain >>> consistency and predictability in the system despite issues like clock >>> skew. >>> >> >> I'm familiar with the concept of vector clock (or logical clock in >> general), but it's not clear to me how you plan to use this in the >> context of the conflict handling. Can you elaborate/explain? >> >> The way I see it, conflict handling is pretty tightly coupled with >> regular commit timestamps and MVCC in general. How would you use vector >> clock to change that? > > The issue with using commit timestamps is that, when multiple nodes > are involved, the commit timestamp won't accurately represent the > actual order of operations. There's no reliable way to determine the > perfect order of each operation happening on different nodes roughly > simultaneously unless we use some globally synchronized counter. > Generally, that order might not cause real issues unless one operation > is triggered by a previous operation, and relying solely on physical > timestamps would not detect that correctly. > This whole conflict detection / resolution proposal is based on using commit timestamps. Aren't you suggesting it can't really work with commit timestamps? FWIW there are ways to builds distributed consistency with timestamps, as long as it's monotonic - e.g. clock-SI does that. It's not perfect, but it shows it's possible. However, I'm not we have to go there - it depends on what the goal is. For a one-directional replication (multiple nodes replicating to the same target) it might be sufficient if the conflict resolution is "deterministic" (e.g. not dependent on the order in which the changes are applied). I'm not sure, but it's why I asked what's the goal in my very first message in this thread. > We need some sort of logical counter, such as a vector clock, which > might be an independent counter on each node but can perfectly track > the causal order. For example, if NodeA observes an operation from > NodeB with a counter value of X, NodeA will adjust its counter to X+1. > This ensures that if NodeA has seen an operation from NodeB, its next > operation will appear to have occurred after NodeB's operation. > > I admit that I haven't fully thought through how we could design such > version tracking in our logical replication protocol or how it would > fit into our system. However, my point is that we need to consider > something beyond commit timestamps to achieve reliable ordering. > I can't really respond to this as there's no suggestion how it would be implemented in the patch discussed in this thread. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jun 17, 2024 at 4:18 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > On 6/14/24 13:29, Amit Kapila wrote: > > On Fri, Jun 14, 2024 at 12:10 AM Robert Haas <robertmhaas@gmail.com> wrote: > >> > >> On Thu, May 23, 2024 at 2:37 AM shveta malik <shveta.malik@gmail.com> wrote: > >>> c) update_deleted: The row with the same value as that incoming > >>> update's key does not exist. The row is already deleted. This conflict > >>> type is generated only if the deleted row is still detectable i.e., it > >>> is not removed by VACUUM yet. If the row is removed by VACUUM already, > >>> it cannot detect this conflict. It will detect it as update_missing > >>> and will follow the default or configured resolver of update_missing > >>> itself. > >> > >> I think this design is categorically unacceptable. It amounts to > >> designing a feature that works except when it doesn't. I'm not exactly > >> sure how the proposal should be changed to avoid depending on the > >> timing of VACUUM, but I think it's absolutely not OK to depend on the > >> timing of VACUUM -- or, really, this is going to depend on the timing > >> of HOT-pruning, which will often happen almost instantly. > >> > > > > Agreed. Above, Tomas has speculated about a way to prevent vacuum from > > cleaning dead tuples until the required changes are received and > > applied. Shveta also mentioned another way, to have a dead store (say, a > > table where deleted rows are stored for resolution) [1], which is > > similar to a technique used by some other databases. There is an > > agreement not to rely on Vacuum to detect such a conflict, but the > > alternative is not clear. > > I'm not sure I'd say I "speculated" about it - it's not like we don't > have ways to hold off cleanup for a while for various reasons > (long-running query, replication slot, hot-standby feedback, ...). > > How exactly that would be implemented I don't know, but it seems like a > far simpler approach than inventing a new "dead store". It'd need logic > to let vacuum clean up the stuff no longer needed, but so would > the dead store, I think. > The difference w.r.t the existing mechanisms for holding deleted data is that we don't know whether we need to hold off the vacuum from cleaning up the rows, because we can't say with any certainty whether other nodes will perform any conflicting operations in the future. Using the example we discussed, Node A: T1: INSERT INTO t (id, value) VALUES (1,1); T2: DELETE FROM t WHERE id = 1; Node B: T3: UPDATE t SET value = 2 WHERE id = 1; Say the order of receiving the commands is T1-T2-T3. We can't predict whether we will ever get T3, so on what basis shall we try to prevent vacuum from removing the deleted row? One factor could be time: say we define a new parameter vacuum_committs_age, which would indicate that we will allow rows to be removed only if the time elapsed since the tuple's modification, as indicated by the committs module, is greater than vacuum_committs_age. This needs more analysis if we want to pursue this direction. OTOH, in the existing mechanisms, there is a common factor among all of them, which is that we know that there is some event that requires the data to be present. For example, with a long-running query, we know that the deleted/updated row is still visible for some running query. For replication slots, we know that the client will acknowledge the feedback in terms of an LSN, using which we can allow vacuum to remove rows.
Similar to these, hot_standby_feedback prevents vacuum from removing rows based on the current activity (the xid horizons required by queries on standby) on the hot standby. > > Currently, we are thinking of treating such > > a conflict as update_missing (The row with the same value as that > > incoming update's key does not exist.). This is how the current HEAD > > code behaves and LOGs the information (logical replication did not > > find row to be updated ..). > > > I thought the agreement was that we need both conflict types to get sensible > behavior, so proceeding with just update_missing (mostly because we > don't know how to detect these conflicts reliably) seems like it may not > be the right direction ... > Fair enough. I am also not in favor of ignoring this, but if, as a first step, we want to improve our current conflict detection mechanism and provide the stats or conflict information in some catalog or view, we can do that even if update_deleted is not detected. For example, as of now, we only detect update_missing and simply LOG it at DEBUG1 level. Additionally, we can detect update_differ (the row updated by a different origin) and have some stats. We seem to have some agreement that conflict detection and stats about the same could be the first step. -- With Regards, Amit Kapila.
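A minimal sketch of the vacuum_committs_age idea mentioned above (the GUC is hypothetical and the time handling is simplified to plain time_t arithmetic):

  #include <stdbool.h>
  #include <time.h>

  /* Hypothetical GUC from the discussion above, in seconds. */
  static int vacuum_committs_age = 300;

  /* May vacuum remove this deleted row yet? Only once the tuple's
   * commit timestamp (from the committs module) is old enough. */
  static bool
  committs_allows_removal(time_t tuple_commit_ts, time_t now)
  {
      return difftime(now, tuple_commit_ts) > vacuum_committs_age;
  }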
On Mon, Jun 17, 2024 at 5:38 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > > The issue with using commit timestamps is that, when multiple nodes > > are involved, the commit timestamp won't accurately represent the > > actual order of operations. There's no reliable way to determine the > > perfect order of each operation happening on different nodes roughly > > simultaneously unless we use some globally synchronized counter. > > Generally, that order might not cause real issues unless one operation > > is triggered by a previous operation, and relying solely on physical > > timestamps would not detect that correctly. > > > This whole conflict detection / resolution proposal is based on using > commit timestamps. Aren't you suggesting it can't really work with > commit timestamps? > > FWIW there are ways to build distributed consistency with timestamps, > as long as they're monotonic - e.g. clock-SI does that. It's not perfect, > but it shows it's possible. Hmm, I see that clock-SI does this by delaying the transaction when it detects clock skew. > However, I'm not sure we have to go there - it depends on what the goal is. > For one-directional replication (multiple nodes replicating to the > same target), it might be sufficient if the conflict resolution is > "deterministic" (e.g. not dependent on the order in which the changes > are applied). I'm not sure, but it's why I asked what's the goal in my > very first message in this thread. I'm not completely certain about this. Even in one-directional replication, if multiple nodes are sending data, how can we guarantee determinism in the presence of clock skew unless we use some other mechanism like logical counters or something like what clock-SI is doing? I don't want to insist on using any specific solution here. However, I noticed that we haven't addressed how we plan to manage clock skew, which is my primary concern. I believe that if multiple nodes are involved and we're receiving data from them with unsynchronized clocks, ensuring determinism about their order will require us to take some measures to handle that. > > We need some sort of logical counter, such as a vector clock, which > > might be an independent counter on each node but can perfectly track > > the causal order. For example, if NodeA observes an operation from > > NodeB with a counter value of X, NodeA will adjust its counter to X+1. > > This ensures that if NodeA has seen an operation from NodeB, its next > > operation will appear to have occurred after NodeB's operation. > > > > I admit that I haven't fully thought through how we could design such > > version tracking in our logical replication protocol or how it would > > fit into our system. However, my point is that we need to consider > > something beyond commit timestamps to achieve reliable ordering. > > > > I can't really respond to this as there's no suggestion of how it would be > implemented in the patch discussed in this thread. > No worries, I'll consider whether finding such a solution is feasible for our situation. Thank you! -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: > > > > Yes, that's correct. However, many cases could benefit from the > > > update_deleted conflict type if it can be implemented reliably. That's > > > why we wanted to give it a try. But if we can't achieve predictable > > > results with it, I'm fine to drop this approach and conflict_type. We > > > can consider a better design in the future that doesn't depend on > > > non-vacuumed entries and provides a more robust method for identifying > > > deleted rows. > > > > > > > I agree having a separate update_deleted conflict would be beneficial, > > I'm not arguing against that - my point is actually that I think this > > conflict type is required, and that it needs to be detected reliably. > > > > When working with a distributed system, we must accept some form of > eventual consistency model. However, it's essential to design a > predictable and acceptable behavior. For example, if a change is a > result of a previous operation (such as an update on node B triggered > after observing an operation on node A), we can say that the operation > on node A happened before the operation on node B. Conversely, if > operations on nodes A and B are independent, we consider them > concurrent. > > In distributed systems, clock skew is a known issue. To establish a > consistency model, we need to ensure it guarantees the > "happens-before" relationship. Consider a scenario with three nodes: > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and > subsequently NodeB makes changes, and then both NodeA's and NodeB's > changes are sent to NodeC, the clock skew might make NodeB's changes > appear to have occurred before NodeA's changes. However, we should > maintain data that indicates NodeB's changes were triggered after > NodeA's changes arrived at NodeB. This implies that logically, NodeB's > changes happened after NodeA's changes, despite what the timestamps > suggest. > > A common method to handle such cases is using vector clocks for > conflict resolution. > I think the unbounded size of the vector could be a problem to store for each event. However, while researching previous discussions, it came to our notice that we have discussed this topic in the past as well, in the context of standbys. For recovery_min_apply_delay, we decided that clock skew is not a problem, as the settings of this parameter are much larger than the typical time deviations between servers, as mentioned in the docs. Similarly, for causal reads [1], there was a proposal to introduce a max_clock_skew parameter, suggesting that the user make sure to have NTP set up correctly. We have tried to check other databases (like Ora and BDR) where CDR is implemented but didn't find anything specific to clock skew. So, I propose to go with a GUC like max_clock_skew such that if the difference between the incoming transaction's commit time and the local time is more than max_clock_skew then we raise an ERROR. It is not clear to me that putting a bigger effort into clock skew is worthwhile, especially when other systems that have provided the CDR feature (like Ora or BDR) for decades have not done anything like vector clocks. It is possible that this is less of a problem w.r.t CDR and just detecting the anomaly in clock skew is good enough. [1] - https://www.postgresql.org/message-id/flat/CAEepm%3D1iiEzCVLD%3DRoBgtZSyEY1CR-Et7fRc9prCZ9MuTz3pWg%40mail.gmail.com -- With Regards, Amit Kapila.
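As a sketch, the proposed max_clock_skew check could look roughly like this; the GUC and the error path are hypothetical stand-ins for what an apply worker would actually do:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  /* Hypothetical GUC from the proposal above, in seconds. */
  static int max_clock_skew = 5;

  static void
  check_clock_skew(time_t remote_commit_ts, time_t local_now)
  {
      double skew = difftime(remote_commit_ts, local_now);

      if (skew > max_clock_skew)
      {
          /* stand-in for ereport(ERROR, ...) in the apply worker */
          fprintf(stderr,
                  "remote commit timestamp is %.0f seconds ahead, "
                  "exceeding max_clock_skew (%d seconds)\n",
                  skew, max_clock_skew);
          exit(EXIT_FAILURE);
      }
  }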
On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > The difference w.r.t the existing mechanisms for holding deleted data > is that we don't know whether we need to hold off the vacuum from > cleaning up the rows because we can't say with any certainty whether > other nodes will perform any conflicting operations in the future. > Using the example we discussed, > Node A: > T1: INSERT INTO t (id, value) VALUES (1,1); > T2: DELETE FROM t WHERE id = 1; > > Node B: > T3: UPDATE t SET value = 2 WHERE id = 1; > > Say the order of receiving the commands is T1-T2-T3. We can't predict > whether we will ever get T-3, so on what basis shall we try to prevent > vacuum from removing the deleted row? The problem arises because T2 and T3 might be applied out of order on some nodes. Once either one of them has been applied on every node, no further conflicts are possible. -- Robert Haas EDB: http://www.enterprisedb.com
On Thursday, June 13, 2024 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: Hi, > On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > wrote: > > > > This time at PGconf.dev[1], we had some discussions regarding this > > project. The proposed approach is to split the work into two main > > components. The first part focuses on conflict detection, which aims > > to identify and report conflicts in logical replication. This feature > > will enable users to monitor the unexpected conflicts that may occur. > > The second part involves the actual conflict resolution. Here, we will > > provide built-in resolutions for each conflict and allow user to > > choose which resolution will be used for which conflict(as described > > in the initial email of this thread). > > I agree with this direction that we focus on conflict detection (and > logging) first and then develop conflict resolution on top of that. Thanks for your reply! > > > > > Of course, we are open to alternative ideas and suggestions, and the > > strategy above can be changed based on ongoing discussions and > > feedback received. > > > > Here is the patch of the first part work, which adds a new parameter > > detect_conflict for CREATE and ALTER subscription commands. This new > > parameter will decide if subscription will go for conflict detection. > > By default, conflict detection will be off for a subscription. > > > > When conflict detection is enabled, additional logging is triggered in > > the following conflict scenarios: > > > > * updating a row that was previously modified by another origin. > > * The tuple to be updated is not found. > > * The tuple to be deleted is not found. > > > > While there exist other conflict types in logical replication, such as > > an incoming insert conflicting with an existing row due to a primary > > key or unique index, these cases already result in constraint violation errors. > > What does detect_conflict being true actually mean to users? I understand that > detect_conflict being true could introduce some overhead to detect conflicts. > But in terms of conflict detection, even if detect_conflict is false, we detect > some conflicts such as concurrent inserts with the same key. Once we > introduce the complete conflict detection feature, I'm not sure there is a case > where a user wants to detect only some particular types of conflict. > > > Therefore, additional conflict detection for these cases is currently > > omitted to minimize potential overhead. However, the pre-detection for > > conflict in these error cases is still essential to support automatic > > conflict resolution in the future. > > I feel that we should log all types of conflict in a uniform way. For example, > with detect_conflict being true, the update_differ conflict is reported as > "conflict %s detected on relation "%s"", whereas concurrent inserts with the > same key are reported as "duplicate key value violates unique constraint "%s"", > which could confuse users. Do you mean it's OK to add a pre-check before applying the INSERT, which will verify if the remote tuple violates any unique constraints, and if it does, log a conflict message? I thought about this but was slightly worried about the extra cost it would bring. OTOH, if we think it's acceptable, we could do that since the cost is there only when detect_conflict is enabled.
I also thought of logging such a conflict message in PG_CATCH(), but I think we lack some necessary info (relation, index name, column name) at the catch block. Best Regards, Hou zj
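The pre-check being discussed could be sketched as below; this is a toy model in which key_exists() stands in for a real unique-index lookup and detect_conflict for the proposed subscription option:

  #include <stdbool.h>
  #include <stdio.h>

  static bool detect_conflict = true;  /* the proposed subscription option */

  /* Stand-in for probing the unique index with the remote tuple's key. */
  static bool
  key_exists(int key)
  {
      return key == 1;                 /* pretend key 1 is already present */
  }

  /* Apply a remote insert, logging insert_exists via the extra scan
   * instead of waiting for the unique-violation ERROR path. */
  static void
  apply_remote_insert(int key)
  {
      if (detect_conflict && key_exists(key))
          fprintf(stderr, "conflict insert_exists detected for key %d\n", key);
      else
          printf("inserted key %d\n", key);
  }

In the real apply path the constraint violation would presumably still be raised (or a resolution applied) after logging; the sketch only shows where the extra lookup would sit.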
On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > > > > > > Yes, that's correct. However, many cases could benefit from the > > > > update_deleted conflict type if it can be implemented reliably. That's > > > > why we wanted to give it a try. But if we can't achieve predictable > > > > results with it, I'm fine to drop this approach and conflict_type. We > > > > can consider a better design in the future that doesn't depend on > > > > non-vacuumed entries and provides a more robust method for identifying > > > > deleted rows. > > > > > > > > > > I agree having a separate update_deleted conflict would be beneficial, > > > I'm not arguing against that - my point is actually that I think this > > > conflict type is required, and that it needs to be detected reliably. > > > > > > > When working with a distributed system, we must accept some form of > > eventual consistency model. However, it's essential to design a > > predictable and acceptable behavior. For example, if a change is a > > result of a previous operation (such as an update on node B triggered > > after observing an operation on node A), we can say that the operation > > on node A happened before the operation on node B. Conversely, if > > operations on nodes A and B are independent, we consider them > > concurrent. > > > > In distributed systems, clock skew is a known issue. To establish a > > consistency model, we need to ensure it guarantees the > > "happens-before" relationship. Consider a scenario with three nodes: > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and > > subsequently NodeB makes changes, and then both NodeA's and NodeB's > > changes are sent to NodeC, the clock skew might make NodeB's changes > > appear to have occurred before NodeA's changes. However, we should > > maintain data that indicates NodeB's changes were triggered after > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's > > changes happened after NodeA's changes, despite what the timestamps > > suggest. > > > > A common method to handle such cases is using vector clocks for > > conflict resolution. > > > > I think the unbounded size of the vector could be a problem to store > for each event. However, while researching previous discussions, it > came to our notice that we have discussed this topic in the past as > well in the context of standbys. For recovery_min_apply_delay, we > decided the clock skew is not a problem as the settings of this > parameter are much larger than typical time deviations between servers > as mentioned in docs. Similarly for casual reads [1], there was a > proposal to introduce max_clock_skew parameter and suggesting the user > to make sure to have NTP set up correctly. We have tried to check > other databases (like Ora and BDR) where CDR is implemented but didn't > find anything specific to clock skew. So, I propose to go with a GUC > like max_clock_skew such that if the difference of time between the > incoming transaction's commit time and the local time is more than > max_clock_skew then we raise an ERROR. It is not clear to me that > putting bigger effort into clock skew is worth especially when other > systems providing CDR feature (like Ora or BDR) for decades have not > done anything like vector clocks. 
It is possible that this is less of > a problem w.r.t CDR and just detecting the anomaly in clock skew is > good enough. I believe that if we've accepted this solution elsewhere, then we can also consider the same. Basically, we're allowing the application to set its tolerance for clock skew. And, if the skew exceeds that tolerance, it's the application's responsibility to synchronize; otherwise, an error will occur. This approach seems reasonable. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra > > > <tomas.vondra@enterprisedb.com> wrote: > > > > > > > > Yes, that's correct. However, many cases could benefit from the > > > > > update_deleted conflict type if it can be implemented reliably. That's > > > > > why we wanted to give it a try. But if we can't achieve predictable > > > > > results with it, I'm fine to drop this approach and conflict_type. We > > > > > can consider a better design in the future that doesn't depend on > > > > > non-vacuumed entries and provides a more robust method for identifying > > > > > deleted rows. > > > > > > > > > > > > > I agree having a separate update_deleted conflict would be beneficial, > > > > I'm not arguing against that - my point is actually that I think this > > > > conflict type is required, and that it needs to be detected reliably. > > > > > > > > > > When working with a distributed system, we must accept some form of > > > eventual consistency model. However, it's essential to design a > > > predictable and acceptable behavior. For example, if a change is a > > > result of a previous operation (such as an update on node B triggered > > > after observing an operation on node A), we can say that the operation > > > on node A happened before the operation on node B. Conversely, if > > > operations on nodes A and B are independent, we consider them > > > concurrent. > > > > > > In distributed systems, clock skew is a known issue. To establish a > > > consistency model, we need to ensure it guarantees the > > > "happens-before" relationship. Consider a scenario with three nodes: > > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and > > > subsequently NodeB makes changes, and then both NodeA's and NodeB's > > > changes are sent to NodeC, the clock skew might make NodeB's changes > > > appear to have occurred before NodeA's changes. However, we should > > > maintain data that indicates NodeB's changes were triggered after > > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's > > > changes happened after NodeA's changes, despite what the timestamps > > > suggest. > > > > > > A common method to handle such cases is using vector clocks for > > > conflict resolution. > > > > > > > I think the unbounded size of the vector could be a problem to store > > for each event. However, while researching previous discussions, it > > came to our notice that we have discussed this topic in the past as > > well in the context of standbys. For recovery_min_apply_delay, we > > decided the clock skew is not a problem as the settings of this > > parameter are much larger than typical time deviations between servers > > as mentioned in docs. Similarly for casual reads [1], there was a > > proposal to introduce max_clock_skew parameter and suggesting the user > > to make sure to have NTP set up correctly. We have tried to check > > other databases (like Ora and BDR) where CDR is implemented but didn't > > find anything specific to clock skew. So, I propose to go with a GUC > > like max_clock_skew such that if the difference of time between the > > incoming transaction's commit time and the local time is more than > > max_clock_skew then we raise an ERROR. 
It is not clear to me that > > putting bigger effort into clock skew is worth especially when other > > systems providing CDR feature (like Ora or BDR) for decades have not > > done anything like vector clocks. It is possible that this is less of > > a problem w.r.t CDR and just detecting the anomaly in clock skew is > > good enough. > > I believe that if we've accepted this solution elsewhere, then we can > also consider the same. Basically, we're allowing the application to > set its tolerance for clock skew. And, if the skew exceeds that > tolerance, it's the application's responsibility to synchronize; > otherwise, an error will occur. This approach seems reasonable. This model can be further extended by making the apply worker wait if the remote transaction's commit_ts is greater than the local timestamp. This ensures that no local transactions occurring after the remote transaction appear to have happened earlier due to clock skew; instead, we make them happen before the remote transaction by delaying the apply of the remote transaction. Essentially, by having the apply of the remote transaction wait until the local clock reaches the remote transaction's timestamp, we ensure that the remote transaction, which seems to occur after concurrent local transactions due to clock skew, is actually applied after those transactions. With this model, there should be no ordering errors from the application's perspective either, if synchronous commit is enabled. The transaction initiated by the publisher cannot be completed until it is applied to the synchronous subscriber. This ensures that if the subscriber's clock is lagging behind the publisher's clock, the transaction will not be applied until the subscriber's local clock is in sync, preventing the transaction from being completed out of order. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
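A sketch of that waiting rule (illustrative only; a real apply worker would presumably wait on a latch with a timeout rather than sleep in a loop):

  #include <time.h>
  #include <unistd.h>

  /* Delay applying a remote transaction until the local clock has
   * caught up with its commit timestamp, so that local transactions
   * cannot appear to precede it merely because of clock skew. */
  static void
  wait_for_local_clock(time_t remote_commit_ts)
  {
      while (difftime(remote_commit_ts, time(NULL)) > 0)
          sleep(1);
  }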
On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > The difference w.r.t the existing mechanisms for holding deleted data > > is that we don't know whether we need to hold off the vacuum from > > cleaning up the rows because we can't say with any certainty whether > > other nodes will perform any conflicting operations in the future. > > Using the example we discussed, > > Node A: > > T1: INSERT INTO t (id, value) VALUES (1,1); > > T2: DELETE FROM t WHERE id = 1; > > > > Node B: > > T3: UPDATE t SET value = 2 WHERE id = 1; > > > > Say the order of receiving the commands is T1-T2-T3. We can't predict > > whether we will ever get T-3, so on what basis shall we try to prevent > > vacuum from removing the deleted row? > > The problem arises because T2 and T3 might be applied out of order on > some nodes. Once either one of them has been applied on every node, no > further conflicts are possible. If we decide to skip the update whether the row is missing or deleted, we indeed reach the same end result regardless of the order of T2, T3, and Vacuum. Here's how it looks in each case: Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row -> end result we do not have a row. Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result we do not have a row. Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
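For concreteness, the end state of these orderings can be replayed with a toy model like the one below (illustrative only; it runs the Case 2 ordering, and reordering the three calls replays the other cases):

  #include <stdbool.h>
  #include <stdio.h>

  typedef struct
  {
      bool exists;
      int  value;
  } Row;

  static void t1_insert(Row *r) { r->exists = true; r->value = 1; }
  static void t2_delete(Row *r) { r->exists = false; }

  /* The skip resolution: update only a row that is still present. */
  static void
  t3_update(Row *r)
  {
      if (r->exists)
          r->value = 2;
      else
          fprintf(stderr, "update skipped: row missing or deleted\n");
  }

  int
  main(void)
  {
      Row r = {false, 0};

      t1_insert(&r);
      t2_delete(&r);
      t3_update(&r);   /* Case 2: T1, T2, T3 */
      printf("row exists at the end: %s\n", r.exists ? "yes" : "no");
      return 0;
  }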
On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > The difference w.r.t the existing mechanisms for holding deleted data > > > is that we don't know whether we need to hold off the vacuum from > > > cleaning up the rows because we can't say with any certainty whether > > > other nodes will perform any conflicting operations in the future. > > > Using the example we discussed, > > > Node A: > > > T1: INSERT INTO t (id, value) VALUES (1,1); > > > T2: DELETE FROM t WHERE id = 1; > > > > > > Node B: > > > T3: UPDATE t SET value = 2 WHERE id = 1; > > > > > > Say the order of receiving the commands is T1-T2-T3. We can't predict > > > whether we will ever get T-3, so on what basis shall we try to prevent > > > vacuum from removing the deleted row? > > > > The problem arises because T2 and T3 might be applied out of order on > > some nodes. Once either one of them has been applied on every node, no > > further conflicts are possible. > > If we decide to skip the update whether the row is missing or deleted, > we indeed reach the same end result regardless of the order of T2, T3, > and Vacuum. Here's how it looks in each case: > > Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row > -> end result we do not have a row. > Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result > we do not have a row. > Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row. > In case 3, how can deletion be successful? The row required to be deleted has already been updated. -- With Regards, Amit Kapila.
On Tue, Jun 18, 2024 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > The difference w.r.t the existing mechanisms for holding deleted data > > > > is that we don't know whether we need to hold off the vacuum from > > > > cleaning up the rows because we can't say with any certainty whether > > > > other nodes will perform any conflicting operations in the future. > > > > Using the example we discussed, > > > > Node A: > > > > T1: INSERT INTO t (id, value) VALUES (1,1); > > > > T2: DELETE FROM t WHERE id = 1; > > > > > > > > Node B: > > > > T3: UPDATE t SET value = 2 WHERE id = 1; > > > > > > > > Say the order of receiving the commands is T1-T2-T3. We can't predict > > > > whether we will ever get T-3, so on what basis shall we try to prevent > > > > vacuum from removing the deleted row? > > > > > > The problem arises because T2 and T3 might be applied out of order on > > > some nodes. Once either one of them has been applied on every node, no > > > further conflicts are possible. > > > > If we decide to skip the update whether the row is missing or deleted, > > we indeed reach the same end result regardless of the order of T2, T3, > > and Vacuum. Here's how it looks in each case: > > > > Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row > > -> end result we do not have a row. > > Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result > > we do not have a row. > > Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row. > > > > In case 3, how can deletion be successful? The row required to be > deleted has already been updated. Hmm, I was considering this case in the example given by you above[1]: we have updated some fields of the row with id=1, so isn't this row still detectable by the delete, because the delete will find it by id=1, as we haven't updated the id? I was making the point w.r.t. the example used above. [1] > > > > Node A: > > > > T1: INSERT INTO t (id, value) VALUES (1,1); > > > > T2: DELETE FROM t WHERE id = 1; > > > > > > > > Node B: > > > > T3: UPDATE t SET value = 2 WHERE id = 1; -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 18, 2024 at 1:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > The difference w.r.t the existing mechanisms for holding deleted data > > > > > is that we don't know whether we need to hold off the vacuum from > > > > > cleaning up the rows because we can't say with any certainty whether > > > > > other nodes will perform any conflicting operations in the future. > > > > > Using the example we discussed, > > > > > Node A: > > > > > T1: INSERT INTO t (id, value) VALUES (1,1); > > > > > T2: DELETE FROM t WHERE id = 1; > > > > > > > > > > Node B: > > > > > T3: UPDATE t SET value = 2 WHERE id = 1; > > > > > > > > > > Say the order of receiving the commands is T1-T2-T3. We can't predict > > > > > whether we will ever get T-3, so on what basis shall we try to prevent > > > > > vacuum from removing the deleted row? > > > > > > > > The problem arises because T2 and T3 might be applied out of order on > > > > some nodes. Once either one of them has been applied on every node, no > > > > further conflicts are possible. > > > > > > If we decide to skip the update whether the row is missing or deleted, > > > we indeed reach the same end result regardless of the order of T2, T3, > > > and Vacuum. Here's how it looks in each case: > > > > > > Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row > > > -> end result we do not have a row. > > > Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result > > > we do not have a row. > > > Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row. > > > > > > > In case 3, how can deletion be successful? The row required to be > > deleted has already been updated. > > Hmm, I was considering this case in the example given by you above[1], > so we have updated some fields of the row with id=1, isn't this row > still detectable by the delete because delete will find this by id=1 > as we haven't updated the id? I was making the point w.r.t. the > example used above. > Your point is correct w.r.t the example but I responded considering a general update-delete ordering. BTW, it is not clear to me how update_delete conflict will be handled with what Robert and you are saying. I'll try to say what I understood. If we assume that there are two nodes A & B as mentioned in the above example and DELETE has applied on both nodes, now say UPDATE has been performed on node B then irrespective of whether we consider the conflict as update_delete or update_missing, the data will remain same on both nodes. So, in such a case, we don't need to bother differentiating between those two types of conflicts. Is that what we can interpret from above? -- With Regards, Amit Kapila.
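A minimal sketch of the three orderings discussed above, assuming a plain table t(id int primary key, value int) and a resolution policy of skipping the incoming update when its target row is missing or deleted (illustrative only, not taken from any patch):

-- Case 1: T1, T2, VACUUM locally, then remote T3 arrives
INSERT INTO t (id, value) VALUES (1, 1);   -- T1
DELETE FROM t WHERE id = 1;                -- T2
VACUUM t;  -- may remove the dead row along with its commit-ts info
-- remote T3 (UPDATE t SET value = 2 WHERE id = 1) finds no row:
--   detected as update_missing, skipped -> no row remains

-- Case 2: T1, T2, then remote T3 (dead row not yet vacuumed)
--   detected as update_deleted (where detectable), skipped -> no row remains

-- Case 3: T1, remote T3, then T2
--   T3 updates value; T2's DELETE still matches id = 1 -> no row remains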
On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Mon, Jun 17, 2024 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra > > > > <tomas.vondra@enterprisedb.com> wrote: > > > > > > > > > > Yes, that's correct. However, many cases could benefit from the > > > > > > update_deleted conflict type if it can be implemented reliably. That's > > > > > > why we wanted to give it a try. But if we can't achieve predictable > > > > > > results with it, I'm fine to drop this approach and conflict_type. We > > > > > > can consider a better design in the future that doesn't depend on > > > > > > non-vacuumed entries and provides a more robust method for identifying > > > > > > deleted rows. > > > > > > > > > > > > > > > > I agree having a separate update_deleted conflict would be beneficial, > > > > > I'm not arguing against that - my point is actually that I think this > > > > > conflict type is required, and that it needs to be detected reliably. > > > > > > > > > > > > > When working with a distributed system, we must accept some form of > > > > eventual consistency model. However, it's essential to design a > > > > predictable and acceptable behavior. For example, if a change is a > > > > result of a previous operation (such as an update on node B triggered > > > > after observing an operation on node A), we can say that the operation > > > > on node A happened before the operation on node B. Conversely, if > > > > operations on nodes A and B are independent, we consider them > > > > concurrent. > > > > > > > > In distributed systems, clock skew is a known issue. To establish a > > > > consistency model, we need to ensure it guarantees the > > > > "happens-before" relationship. Consider a scenario with three nodes: > > > > NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and > > > > subsequently NodeB makes changes, and then both NodeA's and NodeB's > > > > changes are sent to NodeC, the clock skew might make NodeB's changes > > > > appear to have occurred before NodeA's changes. However, we should > > > > maintain data that indicates NodeB's changes were triggered after > > > > NodeA's changes arrived at NodeB. This implies that logically, NodeB's > > > > changes happened after NodeA's changes, despite what the timestamps > > > > suggest. > > > > > > > > A common method to handle such cases is using vector clocks for > > > > conflict resolution. > > > > > > > > > > I think the unbounded size of the vector could be a problem to store > > > for each event. However, while researching previous discussions, it > > > came to our notice that we have discussed this topic in the past as > > > well in the context of standbys. For recovery_min_apply_delay, we > > > decided the clock skew is not a problem as the settings of this > > > parameter are much larger than typical time deviations between servers > > > as mentioned in docs. Similarly for casual reads [1], there was a > > > proposal to introduce max_clock_skew parameter and suggesting the user > > > to make sure to have NTP set up correctly. We have tried to check > > > other databases (like Ora and BDR) where CDR is implemented but didn't > > > find anything specific to clock skew. 
So, I propose to go with a GUC > > > like max_clock_skew such that if the difference of time between the > > > incoming transaction's commit time and the local time is more than > > > max_clock_skew then we raise an ERROR. It is not clear to me that > > > putting bigger effort into clock skew is worth especially when other > > > systems providing CDR feature (like Ora or BDR) for decades have not > > > done anything like vector clocks. It is possible that this is less of > > > a problem w.r.t CDR and just detecting the anomaly in clock skew is > > > good enough. > > > > I believe that if we've accepted this solution elsewhere, then we can > > also consider the same. Basically, we're allowing the application to > > set its tolerance for clock skew. And, if the skew exceeds that > > tolerance, it's the application's responsibility to synchronize; > > otherwise, an error will occur. This approach seems reasonable. > > This model can be further extended by making the apply worker wait if > the remote transaction's commit_ts is greater than the local > timestamp. This ensures that no local transactions occurring after the > remote transaction appear to have happened earlier due to clock skew > instead we make them happen before the remote transaction by delaying > the remote transaction apply. Essentially, by having the remote > application wait until the local timestamp matches the remote > transaction's timestamp, we ensure that the remote transaction, which > seems to occur after concurrent local transactions due to clock skew, > is actually applied after those transactions. > > With this model, there should be no ordering errors from the > application's perspective as well if synchronous commit is enabled. > The transaction initiated by the publisher cannot be completed until > it is applied to the synchronous subscriber. This ensures that if the > subscriber's clock is lagging behind the publisher's clock, the > transaction will not be applied until the subscriber's local clock is > in sync, preventing the transaction from being completed out of order. I tried to work out a few scenarios with this, where the apply worker will wait until its local clock hits 'remote_commit_tts - max_skew permitted'. Please have a look. Let's say, we have a GUC to configure max_clock_skew permitted. Resolver is last_update_wins in both cases. ---------------- 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. Remote Update with commit_timestamp = 10.20AM. Local clock (which is say 5 min behind) shows = 10.15AM. When the remote update arrives at the local node, we see that the skew is greater than max_clock_skew and thus the apply worker waits till the local clock hits 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the local clock hits 10.20 AM, the worker applies the remote change with commit_tts of 10.20 AM. In the meantime (during the wait period of the apply worker), if some local update on the same row has happened at say 10.18 AM, that will be applied first, and will later be overwritten by the above remote change of 10.20 AM, as the remote change's timestamp appears more recent, even though it happened earlier than the local change. 2) Case 2: max_clock_skew is set to 2min. Remote Update with commit_timestamp=10.20AM Local clock (which is say 5 min behind) = 10.15AM. Now the apply worker will notice skew greater than 2min and thus will wait till the local clock hits 'remote's commit_tts - max_clock_skew' i.e.
10.18 and will apply the change with commit_tts of 10.20 (as we always save the origin's commit timestamp into the local commit_tts, see RecordTransactionCommit->TransactionTreeSetCommitTsData). Now let's say another local update is triggered at 10.19 AM; it will be applied locally but will be ignored on the remote node. On the remote node, the existing change with a timestamp of 10.20 AM will win, resulting in data divergence. ---------- In Case 1, the local change which was otherwise triggered later than the remote change is overwritten by the remote change. And in Case 2, it results in data divergence. Is this behaviour in both cases expected? Or am I getting the wait logic wrong? Thoughts? thanks Shveta
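As a side note, on a node with track_commit_timestamp = on, the commit timestamp actually stored for a row version can be checked directly, which makes it easy to verify the "origin's commit_ts is saved locally" behavior assumed in these scenarios (table t from the earlier example):

-- requires track_commit_timestamp = on (needs a server restart)
SELECT pg_xact_commit_timestamp(xmin) AS commit_ts, id, value
FROM t
WHERE id = 1;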
On Tue, Jun 18, 2024 at 7:44 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Thursday, June 13, 2024 2:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > > On Wed, Jun 5, 2024 at 3:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> > > wrote: > > > > > > This time at PGconf.dev[1], we had some discussions regarding this > > > project. The proposed approach is to split the work into two main > > > components. The first part focuses on conflict detection, which aims > > > to identify and report conflicts in logical replication. This feature > > > will enable users to monitor the unexpected conflicts that may occur. > > > The second part involves the actual conflict resolution. Here, we will > > > provide built-in resolutions for each conflict and allow user to > > > choose which resolution will be used for which conflict(as described > > > in the initial email of this thread). > > > > I agree with this direction that we focus on conflict detection (and > > logging) first and then develop conflict resolution on top of that. > > Thanks for your reply ! > > > > > > > > > Of course, we are open to alternative ideas and suggestions, and the > > > strategy above can be changed based on ongoing discussions and > > > feedback received. > > > > > > Here is the patch of the first part work, which adds a new parameter > > > detect_conflict for CREATE and ALTER subscription commands. This new > > > parameter will decide if subscription will go for conflict detection. > > > By default, conflict detection will be off for a subscription. > > > > > > When conflict detection is enabled, additional logging is triggered in > > > the following conflict scenarios: > > > > > > * updating a row that was previously modified by another origin. > > > * The tuple to be updated is not found. > > > * The tuple to be deleted is not found. > > > > > > While there exist other conflict types in logical replication, such as > > > an incoming insert conflicting with an existing row due to a primary > > > key or unique index, these cases already result in constraint violation errors. > > > > What does detect_conflict being true actually mean to users? I understand that > > detect_conflict being true could introduce some overhead to detect conflicts. > > But in terms of conflict detection, even if detect_confict is false, we detect > > some conflicts such as concurrent inserts with the same key. Once we > > introduce the complete conflict detection feature, I'm not sure there is a case > > where a user wants to detect only some particular types of conflict. > > > > > Therefore, additional conflict detection for these cases is currently > > > omitted to minimize potential overhead. However, the pre-detection for > > > conflict in these error cases is still essential to support automatic > > > conflict resolution in the future. > > > > I feel that we should log all types of conflict in an uniform way. For example, > > with detect_conflict being true, the update_differ conflict is reported as > > "conflict %s detected on relation "%s"", whereas concurrent inserts with the > > same key is reported as "duplicate key value violates unique constraint "%s"", > > which could confuse users. > > Do you mean it's ok to add a pre-check before applying the INSERT, which will > verify if the remote tuple violates any unique constraints, and if it violates > then we log a conflict message ? I thought about this but was slightly > worried about the extra cost it would bring. 
OTOH, if we think it's acceptable, > we could do that since the cost is there only when detect_conflict is enabled. > > I also thought of logging such a conflict message in PG_CATCH(), but I think we lack some necessary info (relation, index name, column name) at the catch block. > Can't we use/extend the existing 'apply_error_callback_arg' for this purpose? -- With Regards, Amit Kapila.
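For context, the detect_conflict parameter discussed above would be used roughly as follows (proposed, uncommitted syntax from the posted patch; connection string and names are illustrative):

CREATE SUBSCRIPTION sub1
    CONNECTION 'host=nodeA dbname=postgres'
    PUBLICATION pub1
    WITH (detect_conflict = true);

-- hypothetical shape of the log line, following the message format
-- string mentioned above:
-- LOG:  conflict update_differ detected on relation "public.t"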
On Tue, Jun 11, 2024 at 3:12 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Sat, Jun 8, 2024 at 3:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Fri, Jun 7, 2024 at 5:39 PM Ashutosh Bapat >> <ashutosh.bapat.oss@gmail.com> wrote: >> > >> > On Thu, Jun 6, 2024 at 5:16 PM Nisha Moond <nisha.moond412@gmail.com> wrote: >> >> >> >> > >> >> >> >> Here are more use cases of the "earliest_timestamp_wins" resolution method: >> >> 1) Applications where the record of first occurrence of an event is >> >> important. For example, sensor based applications like earthquake >> >> detection systems, capturing the first seismic wave's time is crucial. >> >> 2) Scheduling systems, like appointment booking, prioritize the >> >> earliest request when handling concurrent ones. >> >> 3) In contexts where maintaining chronological order is important - >> >> a) Social media platforms display comments ensuring that the >> >> earliest ones are visible first. >> >> b) Finance transaction processing systems rely on timestamps to >> >> prioritize the processing of transactions, ensuring that the earliest >> >> transaction is handled first >> > >> > >> > Thanks for sharing examples. However, these scenarios would be handled by the application and not during replication. What we are discussing here is the timestamp when a row was updated/inserted/deleted (or rather when the transaction that updated the row committed/became visible) and not a DML on a column which is of type timestamp. Some implementations use a hidden timestamp column but that's different from a user column which captures the timestamp of (say) an event. The conflict resolution will be based on the timestamp when that column's value was recorded in the database, which may be different from the value of the column itself. >> > >> >> It depends on how these operations are performed. For example, the >> appointment booking system could be prioritized via a transaction >> updating a row with columns emp_name, emp_id, reserved, time_slot. >> Now, if two employees at different geographical locations try to book >> the calendar, the earlier transaction will win. > > > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database. > Sure, the application needs some handling but I have tried to explain with a simple way that comes to my mind and how it can be realized with db involved. This is a known conflict detection method but note that I am not insisting to have "earliest_timestamp_wins". Even, if we want this we can have a separate discussion on this and add it later. >> >> >> > If we use the transaction commit timestamp as the basis for resolution, a transaction where multiple rows conflict may end up with different rows affected by that transaction being resolved differently. Say three transactions T1, T2 and T3 on separate origins with timestamps t1, t2, and t3 changed rows {r1, r2}, {r2, r3} and {r1, r4} respectively. Changes to r1 and r2 will conflict. Let's say T2 and T3 are applied first and then T1 is applied. If t2 < t1 < t3, r1 will end up with the version of T3 and r2 will end up with the version of T1 after applying all the three transactions. >> > >> >> Are you telling the results based on latest_timestamp_wins? If so, >> then it is correct.
OTOH, if the user has configured >> "earliest_timestamp_wins" resolution method, then we should end up >> with a version of r1 from T1 because t1 < t3. Also, due to the same >> reason, we should have version r2 from T2. >> >> > >> Would that introduce an inconsistency between r1 and r2? >> > >> >> As per my understanding, this shouldn't be an inconsistency. Won't it >> be true even when the transactions are performed on a single node with >> the same timing? >> > > The inconsistency will arise irrespective of the conflict resolution method. On a single system, the effects of whichever transaction runs last will be visible entirely. But in the example above, on the node where T1, T2, and T3 (from *different* origins) are applied, we might end up with a situation where some changes from T1 are applied whereas some changes from T3 are applied. > I still think it will lead to the same result if all three T1, T2, T3 happen on the same node in the same order as you mentioned. Say, we have a pre-existing table with rows r1, r2, r3, r4. Now, suppose we apply the transactions on the same node in the order given by t2 < t1 < t3. First T2 will be applied, so for now, r1 is a pre-existing version and r2 is from T2. Next, when T1 is performed, both r1 and r2 are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2 will be from T1. This is what you mentioned will happen after conflict resolution in the above example. -- With Regards, Amit Kapila.
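To make the r1..r4 example concrete, a sketch assuming a table tab(k int primary key, v text) where r1..r4 are the rows with k = 1..4, and t2 < t1 < t3 (illustrative only):

-- T2, commit-ts t2 (earliest): changed r2 and r3
UPDATE tab SET v = 'T2' WHERE k IN (2, 3);
-- T1, commit-ts t1 (middle): changed r1 and r2
UPDATE tab SET v = 'T1' WHERE k IN (1, 2);
-- T3, commit-ts t3 (latest): changed r1 and r4
UPDATE tab SET v = 'T3' WHERE k IN (1, 4);
-- With latest_timestamp_wins, regardless of apply order:
--   r1 -> T3, r2 -> T1, r3 -> T2, r4 -> T3
-- which matches applying T2, T1, T3 in commit-ts order on a single node.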
On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I tried to work out a few scenarios with this, where the apply worker > will wait until its local clock hits 'remote_commit_tts - max_skew > permitted'. Please have a look. > > Let's say, we have a GUC to configure max_clock_skew permitted. > Resolver is last_update_wins in both cases. > ---------------- > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > Remote Update with commit_timestamp = 10.20AM. > Local clock (which is say 5 min behind) shows = 10.15AM. > > When remote update arrives at local node, we see that skew is greater > than max_clock_skew and thus apply worker waits till local clock hits > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > local clock hits 10.20 AM, the worker applies the remote change with > commit_tts of 10.20AM. In the meantime (during wait period of apply > worker)) if some local update on same row has happened at say 10.18am, > that will applied first, which will be later overwritten by above > remote change of 10.20AM as remote-change's timestamp appear more > latest, even though it has happened earlier than local change. For the sake of simplicity let's call the change that happened at 10:20 AM change-1 and the change that happened at 10:15 as change-2 and assume we are talking about the synchronous commit only. I think now from an application perspective the change-1 wouldn't have caused the change-2 because we delayed applying change-2 on the local node which would have delayed the confirmation of the change-1 to the application that means we have got the change-2 on the local node without the confirmation of change-1 hence change-2 has no causal dependency on the change-1. So it's fine that we perform change-1 before change-2 and the timestamp will also show the same at any other node if they receive these 2 changes. The goal is to ensure that if we define the order where change-2 happens before change-1, this same order should be visible on all other nodes. This will hold true because the commit timestamp of change-2 is earlier than that of change-1. > 2) Case 2: max_clock_skew is set to 2min. > > Remote Update with commit_timestamp=10.20AM > Local clock (which is say 5 min behind) = 10.15AM. > > Now apply worker will notice skew greater than 2min and thus will wait > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > 10.18 and will apply the change with commit_tts of 10.20 ( as we > always save the origin's commit timestamp into local commit_tts, see > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > another local update is triggered at 10.19am, it will be applied > locally but it will be ignored on remote node. On the remote node , > the existing change with a timestamp of 10.20 am will win resulting in > data divergence. Let's call the 10:20 AM change as a change-1 and the change that happened at 10:19 as change-2 IIUC, although we apply the change-1 at 10:18 AM, the commit_ts of that change is 10:20, and the same will be visible to all other nodes. So in conflict resolution still the change-1 happened after the change-2 because change-2's commit_ts is 10:19 AM.
Now there could be a problem with the causal order because we applied the change-1 at 10:18 AM, so the application might have gotten confirmation at 10:18 AM, and the change-2 of the local node may be triggered as a result of the confirmation of change-1; that means change-2 now has a causal dependency on change-1, but commit_ts shows change-2 happened before change-1 on all the nodes. So, is this acceptable? I think yes, because the user has configured a maximum clock skew of 2 minutes, which means the detected order might not always align with the causal order for transactions occurring within that time frame. Generally, the ideal configuration for max_clock_skew should be a multiple of the network round trip time. Assuming this configuration, we wouldn't encounter this problem because for change-2 to be caused by change-1, the client would need to get confirmation of change-1 and then trigger change-2, which would take at least 2-3 network round trips. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
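For completeness, the synchronous-commit setup assumed in this argument can be configured on the publisher by listing the subscriber's application_name (which defaults to the subscription name) in synchronous_standby_names, e.g.:

-- on the publisher; assumes the remote subscription is named sub1
ALTER SYSTEM SET synchronous_standby_names = 'sub1';
SELECT pg_reload_conf();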
On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I tried to work out a few scenarios with this, where the apply worker > > will wait until its local clock hits 'remote_commit_tts - max_skew > > permitted'. Please have a look. > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > Resolver is last_update_wins in both cases. > > ---------------- > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > Remote Update with commit_timestamp = 10.20AM. > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > When remote update arrives at local node, we see that skew is greater > > than max_clock_skew and thus apply worker waits till local clock hits > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > local clock hits 10.20 AM, the worker applies the remote change with > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > worker)) if some local update on same row has happened at say 10.18am, > > that will applied first, which will be later overwritten by above > > remote change of 10.20AM as remote-change's timestamp appear more > > latest, even though it has happened earlier than local change. > > For the sake of simplicity let's call the change that happened at > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > and assume we are talking about the synchronous commit only. Do you mean "the change that happened at 10:18 as change-2" > > I think now from an application perspective the change-1 wouldn't have > caused the change-2 because we delayed applying change-2 on the local > node Do you mean "we delayed applying change-1 on the local node." >which would have delayed the confirmation of the change-1 to the > application that means we have got the change-2 on the local node > without the confirmation of change-1 hence change-2 has no causal > dependency on the change-1. So it's fine that we perform change-1 > before change-2 Do you mean "So it's fine that we perform change-2 before change-1" >and the timestamp will also show the same at any other > node if they receive these 2 changes. > > The goal is to ensure that if we define the order where change-2 > happens before change-1, this same order should be visible on all > other nodes. This will hold true because the commit timestamp of > change-2 is earlier than that of change-1. Considering the above corrections as base, I agree with this. > > 2) Case 2: max_clock_skew is set to 2min. > > > > Remote Update with commit_timestamp=10.20AM > > Local clock (which is say 5 min behind) = 10.15AM. > > > > Now apply worker will notice skew greater than 2min and thus will wait > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > always save the origin's commit timestamp into local commit_tts, see > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > > another local update is triggered at 10.19am, it will be applied > > locally but it will be ignored on remote node. On the remote node , > > the existing change with a timestamp of 10.20 am will win resulting in > > data divergence. 
> > Let's call the 10:20 AM change as a change-1 and the change that > happened at 10:19 as change-2 > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > commit_ts of that change is 10:20, and the same will be visible to all > other nodes. So in conflict resolution still the change-1 happened > after the change-2 because change-2's commit_ts is 10:19 AM. Now > there could be a problem with the causal order because we applied the > change-1 at 10:18 AM so the application might have gotten confirmation > at 10:18 AM and the change-2 of the local node may be triggered as a > result of confirmation of the change-1 that means now change-2 has a > causal dependency on the change-1 but commit_ts shows change-2 > happened before the change-1 on all the nodes. > > So, is this acceptable? I think yes because the user has configured a > maximum clock skew of 2 minutes, which means the detected order might > not always align with the causal order for transactions occurring > within that time frame. Agree. I had the same thoughts, and wanted to confirm my understanding. >Generally, the ideal configuration for > max_clock_skew should be in multiple of the network round trip time. > Assuming this configuration, we wouldn’t encounter this problem > because for change-2 to be caused by change-1, the client would need > to get confirmation of change-1 and then trigger change-2, which would > take at least 2-3 network round trips. thanks Shveta
On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database.
> >
> Sure, the application needs some handling but I have tried to explain
> with a simple way that comes to my mind and how it can be realized
> with db involved. This is a known conflict detection method but note
> that I am not insisting to have "earliest_timestamp_wins". Even, if we
> want this we can have a separate discussion on this and add it later.
It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility. I imagine the first version might just detect the conflict and throw an error or do nothing. That's already two simple conflict resolution strategies with minimal effort. We can add more complicated ones incrementally.
> >
> > The inconsistency will arise irrespective of the conflict resolution method. On a single system, the effects of whichever transaction runs last will be visible entirely. But in the example above, on the node where T1, T2, and T3 (from *different* origins) are applied, we might end up with a situation where some changes from T1 are applied whereas some changes from T3 are applied.
> >
> I still think it will lead to the same result if all three T1, T2, T3
> happen on the same node in the same order as you mentioned. Say, we
> have a pre-existing table with rows r1, r2, r3, r4. Now, if we use the
> order of transactions to be applied on the same node based on t2 < t1
> < t3. First T2 will be applied, so for now, r1 is a pre-existing
> version and r2 is from T2. Next, when T1 is performed, both r1 and r2
> are from T1. Lastly, when T3 is applied, r1 will be from T3 and r2
> will be from T1. This is what you mentioned will happen after conflict
> resolution in the above example.
You are right. It won't affect the consistency. The contents of a transaction on each node might vary after application depending upon the changes that the conflict resolver makes, but the end result will be the same.
--
Best Wishes,
Ashutosh Bapat
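As a point of comparison for the "detect and throw error" strategy, subscriptions already have a related option today that stops applying (by disabling the subscription) on an error instead of retrying; names below are illustrative:

CREATE SUBSCRIPTION sub1
    CONNECTION 'host=nodeA dbname=postgres'
    PUBLICATION pub1
    WITH (disable_on_error = true);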
On Wed, Jun 19, 2024 at 2:36 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I tried to work out a few scenarios with this, where the apply worker > > > will wait until its local clock hits 'remote_commit_tts - max_skew > > > permitted'. Please have a look. > > > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > > Resolver is last_update_wins in both cases. > > > ---------------- > > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > > > Remote Update with commit_timestamp = 10.20AM. > > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > > > When remote update arrives at local node, we see that skew is greater > > > than max_clock_skew and thus apply worker waits till local clock hits > > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > > local clock hits 10.20 AM, the worker applies the remote change with > > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > > worker)) if some local update on same row has happened at say 10.18am, > > > that will applied first, which will be later overwritten by above > > > remote change of 10.20AM as remote-change's timestamp appear more > > > latest, even though it has happened earlier than local change. Oops lot of mistakes in the usage of change-1 and change-2, sorry about that. > > For the sake of simplicity let's call the change that happened at > > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > > and assume we are talking about the synchronous commit only. > > Do you mean "the change that happened at 10:18 as change-2" Right > > > > I think now from an application perspective the change-1 wouldn't have > > caused the change-2 because we delayed applying change-2 on the local > > node > > Do you mean "we delayed applying change-1 on the local node." Right > >which would have delayed the confirmation of the change-1 to the > > application that means we have got the change-2 on the local node > > without the confirmation of change-1 hence change-2 has no causal > > dependency on the change-1. So it's fine that we perform change-1 > > before change-2 > > Do you mean "So it's fine that we perform change-2 before change-1" Right > >and the timestamp will also show the same at any other > > node if they receive these 2 changes. > > > > The goal is to ensure that if we define the order where change-2 > > happens before change-1, this same order should be visible on all > > other nodes. This will hold true because the commit timestamp of > > change-2 is earlier than that of change-1. > > Considering the above corrections as base, I agree with this. +1 > > > 2) Case 2: max_clock_skew is set to 2min. > > > > > > Remote Update with commit_timestamp=10.20AM > > > Local clock (which is say 5 min behind) = 10.15AM. > > > > > > Now apply worker will notice skew greater than 2min and thus will wait > > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > > always save the origin's commit timestamp into local commit_tts, see > > > RecordTransactionCommit->TransactionTreeSetCommitTsData). 
Now lets say > > > another local update is triggered at 10.19am, it will be applied > > > locally but it will be ignored on remote node. On the remote node , > > > the existing change with a timestamp of 10.20 am will win resulting in > > > data divergence. > > > > Let's call the 10:20 AM change as a change-1 and the change that > > happened at 10:19 as change-2 > > > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > > commit_ts of that change is 10:20, and the same will be visible to all > > other nodes. So in conflict resolution still the change-1 happened > > after the change-2 because change-2's commit_ts is 10:19 AM. Now > > there could be a problem with the causal order because we applied the > > change-1 at 10:18 AM so the application might have gotten confirmation > > at 10:18 AM and the change-2 of the local node may be triggered as a > > result of confirmation of the change-1 that means now change-2 has a > > causal dependency on the change-1 but commit_ts shows change-2 > > happened before the change-1 on all the nodes. > > > > So, is this acceptable? I think yes because the user has configured a > > maximum clock skew of 2 minutes, which means the detected order might > > not always align with the causal order for transactions occurring > > within that time frame. > > Agree. I had the same thoughts, and wanted to confirm my understanding. Okay -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that theirreservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementingthat is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database. >> > >> >> Sure, the application needs some handling but I have tried to explain >> with a simple way that comes to my mind and how it can be realized >> with db involved. This is a known conflict detection method but note >> that I am not insisting to have "earliest_timestamp_wins". Even, if we >> want this we can have a separate discussion on this and add it later. >> > > It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility.I imagine the first version might just detect the conflict and throw error or do nothing. That's already twosimple conflict resolution strategies with minimal efforts. We can add more complicated ones incrementally. > Agreed, splitting the work into multiple patches would help us to finish the easier ones first. I have thought to divide it such that in the first patch, we detect conflicts like 'insert_exists', 'update_differ', 'update_missing', and 'delete_missing' (the definition of each could be found in the initial email [1]) and throw an ERROR or write them in LOG. Various people agreed to have this as a separate committable work [2]. This can help users to detect and monitor the conflicts in a better way. I have intentionally skipped update_deleted as it would require more infrastructure and it would be helpful even without that. In the second patch, we can implement simple built-in resolution strategies like apply and skip (which can be named as remote_apply and keep_local, see [3][4] for details on these strategies) with ERROR or LOG being the default strategy. We can allow these strategies to be configured at the global and table level. In the third patch, we can add monitoring capability for conflicts and resolutions as mentioned by Jonathan [5]. Here, we can have stats like how many conflicts of a particular type have happened. In the meantime, we can keep discussing and try to reach a consensus on the timing-related resolution strategy like 'last_update_wins' and the conflict strategy 'update_deleted'. If we agree on the above, some of the work, especially the first one, could even be discussed in a separate thread. Thoughts? [1] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAD21AoAa6JzqhXY02uNUPb-aTozu2RY9nMdD1%3DTUh%2BFpskkYtw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com [4] - https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts [5] - https://www.postgresql.org/message-id/1eb9242f-dcb6-45c3-871c-98ec324e03ef%40postgresql.org -- With Regards, Amit Kapila.
On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I think the unbounded size of the vector could be a problem to store > > > for each event. However, while researching previous discussions, it > > > came to our notice that we have discussed this topic in the past as > > > well in the context of standbys. For recovery_min_apply_delay, we > > > decided the clock skew is not a problem as the settings of this > > > parameter are much larger than typical time deviations between servers > > > as mentioned in docs. Similarly for casual reads [1], there was a > > > proposal to introduce max_clock_skew parameter and suggesting the user > > > to make sure to have NTP set up correctly. We have tried to check > > > other databases (like Ora and BDR) where CDR is implemented but didn't > > > find anything specific to clock skew. So, I propose to go with a GUC > > > like max_clock_skew such that if the difference of time between the > > > incoming transaction's commit time and the local time is more than > > > max_clock_skew then we raise an ERROR. It is not clear to me that > > > putting bigger effort into clock skew is worth especially when other > > > systems providing CDR feature (like Ora or BDR) for decades have not > > > done anything like vector clocks. It is possible that this is less of > > > a problem w.r.t CDR and just detecting the anomaly in clock skew is > > > good enough. > > > > I believe that if we've accepted this solution elsewhere, then we can > > also consider the same. Basically, we're allowing the application to > > set its tolerance for clock skew. And, if the skew exceeds that > > tolerance, it's the application's responsibility to synchronize; > > otherwise, an error will occur. This approach seems reasonable. > > This model can be further extended by making the apply worker wait if > the remote transaction's commit_ts is greater than the local > timestamp. This ensures that no local transactions occurring after the > remote transaction appear to have happened earlier due to clock skew > instead we make them happen before the remote transaction by delaying > the remote transaction apply. Essentially, by having the remote > application wait until the local timestamp matches the remote > transaction's timestamp, we ensure that the remote transaction, which > seems to occur after concurrent local transactions due to clock skew, > is actually applied after those transactions. > > With this model, there should be no ordering errors from the > application's perspective as well if synchronous commit is enabled. > The transaction initiated by the publisher cannot be completed until > it is applied to the synchronous subscriber. This ensures that if the > subscriber's clock is lagging behind the publisher's clock, the > transaction will not be applied until the subscriber's local clock is > in sync, preventing the transaction from being completed out of order. > As per the discussion, this idea will help us to resolve transaction ordering issues due to clock skew. I was thinking of having two variables max_clock_skew (indicates how much clock skew is acceptable), max_clock_skew_options: ERROR, LOG, WAIT (indicates the action we need to take once the clock skew is detected). There could be multiple ways to provide these parameters, one is providing them as GUCs, and another at the subscription or the table level. 
I am wondering whether users would only care about a table or a set of tables, or whether they would like to set such variables at the system level. We already have a SKIP option (that allows us to skip the transactions till a particular LSN) at the subscription level, so I am wondering whether it makes sense to provide these new parameters related to conflict resolution at the same level as well? -- With Regards, Amit Kapila.
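For reference, the existing subscription-level SKIP option looks like this, alongside a purely hypothetical shape for the new clock-skew parameters if they were made subscription options (parameter names illustrative, not implemented):

-- existing: skip the remote transaction committing at the given LSN
ALTER SUBSCRIPTION sub1 SKIP (lsn = '0/14C0378');

-- hypothetical subscription-level knobs for the ideas above:
-- ALTER SUBSCRIPTION sub1 SET (max_clock_skew = '2min', max_clock_skew_action = 'wait');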
On Thu, Jun 20, 2024 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database.
> >> >
> >>
> >> Sure, the application needs some handling but I have tried to explain
> >> with a simple way that comes to my mind and how it can be realized
> >> with db involved. This is a known conflict detection method but note
> >> that I am not insisting to have "earliest_timestamp_wins". Even, if we
> >> want this we can have a separate discussion on this and add it later.
> >>
> >
> > It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility. I imagine the first version might just detect the conflict and throw an error or do nothing. That's already two simple conflict resolution strategies with minimal effort. We can add more complicated ones incrementally.
> >
> Agreed, splitting the work into multiple patches would help us to
> finish the easier ones first.
> I have thought to divide it such that in the first patch, we detect
> conflicts like 'insert_exists', 'update_differ', 'update_missing', and
> 'delete_missing' (the definition of each could be found in the initial
> email [1]) and throw an ERROR or write them in LOG. Various people
> agreed to have this as a separate committable work [2]. This can help
> users to detect and monitor the conflicts in a better way. I have
> intentionally skipped update_deleted as it would require more
> infrastructure and it would be helpful even without that.
Since we are in the initial months of release, it will be good to take stock of whether the receiver receives all the information needed for most (if not all) of the conflict detection and resolution strategies. If there are any missing pieces, we may want to add those in PG18 so that improved conflict detection and resolution on a higher version receiver can still work.
> In the second patch, we can implement simple built-in resolution
> strategies like apply and skip (which can be named as remote_apply and
> keep_local, see [3][4] for details on these strategies) with ERROR or
> LOG being the default strategy. We can allow these strategies to be
> configured at the global and table level.
> In the third patch, we can add monitoring capability for conflicts and
> resolutions as mentioned by Jonathan [5]. Here, we can have stats like
> how many conflicts of a particular type have happened.
That looks like a plan. Thanks for chalking it out.
Best Wishes,
Ashutosh Bapat
On Thu, Jun 20, 2024 at 5:06 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Jun 20, 2024 at 3:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat >> <ashutosh.bapat.oss@gmail.com> wrote: >> > >> > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> >> >> > I doubt that it would be that simple. The application will have to intervene and tell one of the employees that their reservation has failed. It looks natural that the first one to reserve the room should get the reservation, but implementing that is more complex than resolving a conflict in the database. In fact, mostly it will be handled outside database. >> >> > >> >> >> >> Sure, the application needs some handling but I have tried to explain >> >> with a simple way that comes to my mind and how it can be realized >> >> with db involved. This is a known conflict detection method but note >> >> that I am not insisting to have "earliest_timestamp_wins". Even, if we >> >> want this we can have a separate discussion on this and add it later. >> >> >> > >> > It will be good to add a minimal set of conflict resolution strategies to begin with, while designing the feature for extensibility. I imagine the first version might just detect the conflict and throw an error or do nothing. That's already two simple conflict resolution strategies with minimal effort. We can add more complicated ones incrementally. >> > >> >> Agreed, splitting the work into multiple patches would help us to >> finish the easier ones first. >> >> I have thought to divide it such that in the first patch, we detect >> conflicts like 'insert_exists', 'update_differ', 'update_missing', and >> 'delete_missing' (the definition of each could be found in the initial >> email [1]) and throw an ERROR or write them in LOG. Various people >> agreed to have this as a separate committable work [2]. This can help >> users to detect and monitor the conflicts in a better way. I have >> intentionally skipped update_deleted as it would require more >> infrastructure and it would be helpful even without that. > > > Since we are in the initial months of release, it will be good to take stock of whether the receiver receives all the information needed for most (if not all) of the conflict detection and resolution strategies. If there are any missing pieces, we may want to add those in PG18 so that improved conflict detection and resolution on a higher version receiver can still work. > Good point. This can help us to detect conflicts if required even when we move to a higher version. As we continue to discuss/develop the features, I hope we will be able to see any missing pieces. >> >> >> In the second patch, we can implement simple built-in resolution >> strategies like apply and skip (which can be named as remote_apply and >> keep_local, see [3][4] for details on these strategies) with ERROR or >> LOG being the default strategy. We can allow these strategies to be >> configured at the global and table level. >> >> In the third patch, we can add monitoring capability for conflicts and >> resolutions as mentioned by Jonathan [5]. Here, we can have stats like >> how many conflicts of a particular type have happened. > > > That looks like a plan. Thanks for chalking it out. > Thanks! -- With Regards, Amit Kapila.
On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> In the second patch, we can implement simple built-in resolution > >> strategies like apply and skip (which can be named as remote_apply and > >> keep_local, see [3][4] for details on these strategies) with ERROR or > >> LOG being the default strategy. We can allow these strategies to be > >> configured at the global and table level. Before we implement resolvers, we need a way to configure them. Please find the patch002 which attempts to implement Global Level Conflict Resolvers Configuration. Note that patch002 is dependent upon Conflict-Detection patch001 which is reviewed in another thread [1]. I have attached patch001 here for convenience and to avoid CFBot failures. But please use [1] if you have any comments on patch001. New DDL commands in patch002 are: To set the global resolver for a given conflict_type: SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type' To reset to default resolver: RESET CONFLICT RESOLVER FOR 'conflict_type' TODO: Once we get initial consensus on DDL commands, I will add support for them in pg_dump/restore and will add doc. ------------ As suggested in [2] and above, it seems logical to have table-specific resolver configuration along with the global one. Here is the proposal for table level resolvers: 1) We can provide support for table level resolvers using ALTER TABLE: ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>, SET CONFLICT RESOLVER <resolver2> on <conflict_type2>, ...; Reset can be done using: ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>, RESET CONFLICT RESOLVER on <conflict_type2>, ...; Above commands will save/remove configuration in/from the new system catalog pg_conflict_rel. 2) Table level configuration (if any) will be given preference over global ones. The tables not having table-specific resolvers will use global configured ones. 3) If the table is a partition table, then resolvers created for the parent will be inherited by all child partition tables. Multiple resolver entries will be created, one for each child partition in the system catalog (similar to constraints). 4) Users can also configure explicit resolvers for child partitions. In such a case, child's resolvers will override inherited resolvers (if any). 5) Any attempt to RESET (remove) inherited resolvers on the child partition table *alone* will result in an error: "cannot reset inherited resolvers" (similar to constraints). But RESET of explicitly created resolvers (non-inherited ones) will be permitted for child partitions. On RESET, the resolver configuration will not fall back to the inherited resolver again. Users need to explicitly configure new resolvers for the child partition tables (after RESET) if needed. 6) Removal/Reset of resolvers on parent will remove corresponding "inherited" resolvers on all the child partitions as well. If any child has overridden inherited resolvers earlier, those will stay. 7) For 'ALTER TABLE parent ATTACH PARTITION child'; if 'child' has its own resolvers set, those will not be overridden. But if it does not have resolvers set, it will inherit from the parent table. This will mean, for say out of 5 conflict_types, if the child table has resolvers configured for any 2, 'attach' will retain those; for the rest 3, it will inherit from the parent (if any). 8) Detach partition will not remove inherited resolvers, it will just mark them 'non inherited' (similar to constraints). Thoughts?
------------ [1]: https://www.postgresql.org/message-id/OS0PR01MB57161006B8F2779F2C97318194D42%40OS0PR01MB5716.jpnprd01.prod.outlook.com [2]: https://www.postgresql.org/message-id/4738d098-6378-494e-9f88-9e3a85a5de82%40enterprisedb.com thanks Shveta
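Putting the proposed (uncommitted) syntax from patch002 and the table-level proposal together, usage would look roughly like this; resolver and conflict-type names are taken from earlier in the thread, and the table name is illustrative:

-- global level
SET CONFLICT RESOLVER 'latest_timestamp_wins' FOR 'insert_exists';
RESET CONFLICT RESOLVER FOR 'insert_exists';

-- table level, taking precedence over the global setting
ALTER TABLE orders
    SET CONFLICT RESOLVER 'keep_local' on 'update_differ',
    SET CONFLICT RESOLVER 'remote_apply' on 'delete_missing';
ALTER TABLE orders RESET CONFLICT RESOLVER on 'update_differ';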
On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > >> In the second patch, we can implement simple built-in resolution > > >> strategies like apply and skip (which can be named as remote_apply and > > >> keep_local, see [3][4] for details on these strategies) with ERROR or > > >> LOG being the default strategy. We can allow these strategies to be > > >> configured at the global and table level. > > Before we implement resolvers, we need a way to configure them. Please > find the patch002 which attempts to implement Global Level Conflict > Resolvers Configuration. Note that patch002 is dependent upon > Conflict-Detection patch001 which is reviewed in another thread [1]. > I have attached patch001 here for convenience and to avoid CFBot > failures. But please use [1] if you have any comments on patch001. > > New DDL commands in patch002 are: > > To set global resolver for given conflcit_type: > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type' > > To reset to default resolver: > RESET CONFLICT RESOLVER FOR 'conflict_type' > Does setting up resolvers have any meaning without subscriptions? I am wondering whether we should allow to set up the resolvers at the subscription level. One benefit is that users don't need to use a different DDL to set up resolvers. The first patch gives a conflict detection option at the subscription level, so it would be symmetrical to provide a resolver at the subscription level. Yet another benefit could be that it provides users facility to configure different resolvers for a set of tables belonging to a particular publication/node. > > ------------ > > As suggested in [2] and above, it seems logical to have table-specific > resolvers configuration along with global one. > > Here is the proposal for table level resolvers: > > 1) We can provide support for table level resolvers using ALTER TABLE: > > ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>, > SET CONFLICT RESOLVER > <resolver2> on <conflict_type2>, ...; > > Reset can be done using: > ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>, > RESET CONFLICT RESOLVER on > <conflict_type2>, ...; > > Above commands will save/remove configuration in/from the new system > catalog pg_conflict_rel. > > 2) Table level configuration (if any) will be given preference over > global ones. The tables not having table-specific resolvers will use > global configured ones. > > 3) If the table is a partition table, then resolvers created for the > parent will be inherited by all child partition tables. Multiple > resolver entries will be created, one for each child partition in the > system catalog (similar to constraints). > > 4) Users can also configure explicit resolvers for child partitions. > In such a case, child's resolvers will override inherited resolvers > (if any). > > 5) Any attempt to RESET (remove) inherited resolvers on the child > partition table *alone* will result in error: "cannot reset inherited > resolvers" (similar to constraints). But RESET of explicit created > resolvers (non-inherited ones) will be permitted for child partitions. > On RESET, the resolver configuration will not fallback to the > inherited resolver again. Users need to explicitly configure new > resolvers for the child partition tables (after RESET) if needed. > Why so? 
If we can allow the RESET command to fall back to the inherited resolver, it would make the behavior consistent for the child table where we have not performed SET. > 6) Removal/Reset of resolvers on parent will remove corresponding > "inherited" resolvers on all the child partitions as well. If any > child has overridden inherited resolvers earlier, those will stay. > > 7) For 'ALTER TABLE parent ATTACH PARTITION child'; if 'child' has its > own resolvers set, those will not be overridden. But if it does not > have resolvers set, it will inherit from the parent table. This will > mean, for say out of 5 conflict_types, if the child table has > resolvers configured for any 2, 'attach' will retain those; for the > rest 3, it will inherit from the parent (if any). > > 8) Detach partition will not remove inherited resolvers, it will just > mark them 'non inherited' (similar to constraints). > > BTW, to keep the initial patch simple, can we prohibit setting > resolvers at the child table level? If we follow this, then we can > give an ERROR if the user tries to attach the table (with configured > resolvers) to an existing partitioned table. -- With Regards, Amit Kapila.
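If the subscription-level idea above were adopted, one hypothetical shape could be the following (purely illustrative; no such syntax exists in the posted patches):

-- hypothetical: per-subscription resolver configuration
-- ALTER SUBSCRIPTION sub1 SET CONFLICT RESOLVER 'remote_apply' FOR 'insert_exists';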
On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > >> In the second patch, we can implement simple built-in resolution > > > >> strategies like apply and skip (which can be named as remote_apply and > > > >> keep_local, see [3][4] for details on these strategies) with ERROR or > > > >> LOG being the default strategy. We can allow these strategies to be > > > >> configured at the global and table level. > > > > Before we implement resolvers, we need a way to configure them. Please > > find the patch002 which attempts to implement Global Level Conflict > > Resolvers Configuration. Note that patch002 is dependent upon > > Conflict-Detection patch001 which is reviewed in another thread [1]. > > I have attached patch001 here for convenience and to avoid CFBot > > failures. But please use [1] if you have any comments on patch001. > > > > New DDL commands in patch002 are: > > > > To set global resolver for given conflcit_type: > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type' > > > > To reset to default resolver: > > RESET CONFLICT RESOLVER FOR 'conflict_type' > > > > Does setting up resolvers have any meaning without subscriptions? I am > wondering whether we should allow to set up the resolvers at the > subscription level. One benefit is that users don't need to use a > different DDL to set up resolvers. The first patch gives a conflict > detection option at the subscription level, so it would be symmetrical > to provide a resolver at the subscription level. Yet another benefit > could be that it provides users facility to configure different > resolvers for a set of tables belonging to a particular > publication/node. There can be multiple tables included in a publication with varying business use-cases and thus may need different resolvers set, even though they all are part of the same publication. > > > > ------------ > > > > As suggested in [2] and above, it seems logical to have table-specific > > resolvers configuration along with global one. > > > > Here is the proposal for table level resolvers: > > > > 1) We can provide support for table level resolvers using ALTER TABLE: > > > > ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>, > > SET CONFLICT RESOLVER > > <resolver2> on <conflict_type2>, ...; > > > > Reset can be done using: > > ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>, > > RESET CONFLICT RESOLVER on > > <conflict_type2>, ...; > > > > Above commands will save/remove configuration in/from the new system > > catalog pg_conflict_rel. > > > > 2) Table level configuration (if any) will be given preference over > > global ones. The tables not having table-specific resolvers will use > > global configured ones. > > > > 3) If the table is a partition table, then resolvers created for the > > parent will be inherited by all child partition tables. Multiple > > resolver entries will be created, one for each child partition in the > > system catalog (similar to constraints). > > > > 4) Users can also configure explicit resolvers for child partitions. > > In such a case, child's resolvers will override inherited resolvers > > (if any). > > > > 5) Any attempt to RESET (remove) inherited resolvers on the child > > partition table *alone* will result in error: "cannot reset inherited > > resolvers" (similar to constraints). 
But RESET of explicitly created > > resolvers (non-inherited ones) will be permitted for child partitions. > > On RESET, the resolver configuration will not fall back to the > > inherited resolver again. Users need to explicitly configure new > > resolvers for the child partition tables (after RESET) if needed. > > > > Why so? If we allow the RESET command to fall back to the inherited > resolver, it would make the behavior consistent with that of a child table > on which SET has not been performed. The thought behind not making it fall back is that, since the user has done 'RESET', he may want to remove the resolver completely. We don't know if he really wants to go back to the previous one. If he does, it is easy to set it again. But if he does not, and we set the inherited resolver again during 'RESET', there is no way he can drop that inherited resolver alone on the child partition. > > 6) Removal/Reset of resolvers on parent will remove corresponding > > "inherited" resolvers on all the child partitions as well. If any > > child has overridden inherited resolvers earlier, those will stay. > > > > 7) For 'ALTER TABLE parent ATTACH PARTITION child'; if 'child' has its > > own resolvers set, those will not be overridden. But if it does not > > have resolvers set, it will inherit from the parent table. This will > > mean, for say out of 5 conflict_types, if the child table has > > resolvers configured for any 2, 'attach' will retain those; for the > > rest 3, it will inherit from the parent (if any). > > > > 8) Detach partition will not remove inherited resolvers, it will just > > mark them 'non inherited' (similar to constraints). > > > > BTW, to keep the initial patch simple, can we prohibit setting > resolvers at the child table level? If we follow this, then we can > give an ERROR if the user tries to attach the table (with configured > resolvers) to an existing partitioned table. Okay, I will think about this if the patch becomes too complex. thanks Shveta
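To make the point of disagreement concrete, a sketch of the two behaviors using the proposed (uncommitted) syntax and hypothetical table names:

-- the parent's resolver is inherited by every partition
ALTER TABLE meas SET CONFLICT RESOLVER 'keep_local' ON 'insert_exists';
-- a child overrides the inherited resolver
ALTER TABLE meas_2024 SET CONFLICT RESOLVER 'remote_apply' ON 'insert_exists';
-- current proposal: this removes the child's resolver entirely and does
-- NOT fall back to the parent's 'keep_local'
ALTER TABLE meas_2024 RESET CONFLICT RESOLVER ON 'insert_exists';
-- suggested alternative: after the same RESET, meas_2024 would again
-- use the inherited 'keep_local'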
On Tue, Jun 25, 2024 at 3:39 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > >> In the second patch, we can implement simple built-in resolution > > > > >> strategies like apply and skip (which can be named as remote_apply and > > > > >> keep_local, see [3][4] for details on these strategies) with ERROR or > > > > >> LOG being the default strategy. We can allow these strategies to be > > > > >> configured at the global and table level. > > > > > > Before we implement resolvers, we need a way to configure them. Please > > > find the patch002 which attempts to implement Global Level Conflict > > > Resolvers Configuration. Note that patch002 is dependent upon > > > Conflict-Detection patch001 which is reviewed in another thread [1]. > > > I have attached patch001 here for convenience and to avoid CFBot > > > failures. But please use [1] if you have any comments on patch001. > > > > > > New DDL commands in patch002 are: > > > > > > To set global resolver for given conflcit_type: > > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type' > > > > > > To reset to default resolver: > > > RESET CONFLICT RESOLVER FOR 'conflict_type' > > > > > > > Does setting up resolvers have any meaning without subscriptions? I am > > wondering whether we should allow to set up the resolvers at the > > subscription level. One benefit is that users don't need to use a > > different DDL to set up resolvers. The first patch gives a conflict > > detection option at the subscription level, so it would be symmetrical > > to provide a resolver at the subscription level. Yet another benefit > > could be that it provides users facility to configure different > > resolvers for a set of tables belonging to a particular > > publication/node. > > There can be multiple tables included in a publication with varying > business use-cases and thus may need different resolvers set, even > though they all are part of the same publication. > Agreed but this is the reason we are planning to keep resolvers at the table level. Here, I am asking to set resolvers at the subscription level rather than at the global level. > > > > > > ------------ > > > > > > As suggested in [2] and above, it seems logical to have table-specific > > > resolvers configuration along with global one. > > > > > > Here is the proposal for table level resolvers: > > > > > > 1) We can provide support for table level resolvers using ALTER TABLE: > > > > > > ALTER TABLE <name> SET CONFLICT RESOLVER <resolver1> on <conflict_type1>, > > > SET CONFLICT RESOLVER > > > <resolver2> on <conflict_type2>, ...; > > > > > > Reset can be done using: > > > ALTER TABLE <name> RESET CONFLICT RESOLVER on <conflict_type1>, > > > RESET CONFLICT RESOLVER on > > > <conflict_type2>, ...; > > > > > > Above commands will save/remove configuration in/from the new system > > > catalog pg_conflict_rel. > > > > > > 2) Table level configuration (if any) will be given preference over > > > global ones. The tables not having table-specific resolvers will use > > > global configured ones. > > > > > > 3) If the table is a partition table, then resolvers created for the > > > parent will be inherited by all child partition tables. 
Multiple > > > resolver entries will be created, one for each child partition in the > > > system catalog (similar to constraints). > > > > > > 4) Users can also configure explicit resolvers for child partitions. > > > In such a case, child's resolvers will override inherited resolvers > > > (if any). > > > > > > 5) Any attempt to RESET (remove) inherited resolvers on the child > > > partition table *alone* will result in error: "cannot reset inherited > > > resolvers" (similar to constraints). But RESET of explicit created > > > resolvers (non-inherited ones) will be permitted for child partitions. > > > On RESET, the resolver configuration will not fallback to the > > > inherited resolver again. Users need to explicitly configure new > > > resolvers for the child partition tables (after RESET) if needed. > > > > > > > Why so? If we can allow the RESET command to fallback to the inherited > > resolver it would make the behavior consistent for the child table > > where we don't have performed SET. > > Thought behind not making it fallback is since the user has done > 'RESET', he may want to remove the resolver completely. We don't know > if he really wants to go back to the previous one. If he does, it is > easy to set it again. But if he does not, and we set the inherited > resolver again during 'RESET', there is no way he can drop that > inherited resolver alone on the child partition. > I see your point but normally RESET allows us to go back to the default which in this case would be the resolver inherited from the parent table. -- With Regards, Amit Kapila.
Please find the attached 'patch0003', which implements conflict resolutions according to the global resolver settings. Summary of Conflict Resolutions Implemented in 'patch0003': INSERT Conflicts: ------------------------ 1) Conflict Type: 'insert_exists' Supported Resolutions: a) 'remote_apply': Convert the INSERT to an UPDATE and apply. b) 'keep_local': Ignore the incoming (conflicting) INSERT and retain the local tuple. c) 'error': The apply worker will error out and restart. UPDATE Conflicts: ------------------------ 1) Conflict Type: 'update_differ' Supported Resolutions: a) 'remote_apply': Apply the remote update. b) 'keep_local': Skip the remote update and retain the local tuple. c) 'error': The apply worker will error out and restart. 2) Conflict Type: 'update_missing' Supported Resolutions: a) 'apply_or_skip': Try to convert the UPDATE to an INSERT; if unsuccessful, skip the remote update and continue. b) 'apply_or_error': Try to convert the UPDATE to an INSERT; if unsuccessful, error out. c) 'skip': Skip the remote update and continue. d) 'error': The apply worker will error out and restart. DELETE Conflicts: ------------------------ 1) Conflict Type: 'delete_missing' Supported Resolutions: a) 'skip': Skip the remote delete and continue. b) 'error': The apply worker will error out and restart. NOTE: With these basic resolution techniques, the patch does not aim to ensure consistency across nodes, so data divergence is expected. -- Thanks, Nisha
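As a concrete illustration, the resolutions summarized above would be selected through patch002's global DDL. This is a sketch only, using the proposed grammar and the conflict-type and resolver names from this thread:

SET CONFLICT RESOLVER 'remote_apply' FOR 'insert_exists';   -- convert the INSERT to an UPDATE
SET CONFLICT RESOLVER 'keep_local' FOR 'update_differ';     -- retain the local tuple
SET CONFLICT RESOLVER 'apply_or_skip' FOR 'update_missing'; -- convert the UPDATE to an INSERT, else skip
SET CONFLICT RESOLVER 'skip' FOR 'delete_missing';          -- ignore the remote DELETE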
On Wed, Jun 26, 2024 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jun 25, 2024 at 3:39 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > >> In the second patch, we can implement simple built-in resolution > > > > > >> strategies like apply and skip (which can be named as remote_apply and > > > > > >> keep_local, see [3][4] for details on these strategies) with ERROR or > > > > > >> LOG being the default strategy. We can allow these strategies to be > > > > > >> configured at the global and table level. > > > > > > > > Before we implement resolvers, we need a way to configure them. Please > > > > find the patch002 which attempts to implement Global Level Conflict > > > > Resolvers Configuration. Note that patch002 is dependent upon > > > > Conflict-Detection patch001 which is reviewed in another thread [1]. > > > > I have attached patch001 here for convenience and to avoid CFBot > > > > failures. But please use [1] if you have any comments on patch001. > > > > > > > > New DDL commands in patch002 are: > > > > > > > > To set global resolver for given conflcit_type: > > > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type' > > > > > > > > To reset to default resolver: > > > > RESET CONFLICT RESOLVER FOR 'conflict_type' > > > > > > > > > > Does setting up resolvers have any meaning without subscriptions? I am > > > wondering whether we should allow to set up the resolvers at the > > > subscription level. One benefit is that users don't need to use a > > > different DDL to set up resolvers. The first patch gives a conflict > > > detection option at the subscription level, so it would be symmetrical > > > to provide a resolver at the subscription level. Yet another benefit > > > could be that it provides users facility to configure different > > > resolvers for a set of tables belonging to a particular > > > publication/node. > > > > There can be multiple tables included in a publication with varying > > business use-cases and thus may need different resolvers set, even > > though they all are part of the same publication. > > > > Agreed but this is the reason we are planning to keep resolvers at the > table level. Here, I am asking to set resolvers at the subscription > level rather than at the global level. Okay, got it. I misunderstood earlier that we want to replace table level resolvers with subscription ones. Having global configuration has one benefit that if the user has no requirement to set different resolvers for different subscriptions or tables, he may always set one global configuration and be done with it. OTOH, I also agree with benefits coming with subscription level configuration. thanks Shveta
On Thu, Jun 27, 2024 at 8:44 AM Nisha Moond <nisha.moond412@gmail.com> wrote: > > Please find the attached 'patch0003', which implements conflict > resolutions according to the global resolver settings. Thanks for providing the resolver patch. Please find new patches attached. Changes: patch002: --Fixed CFBot compilation failure where a header file was not included in meson.build --Also, this is the correct version of the patch. The previous email attached an older version by mistake. patch004: This is a WIP patch which attempts to implement configuration of table-level resolvers. It has the below changes: --Alter table SET CONFLICT RESOLVER. --Alter table RESET CONFLICT RESOLVER. <Note that these 2 commands also take care of resolver inheritance for partition tables as discussed in [1]>. --Resolver inheritance support during 'Alter table ATTACH PARTITION'. --Resolver inheritance removal during 'Alter table DETACH PARTITION'. Pending: --Resolver inheritance support during 'CREATE TABLE .. PARTITION OF ..'. --Using table-level resolvers while resolving conflicts. (Resolver patch003 still relies on global resolvers). Please refer to [1] for the complete proposal for table-level resolvers. [1]: https://www.postgresql.org/message-id/CAJpy0uAqegGDbuJk3Z-ku8wYFZyPv7C1KmHCkJ3885O%2Bj5enFg%40mail.gmail.com thanks Shveta
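A sketch of the ATTACH/DETACH inheritance behavior being implemented, using hypothetical tables and the proposed (uncommitted) resolver syntax:

CREATE TABLE meas (ts timestamptz NOT NULL, val int) PARTITION BY RANGE (ts);
-- proposed syntax: a resolver on the partitioned parent
ALTER TABLE meas SET CONFLICT RESOLVER 'keep_local' ON 'insert_exists';
CREATE TABLE meas_2024 (ts timestamptz NOT NULL, val int);
-- on attach, meas_2024 has no resolvers of its own, so it inherits 'keep_local'
ALTER TABLE meas ATTACH PARTITION meas_2024
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- on detach, the resolver stays on meas_2024 but is marked non-inherited
ALTER TABLE meas DETACH PARTITION meas_2024;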
On Thu, Jun 27, 2024 at 4:03 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Jun 27, 2024 at 8:44 AM Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Please find the attached 'patch0003', which implements conflict > > resolutions according to the global resolver settings. > > Thanks for providing the resolver patch. > > Please find new patches attached. Changes: > > patch002: > --Fixed CFBot compilation failure where a header file was not included > in meson.build > --Also, this is the correct version of the patch. The previous email > attached an older version by mistake. > > patch004: > This is a WIP patch which attempts to implement configuration of > table-level resolvers. It has the below changes: > --Alter table SET CONFLICT RESOLVER. > --Alter table RESET CONFLICT RESOLVER. <Note that these 2 commands > also take care of resolver inheritance for partition tables as > discussed in [1]>. > --Resolver inheritance support during 'Alter table ATTACH PARTITION'. > --Resolver inheritance removal during 'Alter table DETACH PARTITION'. > > Pending: > --Resolver inheritance support during 'CREATE TABLE .. PARTITION OF > ..'. > --Using table-level resolvers while resolving conflicts. (Resolver > patch003 still relies on global resolvers). > > Please refer to [1] for the complete proposal for table-level resolvers. > Please find v2 attached. Changes are in patch004 only: --Resolver inheritance support during 'CREATE TABLE .. PARTITION OF'. --SPLIT and MERGE partition review and testing (this was missed earlier). --Test cases added for all the above cases. thanks Shveta
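For completeness, a short sketch of the newly supported inheritance path, continuing the hypothetical tables above (the resolver behavior is the patch's proposal, not committed functionality):

-- with v2, a partition created this way inherits meas's resolvers at creation time
CREATE TABLE meas_2025 PARTITION OF meas
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');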
Hi, On Thu, May 23, 2024 at 3:37 PM shveta malik <shveta.malik@gmail.com> wrote: > > DELETE > ================ > Conflict Type: > ---------------- > delete_missing: An incoming delete is trying to delete a row on a > target node which does not exist. IIUC the 'delete_missing' conflict doesn't cover the case where an incoming delete message is trying to delete a row that has already been updated locally or by another node. I think in update/delete conflict situations, we need to resolve the conflicts based on commit timestamps like we do for update/update and insert/update conflicts. For example, suppose there are two nodes, node-A and node-B, set up with bi-directional replication, and suppose further that both have the row with id = 1; consider the following sequence: 09:00:00 DELETE ... WHERE id = 1 on node-A. 09:00:05 UPDATE ... WHERE id = 1 on node-B. 09:00:10 node-A received the update message from node-B. 09:00:15 node-B received the delete message from node-A. At 09:00:10 on node-A, an update_deleted conflict is generated since the row on node-A is already deleted locally. Suppose that we use the 'apply_or_skip' resolution for this conflict; we convert the update message into an insertion, so node-A now has the row with id = 1. At 09:00:15 on node-B, the incoming delete message is applied and deletes the row with id = 1, even though the row has already been modified locally. The node-A and node-B are now inconsistent. This inconsistency can be avoided by using 'skip' resolution for the 'update_deleted' conflict on node-A, and 'skip' resolution is the default method for that actually. However, if we handle it as 'update_missing', the 'apply_or_skip' resolution is used by default. IIUC with the proposed architecture, DELETE always takes precedence over UPDATE since both 'update_deleted' and 'update_missing' don't use commit timestamps to resolve the conflicts. As long as that is true, I think there is no use case for 'apply_or_skip' and 'apply_or_error' resolutions in update/delete conflict cases. In short, I think we need something like a 'delete_differ' conflict type as well. FYI PGD and Oracle GoldenGate seem to have this conflict type[1][2]. The 'delete_differ' conflict type would have at least 'latest_timestamp_wins' resolution. With the timestamp-based resolution method, we would deal with update/delete conflicts as follows: 09:00:00: DELETE ... WHERE id = 1 on node-A. 09:00:05: UPDATE ... WHERE id = 1 on node-B. - the updated row doesn't have the origin since it's a local change. 09:00:10: node-A received the update message from node-B. - the incoming update message has the origin of node-B whereas the local row is already removed locally. - 'update_deleted' conflict is generated. - do the insert of the new row instead, because the commit timestamp of UPDATE is newer than DELETE's one. 09:00:15: node-B received the delete message from node-A. - the incoming delete message has the origin of node-B whereas the (updated) row doesn't have the origin. - 'update_differ' conflict is generated. - discard DELETE, because the commit timestamp of UPDATE is newer than DELETE's one. As a result, both nodes have the new version row.
Regards, [1] https://www.enterprisedb.com/docs/pgd/latest/consistency/conflicts/#updatedelete-conflicts [2] https://docs.oracle.com/goldengate/c1230/gg-winux/GWUAD/configuring-conflict-detection-and-resolution.htm (see DELETEROWEXISTS conflict type) -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
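To make the timeline above concrete, a minimal sketch of the conflicting statements; the table definition is hypothetical and the conflict names follow the proposal in this thread:

-- both nodes start with: CREATE TABLE t (id int PRIMARY KEY, val text);
--                        INSERT INTO t VALUES (1, 'old');
-- node-A at 09:00:00:
DELETE FROM t WHERE id = 1;
-- node-B at 09:00:05:
UPDATE t SET val = 'new' WHERE id = 1;
-- node-A at 09:00:10 detects update_deleted (the row is already gone locally);
-- node-B at 09:00:15 would detect delete_differ (the row was changed locally),
-- and latest_timestamp_wins would discard the DELETE, so both nodes keep (1, 'new')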
On Thu, Jun 27, 2024 at 1:14 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> Please find the attached 'patch0003', which implements conflict
> resolutions according to the global resolver settings.
> Summary of Conflict Resolutions Implemented in 'patch0003':
> INSERT Conflicts:
> ------------------------
> 1) Conflict Type: 'insert_exists'
> Supported Resolutions:
> a) 'remote_apply': Convert the INSERT to an UPDATE and apply.
> b) 'keep_local': Ignore the incoming (conflicting) INSERT and retain
> the local tuple.
> c) 'error': The apply worker will error out and restart.
Hi Nisha,
While testing the patch, when conflict resolution is configured and insert_exists is set to "remote_apply", I see this warning in the logs due to a resource not being closed:
2024-07-01 02:52:59.427 EDT [20304] LOG: conflict insert_exists detected on relation "public.test1"
2024-07-01 02:52:59.427 EDT [20304] DETAIL: Key already exists. Applying resolution method "remote_apply"
2024-07-01 02:52:59.427 EDT [20304] CONTEXT: processing remote data for replication origin "pg_16417" during message type "INSERT" for replication target relation "public.test1" in transaction 763, finished at 0/15E7F68
2024-07-01 02:52:59.427 EDT [20304] WARNING: resource was not closed: [138] (rel=base/5/16413, blockNum=0, flags=0x93800000, refcount=1 1)
2024-07-01 02:52:59.427 EDT [20304] CONTEXT: processing remote data for replication origin "pg_16417" during message type "COMMIT" in transaction 763, finished at 0/15E7F68
2024-07-01 02:52:59.427 EDT [20304] WARNING: resource was not closed: TupleDesc 0x7f8c0439e448 (16402,-1)
2024-07-01 02:52:59.427 EDT [20304] CONTEXT: processing remote data for replication origin "pg_16417" during message type "COMMIT" in transaction 763, finished at 0/15E7F68
regards,
Ajin Cherian
Fujitsu Australia
On Thu, Jun 27, 2024 at 1:50 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jun 26, 2024 at 2:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Jun 25, 2024 at 3:39 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Tue, Jun 25, 2024 at 3:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Mon, Jun 24, 2024 at 1:47 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > On Thu, Jun 20, 2024 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > >> In the second patch, we can implement simple built-in resolution > > > > > > >> strategies like apply and skip (which can be named as remote_apply and > > > > > > >> keep_local, see [3][4] for details on these strategies) with ERROR or > > > > > > >> LOG being the default strategy. We can allow these strategies to be > > > > > > >> configured at the global and table level. > > > > > > > > > > Before we implement resolvers, we need a way to configure them. Please > > > > > find the patch002 which attempts to implement Global Level Conflict > > > > > Resolvers Configuration. Note that patch002 is dependent upon > > > > > Conflict-Detection patch001 which is reviewed in another thread [1]. > > > > > I have attached patch001 here for convenience and to avoid CFBot > > > > > failures. But please use [1] if you have any comments on patch001. > > > > > > > > > > New DDL commands in patch002 are: > > > > > > > > > > To set the global resolver for a given conflict_type: > > > > > SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type' > > > > > > > > > > To reset to the default resolver: > > > > > RESET CONFLICT RESOLVER FOR 'conflict_type' > > > > > > > > > > > > > Does setting up resolvers have any meaning without subscriptions? I am > > > > wondering whether we should allow setting up the resolvers at the > > > > subscription level. One benefit is that users don't need to use a > > > > different DDL to set up resolvers. The first patch gives a conflict > > > > detection option at the subscription level, so it would be symmetrical > > > > to provide a resolver at the subscription level. Yet another benefit > > > > could be that it provides users the facility to configure different > > > > resolvers for a set of tables belonging to a particular > > > > publication/node. > > > > > > There can be multiple tables included in a publication with varying > > > business use-cases and thus may need different resolvers set, even > > > though they all are part of the same publication. > > > > > > > Agreed but this is the reason we are planning to keep resolvers at the > > table level. Here, I am asking to set resolvers at the subscription > > level rather than at the global level. > > Okay, got it. I misunderstood earlier that we want to replace table > level resolvers with subscription ones. > Having global configuration has one benefit that if the user has no > requirement to set different resolvers for different subscriptions or > tables, he may always set one global configuration and be done with > it. OTOH, I also agree with benefits coming with subscription level > configuration. Setting resolvers at table-level and subscription-level sounds good to me. Would DDLs for setting resolvers at the subscription level need the subscription name to be specified? And another question: does a table-level resolver setting take precedence over all subscription-level resolver settings in the database? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Mon, Jul 1, 2024 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, May 23, 2024 at 3:37 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > DELETE > > ================ > > Conflict Type: > > ---------------- > > delete_missing: An incoming delete is trying to delete a row on a > > target node which does not exist. > > IIUC the 'delete_missing' conflict doesn't cover the case where an > incoming delete message is trying to delete a row that has already > been updated locally or by another node. I think in update/delete > conflict situations, we need to resolve the conflicts based on commit > timestamps like we do for update/update and insert/update conflicts. > > For example, suppose there are two node-A and node-B and setup > bi-directional replication, and suppose further that both have the row > with id = 1, consider the following sequences: > > 09:00:00 DELETE ... WHERE id = 1 on node-A. > 09:00:05 UPDATE ... WHERE id = 1 on node-B. > 09:00:10 node-A received the update message from node-B. > 09:00:15 node-B received the delete message from node-A. > > At 09:00:10 on node-A, an update_deleted conflict is generated since > the row on node-A is already deleted locally. Suppose that we use > 'apply_or_skip' resolution for this conflict, we convert the update > message into an insertion, so node-A now has the row with id = 1. At > 09:00:15 on node-B, the incoming delete message is applied and deletes > the row with id = 1, even though the row has already been modified > locally. The node-A and node-B are now inconsistent. This > inconsistency can be avoided by using 'skip' resolution for the > 'update_deleted' conflict on node-A, and 'skip' resolution is the > default method for that actually. However, if we handle it as > 'update_missing', the 'apply_or_skip' resolution is used by default. > > IIUC with the proposed architecture, DELETE always takes precedence > over UPDATE since both 'update_deleted' and 'update_missing' don't use > commit timestamps to resolve the conflicts. As long as that is true, I > think there is no use case for 'apply_or_skip' and 'apply_or_error' > resolutions in update/delete conflict cases. In short, I think we need > something like 'delete_differ' conflict type as well. FYI PGD and > Oracle GoldenGate seem to have this conflict type[1][2]. > Your explanation makes sense to me and I agree that we should implement 'delete_differ' conflict type. > The 'delete'_differ' conflict type would have at least > 'latest_timestamp_wins' resolution. With the timestamp based > resolution method, we would deal with update/delete conflicts as > follows: > > 09:00:00: DELETE ... WHERE id = 1 on node-A. > 09:00:05: UPDATE ... WHERE id = 1 on node-B. > - the updated row doesn't have the origin since it's a local change. > 09:00:10: node-A received the update message from node-B. > - the incoming update message has the origin of node-B whereas the > local row is already removed locally. > - 'update_deleted' conflict is generated. > FYI, as of now, we don't have a reliable way to detect 'update_deleted' type of conflicts but we had some discussion about the same [1]. > - do the insert of the new row instead, because the commit > timestamp of UPDATE is newer than DELETE's one. > 09:00:15: node-B received the delete message from node-A. > - the incoming delete message has the origin of node-B whereas the > (updated) row doesn't have the origin. > - 'update_differ' conflict is generated. 
> - discard DELETE, because the commit timestamp of UPDATE is newer > than DELETE's one. > > As a result, both nodes have the new version row. > Right, it seems to me that we should implement 'latest_timestamp_wins' if we want consistency in such cases. [1] - https://www.postgresql.org/message-id/CAA4eK1Lj-PWrP789KnKxZydisHajd38rSihWXO8MVBLDwxG1Kg%40mail.gmail.com -- With Regards, Amit Kapila.
On Mon, Jul 1, 2024 at 1:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Setting resolvers at table-level and subscription-level sounds good to > me. DDLs for setting resolvers at subscription-level would need the > subscription name to be specified? > Yes, it should be part of the ALTER/CREATE SUBSCRIPTION command. One idea could be to have syntax as follows: ALTER SUBSCRIPTION name SET CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'; ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR 'conflict_type'; CREATE SUBSCRIPTION subscription_name CONNECTION 'conninfo' PUBLICATION publication_name [, ...] CONFLICT RESOLVER 'conflict_resolver' FOR 'conflict_type'; > And another question is: a > table-level resolver setting is precedent over all subscriber-level > resolver settings in the database? > Yes. -- With Regards, Amit Kapila.
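For illustration, the sketched syntax in use; the subscription, connection, and publication names are hypothetical, and this grammar is only an idea under discussion, not committed syntax:

ALTER SUBSCRIPTION sub_nodea SET CONFLICT RESOLVER 'keep_local' FOR 'insert_exists';
ALTER SUBSCRIPTION sub_nodea RESET CONFLICT RESOLVER FOR 'insert_exists';
CREATE SUBSCRIPTION sub_nodeb CONNECTION 'host=nodeb dbname=postgres'
    PUBLICATION pub_all
    CONFLICT RESOLVER 'remote_apply' FOR 'update_differ';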
On Mon, Jul 1, 2024 at 11:47 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi, > > On Thu, May 23, 2024 at 3:37 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > DELETE > > ================ > > Conflict Type: > > ---------------- > > delete_missing: An incoming delete is trying to delete a row on a > > target node which does not exist. > > IIUC the 'delete_missing' conflict doesn't cover the case where an > incoming delete message is trying to delete a row that has already > been updated locally or by another node. I think in update/delete > conflict situations, we need to resolve the conflicts based on commit > timestamps like we do for update/update and insert/update conflicts. > > For example, suppose there are two nodes, node-A and node-B, set up with > bi-directional replication, and suppose further that both have the row > with id = 1; consider the following sequence: > > 09:00:00 DELETE ... WHERE id = 1 on node-A. > 09:00:05 UPDATE ... WHERE id = 1 on node-B. > 09:00:10 node-A received the update message from node-B. > 09:00:15 node-B received the delete message from node-A. > > At 09:00:10 on node-A, an update_deleted conflict is generated since > the row on node-A is already deleted locally. Suppose that we use > the 'apply_or_skip' resolution for this conflict; we convert the update > message into an insertion, so node-A now has the row with id = 1. At > 09:00:15 on node-B, the incoming delete message is applied and deletes > the row with id = 1, even though the row has already been modified > locally. The node-A and node-B are now inconsistent. This > inconsistency can be avoided by using 'skip' resolution for the > 'update_deleted' conflict on node-A, and 'skip' resolution is the > default method for that actually. However, if we handle it as > 'update_missing', the 'apply_or_skip' resolution is used by default. > > IIUC with the proposed architecture, DELETE always takes precedence > over UPDATE since both 'update_deleted' and 'update_missing' don't use > commit timestamps to resolve the conflicts. As long as that is true, I > think there is no use case for 'apply_or_skip' and 'apply_or_error' > resolutions in update/delete conflict cases. In short, I think we need > something like a 'delete_differ' conflict type as well. Thanks for the feedback. Sure, we can have 'delete_differ'. > FYI PGD and > Oracle GoldenGate seem to have this conflict type[1][2]. > > The 'delete_differ' conflict type would have at least > 'latest_timestamp_wins' resolution. With the timestamp-based > resolution method, we would deal with update/delete conflicts as > follows: > > 09:00:00: DELETE ... WHERE id = 1 on node-A. > 09:00:05: UPDATE ... WHERE id = 1 on node-B. > - the updated row doesn't have the origin since it's a local change. > 09:00:10: node-A received the update message from node-B. > - the incoming update message has the origin of node-B whereas the > local row is already removed locally. > - 'update_deleted' conflict is generated. > - do the insert of the new row instead, because the commit > timestamp of UPDATE is newer than DELETE's one. So, are you suggesting we support latest_timestamp_wins for the 'update_deleted' case? And shall 'latest_timestamp_wins' then be the default instead of 'skip'? In some cases, the complete row cannot be constructed, and then 'insertion' might not be possible even if the timestamp of 'update' is the latest. Then shall we skip or error out under the latest_timestamp_wins config?
Even if we support 'latest_timestamp_wins' as default, we can still have 'apply_or_skip' and 'apply_or_error' as other options for 'update_deleted' case. Or do you suggest getting rid of these options completely? > 09:00:15: node-B received the delete message from node-A. > - the incoming delete message has the origin of node-B whereas the > (updated) row doesn't have the origin. > - 'update_differ' conflict is generated. Here, do you mean 'delete_differ' conflict is generated? thanks Shveta
On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I tried to work out a few scenarios with this, where the apply worker > > will wait until its local clock hits 'remote_commit_tts - max_skew > > permitted'. Please have a look. > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > Resolver is last_update_wins in both cases. > > ---------------- > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > Remote Update with commit_timestamp = 10.20AM. > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > When remote update arrives at local node, we see that skew is greater > > than max_clock_skew and thus apply worker waits till local clock hits > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > local clock hits 10.20 AM, the worker applies the remote change with > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > worker)) if some local update on same row has happened at say 10.18am, > > that will applied first, which will be later overwritten by above > > remote change of 10.20AM as remote-change's timestamp appear more > > latest, even though it has happened earlier than local change. > > For the sake of simplicity let's call the change that happened at > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > and assume we are talking about the synchronous commit only. > > I think now from an application perspective the change-1 wouldn't have > caused the change-2 because we delayed applying change-2 on the local > node which would have delayed the confirmation of the change-1 to the > application that means we have got the change-2 on the local node > without the confirmation of change-1 hence change-2 has no causal > dependency on the change-1. So it's fine that we perform change-1 > before change-2 and the timestamp will also show the same at any other > node if they receive these 2 changes. > > The goal is to ensure that if we define the order where change-2 > happens before change-1, this same order should be visible on all > other nodes. This will hold true because the commit timestamp of > change-2 is earlier than that of change-1. > > > 2) Case 2: max_clock_skew is set to 2min. > > > > Remote Update with commit_timestamp=10.20AM > > Local clock (which is say 5 min behind) = 10.15AM. > > > > Now apply worker will notice skew greater than 2min and thus will wait > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > always save the origin's commit timestamp into local commit_tts, see > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > > another local update is triggered at 10.19am, it will be applied > > locally but it will be ignored on remote node. On the remote node , > > the existing change with a timestamp of 10.20 am will win resulting in > > data divergence. > > Let's call the 10:20 AM change as a change-1 and the change that > happened at 10:19 as change-2 > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > commit_ts of that change is 10:20, and the same will be visible to all > other nodes. So in conflict resolution still the change-1 happened > after the change-2 because change-2's commit_ts is 10:19 AM. 
Now > there could be a problem with the causal order because we applied the > change-1 at 10:18 AM so the application might have gotten confirmation > at 10:18 AM and the change-2 of the local node may be triggered as a > result of confirmation of the change-1 that means now change-2 has a > causal dependency on the change-1 but commit_ts shows change-2 > happened before the change-1 on all the nodes. > > So, is this acceptable? I think yes because the user has configured a > maximum clock skew of 2 minutes, which means the detected order might > not always align with the causal order for transactions occurring > within that time frame. Generally, the ideal configuration for > max_clock_skew should be in multiple of the network round trip time. > Assuming this configuration, we wouldn’t encounter this problem > because for change-2 to be caused by change-1, the client would need > to get confirmation of change-1 and then trigger change-2, which would > take at least 2-3 network round trips. As we agreed, the subscriber should wait before applying an operation if the commit timestamp of the currently replayed transaction is in the future and the difference exceeds the maximum clock skew. This raises the question: should the subscriber wait only for insert, update, and delete operations when timestamp-based resolution methods are set, or should it wait regardless of the type of remote operation, the presence or absence of conflicts, and the resolvers configured? I believe the latter approach is the way to go i.e. this should be independent of CDR, though needed by CDR for better timestamp based resolutions. Thoughts? thanks Shveta
On Tue, Jul 2, 2024 at 2:40 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I tried to work out a few scenarios with this, where the apply worker > > > will wait until its local clock hits 'remote_commit_tts - max_skew > > > permitted'. Please have a look. > > > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > > Resolver is last_update_wins in both cases. > > > ---------------- > > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > > > Remote Update with commit_timestamp = 10.20AM. > > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > > > When remote update arrives at local node, we see that skew is greater > > > than max_clock_skew and thus apply worker waits till local clock hits > > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > > local clock hits 10.20 AM, the worker applies the remote change with > > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > > worker)) if some local update on same row has happened at say 10.18am, > > > that will applied first, which will be later overwritten by above > > > remote change of 10.20AM as remote-change's timestamp appear more > > > latest, even though it has happened earlier than local change. > > > > For the sake of simplicity let's call the change that happened at > > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > > and assume we are talking about the synchronous commit only. > > > > I think now from an application perspective the change-1 wouldn't have > > caused the change-2 because we delayed applying change-2 on the local > > node which would have delayed the confirmation of the change-1 to the > > application that means we have got the change-2 on the local node > > without the confirmation of change-1 hence change-2 has no causal > > dependency on the change-1. So it's fine that we perform change-1 > > before change-2 and the timestamp will also show the same at any other > > node if they receive these 2 changes. > > > > The goal is to ensure that if we define the order where change-2 > > happens before change-1, this same order should be visible on all > > other nodes. This will hold true because the commit timestamp of > > change-2 is earlier than that of change-1. > > > > > 2) Case 2: max_clock_skew is set to 2min. > > > > > > Remote Update with commit_timestamp=10.20AM > > > Local clock (which is say 5 min behind) = 10.15AM. > > > > > > Now apply worker will notice skew greater than 2min and thus will wait > > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > > always save the origin's commit timestamp into local commit_tts, see > > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > > > another local update is triggered at 10.19am, it will be applied > > > locally but it will be ignored on remote node. On the remote node , > > > the existing change with a timestamp of 10.20 am will win resulting in > > > data divergence. 
> > > > Let's call the 10:20 AM change as a change-1 and the change that > > happened at 10:19 as change-2 > > > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > > commit_ts of that change is 10:20, and the same will be visible to all > > other nodes. So in conflict resolution still the change-1 happened > > after the change-2 because change-2's commit_ts is 10:19 AM. Now > > there could be a problem with the causal order because we applied the > > change-1 at 10:18 AM so the application might have gotten confirmation > > at 10:18 AM and the change-2 of the local node may be triggered as a > > result of confirmation of the change-1 that means now change-2 has a > > causal dependency on the change-1 but commit_ts shows change-2 > > happened before the change-1 on all the nodes. > > > > So, is this acceptable? I think yes because the user has configured a > > maximum clock skew of 2 minutes, which means the detected order might > > not always align with the causal order for transactions occurring > > within that time frame. Generally, the ideal configuration for > > max_clock_skew should be in multiple of the network round trip time. > > Assuming this configuration, we wouldn’t encounter this problem > > because for change-2 to be caused by change-1, the client would need > > to get confirmation of change-1 and then trigger change-2, which would > > take at least 2-3 network round trips. > > As we agreed, the subscriber should wait before applying an operation > if the commit timestamp of the currently replayed transaction is in > the future and the difference exceeds the maximum clock skew. This > raises the question: should the subscriber wait only for insert, > update, and delete operations when timestamp-based resolution methods > are set, or should it wait regardless of the type of remote operation, > the presence or absence of conflicts, and the resolvers configured? > I believe the latter approach is the way to go i.e. this should be > independent of CDR, though needed by CDR for better timestamp based > resolutions. Thoughts? Yes, I also think it should be independent of CDR. IMHO, it should be based on the user-configured maximum clock skew tolerance and can be independent of CDR. IIUC we would make the remote apply wait just before committing if the remote commit timestamp is ahead of the local clock by more than the maximum clock skew tolerance, is that correct? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 3, 2024 at 10:47 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Jul 2, 2024 at 2:40 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Jun 19, 2024 at 1:52 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Tue, Jun 18, 2024 at 3:29 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I tried to work out a few scenarios with this, where the apply worker > > > > will wait until its local clock hits 'remote_commit_tts - max_skew > > > > permitted'. Please have a look. > > > > > > > > Let's say, we have a GUC to configure max_clock_skew permitted. > > > > Resolver is last_update_wins in both cases. > > > > ---------------- > > > > 1) Case 1: max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > > > > > > Remote Update with commit_timestamp = 10.20AM. > > > > Local clock (which is say 5 min behind) shows = 10.15AM. > > > > > > > > When remote update arrives at local node, we see that skew is greater > > > > than max_clock_skew and thus apply worker waits till local clock hits > > > > 'remote's commit_tts - max_clock_skew' i.e. till 10.20 AM. Once the > > > > local clock hits 10.20 AM, the worker applies the remote change with > > > > commit_tts of 10.20AM. In the meantime (during wait period of apply > > > > worker)) if some local update on same row has happened at say 10.18am, > > > > that will applied first, which will be later overwritten by above > > > > remote change of 10.20AM as remote-change's timestamp appear more > > > > latest, even though it has happened earlier than local change. > > > > > > For the sake of simplicity let's call the change that happened at > > > 10:20 AM change-1 and the change that happened at 10:15 as change-2 > > > and assume we are talking about the synchronous commit only. > > > > > > I think now from an application perspective the change-1 wouldn't have > > > caused the change-2 because we delayed applying change-2 on the local > > > node which would have delayed the confirmation of the change-1 to the > > > application that means we have got the change-2 on the local node > > > without the confirmation of change-1 hence change-2 has no causal > > > dependency on the change-1. So it's fine that we perform change-1 > > > before change-2 and the timestamp will also show the same at any other > > > node if they receive these 2 changes. > > > > > > The goal is to ensure that if we define the order where change-2 > > > happens before change-1, this same order should be visible on all > > > other nodes. This will hold true because the commit timestamp of > > > change-2 is earlier than that of change-1. > > > > > > > 2) Case 2: max_clock_skew is set to 2min. > > > > > > > > Remote Update with commit_timestamp=10.20AM > > > > Local clock (which is say 5 min behind) = 10.15AM. > > > > > > > > Now apply worker will notice skew greater than 2min and thus will wait > > > > till local clock hits 'remote's commit_tts - max_clock_skew' i.e. > > > > 10.18 and will apply the change with commit_tts of 10.20 ( as we > > > > always save the origin's commit timestamp into local commit_tts, see > > > > RecordTransactionCommit->TransactionTreeSetCommitTsData). Now lets say > > > > another local update is triggered at 10.19am, it will be applied > > > > locally but it will be ignored on remote node. On the remote node , > > > > the existing change with a timestamp of 10.20 am will win resulting in > > > > data divergence. 
> > > > > > Let's call the 10:20 AM change as a change-1 and the change that > > > happened at 10:19 as change-2 > > > > > > IIUC, although we apply the change-1 at 10:18 AM the commit_ts of that > > > commit_ts of that change is 10:20, and the same will be visible to all > > > other nodes. So in conflict resolution still the change-1 happened > > > after the change-2 because change-2's commit_ts is 10:19 AM. Now > > > there could be a problem with the causal order because we applied the > > > change-1 at 10:18 AM so the application might have gotten confirmation > > > at 10:18 AM and the change-2 of the local node may be triggered as a > > > result of confirmation of the change-1 that means now change-2 has a > > > causal dependency on the change-1 but commit_ts shows change-2 > > > happened before the change-1 on all the nodes. > > > > > > So, is this acceptable? I think yes because the user has configured a > > > maximum clock skew of 2 minutes, which means the detected order might > > > not always align with the causal order for transactions occurring > > > within that time frame. Generally, the ideal configuration for > > > max_clock_skew should be in multiple of the network round trip time. > > > Assuming this configuration, we wouldn’t encounter this problem > > > because for change-2 to be caused by change-1, the client would need > > > to get confirmation of change-1 and then trigger change-2, which would > > > take at least 2-3 network round trips. > > > > As we agreed, the subscriber should wait before applying an operation > > if the commit timestamp of the currently replayed transaction is in > > the future and the difference exceeds the maximum clock skew. This > > raises the question: should the subscriber wait only for insert, > > update, and delete operations when timestamp-based resolution methods > > are set, or should it wait regardless of the type of remote operation, > > the presence or absence of conflicts, and the resolvers configured? > > I believe the latter approach is the way to go i.e. this should be > > independent of CDR, though needed by CDR for better timestamp based > > resolutions. Thoughts? > > Yes, I also think it should be independent of CDR. IMHO, it should be > based on the user-configured maximum clock skew tolerance and can be > independent of CDR. +1 > IIUC we would make the remote apply wait just > before committing if the remote commit timestamp is ahead of the local > clock by more than the maximum clock skew tolerance, is that correct? +1 on the condition to wait. But I think we should make the apply worker wait during begin (apply_handle_begin) instead of commit. It makes more sense to delay the entire operation to manage clock skew rather than the commit alone. Only then can CDR's timestamp-based resolutions, which happen much before the commit stage, benefit from this. Thoughts? thanks Shveta
On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > based on the user-configured maximum clock skew tolerance and can be > > independent of CDR. > > +1 > > > IIUC we would make the remote apply wait just > > before committing if the remote commit timestamp is ahead of the local > > clock by more than the maximum clock skew tolerance, is that correct? > > +1 on condition to wait. > > But I think we should make apply worker wait during begin > (apply_handle_begin) instead of commit. It makes more sense to delay > the entire operation to manage clock-skew rather than the commit > alone. And only then CDR's timestamp based resolution which are much > prior to commit-stage can benefit from this. Thoughts? But do we really need to wait at apply_handle_begin()? I mean if we already know the commit_ts then we can perform the conflict resolution no? I mean we should wait before committing because we are considering this remote transaction to be in the future and we do not want to confirm the commit of this transaction to the remote node before the local clock reaches the record commit_ts to preserve the causal order. However, we can still perform conflict resolution beforehand since we already know the commit_ts. The conflict resolution function will be something like "out_version = CRF(version1_commit_ts, version2_commit_ts)," so the result should be the same regardless of when we apply it, correct? From a performance standpoint, wouldn't it be beneficial to perform as much work as possible in advance? By the time we apply all the operations, the local clock might already be in sync with the commit_ts of the remote transaction. Am I missing something? However, while thinking about this, I'm wondering about how we will handle the streaming of in-progress transactions. If we start applying with parallel workers, we might not know the commit_ts of those transactions since they may not have been committed yet. One simple option could be to prevent parallel workers from applying in-progress transactions when CDR is set up. Instead, we could let these transactions spill to files and only apply them once we receive the commit record. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
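To illustrate the point above that such a resolution is a pure function of the two commit timestamps, here is a toy sketch; the column names are invented for the example, and a real implementation would of course live in the apply worker's code rather than in SQL:

-- last_update_wins as a pure comparison: the outcome depends only on the
-- two commit timestamps, not on when the comparison is evaluated
SELECT CASE WHEN remote_commit_ts > local_commit_ts
            THEN 'apply remote version'
            ELSE 'keep local version'
       END AS resolution
FROM (VALUES (timestamptz '2024-06-18 10:20:00',
              timestamptz '2024-06-18 10:19:00')) AS v(remote_commit_ts, local_commit_ts);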
On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > > based on the user-configured maximum clock skew tolerance and can be > > > independent of CDR. > > > > +1 > > > > > IIUC we would make the remote apply wait just > > > before committing if the remote commit timestamp is ahead of the local > > > clock by more than the maximum clock skew tolerance, is that correct? > > > > +1 on condition to wait. > > > > But I think we should make apply worker wait during begin > > (apply_handle_begin) instead of commit. It makes more sense to delay > > the entire operation to manage clock-skew rather than the commit > > alone. And only then CDR's timestamp based resolution which are much > > prior to commit-stage can benefit from this. Thoughts? > > But do we really need to wait at apply_handle_begin()? I mean if we > already know the commit_ts then we can perform the conflict resolution > no? I mean we should wait before committing because we are > considering this remote transaction to be in the future and we do not > want to confirm the commit of this transaction to the remote node > before the local clock reaches the record commit_ts to preserve the > causal order. However, we can still perform conflict resolution > beforehand since we already know the commit_ts. The conflict > resolution function will be something like "out_version = > CRF(version1_commit_ts, version2_commit_ts)," so the result should be > the same regardless of when we apply it, correct? From a performance > standpoint, wouldn't it be beneficial to perform as much work as > possible in advance? By the time we apply all the operations, the > local clock might already be in sync with the commit_ts of the remote > transaction. Am I missing something? > But waiting after applying the operations and before applying the commit would mean that we need to wait with the locks held. That could be a recipe for deadlocks in the system. I see your point related to performance but as we are not expecting clock skew in normal cases, we shouldn't be too much bothered on the performance due to this. If there is clock skew, we expect users to fix it, this is just a worst-case aid for users. > However, while thinking about this, I'm wondering about how we will > handle the streaming of in-progress transactions. If we start applying > with parallel workers, we might not know the commit_ts of those > transactions since they may not have been committed yet. One simple > option could be to prevent parallel workers from applying in-progress > transactions when CDR is set up. Instead, we could let these > transactions spill to files and only apply them once we receive the > commit record. > Agreed, we should do it as you have suggested and document it. -- With Regards, Amit Kapila.
On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > But waiting after applying the operations and before applying the > commit would mean that we need to wait with the locks held. That could > be a recipe for deadlocks in the system. I see your point related to > performance but as we are not expecting clock skew in normal cases, we > shouldn't be too much bothered on the performance due to this. If > there is clock skew, we expect users to fix it, this is just a > worst-case aid for users. But if we make it wait at the very first operation that means we will not suck more decoded data from the network and wouldn't that make the sender wait for the network buffer to get sucked in by the receiver? Also, we already have a handling of parallel apply workers so if we do not have an issue of deadlock there or if we can handle those issues there we can do it here as well no? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 3, 2024 at 2:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > But waiting after applying the operations and before applying the > > > commit would mean that we need to wait with the locks held. That could > > > be a recipe for deadlocks in the system. I see your point related to > > > performance but as we are not expecting clock skew in normal cases, we > > > shouldn't be too much bothered on the performance due to this. If > > > there is clock skew, we expect users to fix it, this is just a > > > worst-case aid for users. > > But if we make it wait at the very first operation that means we will > not suck more decoded data from the network and wouldn't that make the > sender wait for the network buffer to get sucked in by the receiver? > That would be true even if we wait just before applying the commit record, considering the transaction is small and the wait time is large. > Also, we already have a handling of parallel apply workers so if we do > not have an issue of deadlock there or if we can handle those issues > there we can do it here as well no? > Parallel apply workers won't wait for a long time. There is some similarity and in both cases, deadlock will be detected but chances of such implementation-related deadlocks will be higher if we start waiting for a random amount of time. The other possibility is that we can keep a cap on the max clock skew time above which we will give ERROR even if the user has configured wait. This is because anyway the system will be choked (walsender won't be able to send more data, vacuum on publisher won't be able to remove dead rows) if we wait for longer times. But even with that, I am not sure if waiting after holding locks is a good idea or gives us the benefit that is worth the risk of deadlocks. -- With Regards, Amit Kapila.
On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > > based on the user-configured maximum clock skew tolerance and can be > > > independent of CDR. > > > > +1 > > > > > IIUC we would make the remote apply wait just > > > before committing if the remote commit timestamp is ahead of the local > > > clock by more than the maximum clock skew tolerance, is that correct? > > > > +1 on condition to wait. > > > > But I think we should make apply worker wait during begin > > (apply_handle_begin) instead of commit. It makes more sense to delay > > the entire operation to manage clock-skew rather than the commit > > alone. And only then CDR's timestamp based resolution which are much > > prior to commit-stage can benefit from this. Thoughts? > > But do we really need to wait at apply_handle_begin()? I mean if we > already know the commit_ts then we can perform the conflict resolution > no? I would like to highlight one point here: the resultant data may be different depending upon at what stage (begin or commit) we conclude to wait. Example: --max_clock_skew set to 0, i.e. no tolerance for clock skew. --Remote Update with commit_timestamp = 10.20AM. --Local clock (which is say 5 min behind) shows = 10.15AM. Case 1: Wait during Begin: When the remote update arrives at the local node, the apply worker waits till the local clock hits 'remote's commit_ts - max_clock_skew', i.e. till 10.20 AM. In the meantime (during the wait period of the apply worker), if some local update on the same row has happened at say 10.18am (local clock), that will be applied first. Now when the apply worker's wait is over, it will detect an 'update_differ' conflict and as per 'last_update_wins', remote_tuple will win as 10.20 is later than 10.18. Case 2: Wait during Commit: When the remote update arrives at the local node, it finds no conflict and goes for commit. But before commit, it waits till the local clock hits 10.20 AM. In the meantime (during the wait period of the apply worker), if some local update is trying to update the same row, say at 10.18, it has to wait (due to locks taken by the remote update on that row) and the remote tuple will get committed first with a commit timestamp of 10.20. Then the local update will proceed and will overwrite the remote tuple. So in case1, the remote tuple is the final change while in case2, the local tuple is the final change. thanks Shveta
On Wed, Jul 3, 2024 at 3:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 2:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > But waiting after applying the operations and before applying the > > > commit would mean that we need to wait with the locks held. That could > > > be a recipe for deadlocks in the system. I see your point related to > > > performance but as we are not expecting clock skew in normal cases, we > > > shouldn't be too much bothered on the performance due to this. If > > > there is clock skew, we expect users to fix it, this is just a > > > worst-case aid for users. > > > > But if we make it wait at the very first operation that means we will > > not suck more decoded data from the network and wouldn't that make the > > sender wait for the network buffer to get sucked in by the receiver? > > > > That would be true even if we wait just before applying the commit > record considering the transaction is small and the wait time is > large. What I am saying is that if we are not applying the whole transaction, it means we are not receiving it either unless we plan to spill it to a file. If we don't spill it to a file, the network buffer will fill up very quickly. This issue wouldn't occur if we waited right before the commit because, by that time, we would have already received all the data from the network. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 3, 2024 at 3:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 2:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > But waiting after applying the operations and before applying the > > > commit would mean that we need to wait with the locks held. That could > > > be a recipe for deadlocks in the system. I see your point related to > > > performance but as we are not expecting clock skew in normal cases, we > > > shouldn't be too much bothered on the performance due to this. If > > > there is clock skew, we expect users to fix it, this is just a > > > worst-case aid for users. > > > > But if we make it wait at the very first operation that means we will > > not suck more decoded data from the network and wouldn't that make the > > sender wait for the network buffer to get sucked in by the receiver? > > > > That would be true even if we wait just before applying the commit > record considering the transaction is small and the wait time is > large. > > > Also, we already have a handling of parallel apply workers so if we do > > not have an issue of deadlock there or if we can handle those issues > > there we can do it here as well no? > > > > Parallel apply workers won't wait for a long time. There is some > similarity and in both cases, deadlock will be detected but chances of > such implementation-related deadlocks will be higher if we start > waiting for a random amount of times. The other possibility is that we > can keep a cap on the max clock skew time above which we will give > ERROR even if the user has configured wait. +1. But I think the cap has to be on the wait-time. As an example, let's say the user has configured a 'clock skew tolerance' of 10sec while the actual clock skew between nodes is 5 min. It means we will mostly have to wait '5 min - 10sec' to bring the clock skew within the tolerable limit, which is a huge waiting time. We can keep a max limit on this wait time. thanks Shveta
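[To make the arithmetic concrete, a rough sketch of the rule being converged on here: wait only for the portion of the skew beyond the tolerance, and error out if that wait would exceed a cap. The GUC names and the helper are assumptions for illustration, not the patch's code:

    #include "postgres.h"
    #include "miscadmin.h"        /* pg_usleep() */
    #include "utils/timestamp.h"  /* TimestampTz, GetCurrentTimestamp() */

    /* hypothetical GUCs, in microseconds */
    static int64 max_clock_skew_us = 10 * USECS_PER_SEC; /* tolerance */
    static int64 max_skew_wait_us = 60 * USECS_PER_SEC;  /* cap on the wait */

    static void
    maybe_wait_for_clock_skew(TimestampTz remote_commit_ts)
    {
        TimestampTz now = GetCurrentTimestamp();
        int64       wait_us = (remote_commit_ts - now) - max_clock_skew_us;

        if (wait_us <= 0)
            return;              /* within tolerance: apply immediately */
        if (wait_us > max_skew_wait_us)
            elog(ERROR, "remote commit timestamp exceeds maximum tolerable clock skew");
        pg_usleep((long) wait_us);  /* let the local clock catch up */
    }

With shveta's numbers (actual skew 5 min, tolerance 10 sec), wait_us comes out near 4 min 50 sec, which is exactly what motivates capping the wait.]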
On Wed, Jul 3, 2024 at 4:02 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > > > based on the user-configured maximum clock skew tolerance and can be > > > > independent of CDR. > > > > > > +1 > > > > > > > IIUC we would make the remote apply wait just > > > > before committing if the remote commit timestamp is ahead of the local > > > > clock by more than the maximum clock skew tolerance, is that correct? > > > > > > +1 on condition to wait. > > > > > > But I think we should make apply worker wait during begin > > > (apply_handle_begin) instead of commit. It makes more sense to delay > > > the entire operation to manage clock-skew rather than the commit > > > alone. And only then CDR's timestamp based resolution which are much > > > prior to commit-stage can benefit from this. Thoughts? > > > > But do we really need to wait at apply_handle_begin()? I mean if we > > already know the commit_ts then we can perform the conflict resolution > > no? > > I would like to highlight one point here that the resultant data may > be different depending upon at what stage (begin or commit) we > conclude to wait. Example: > > --max_clock_skew set to 0 i.e. no tolerance for clock skew. > --Remote Update with commit_timestamp = 10.20AM. > --Local clock (which is say 5 min behind) shows = 10.15AM. > > Case 1: Wait during Begin: > When remote update arrives at local node, apply worker waits till > local clock hits 'remote's commit_tts - max_clock_skew' i.e. till > 10.20 AM. In the meantime (during the wait period of apply worker) if > some local update on the same row has happened at say 10.18am (local > clock), that will be applied first. Now when apply worker's wait is > over, it will detect 'update_diffe'r conflict and as per > 'last_update_win', remote_tuple will win as 10.20 is latest than > 10.18. > > Case 2: Wait during Commit: > When remote update arrives at local node, it finds no conflict and > goes for commit. But before commit, it waits till the local clock hits > 10.20 AM. In the meantime (during wait period of apply worker)) if > some local update is trying to update the same row say at 10.18, it > has to wait (due to locks taken by remote update on that row) and > remote tuple will get committed first with commit timestamp of 10.20. > Then local update will proceed and will overwrite remote tuple. > > So in case1, remote tuple is the final change while in case2, local > tuple is the final change. Got it, but which case is correct? I think both. Because in case-1 the local commit's commit_ts is 10:18 and the remote commit's commit_ts is 10:20, so the remote apply wins. And in case 2, the remote commit's commit_ts is 10:20 whereas the local commit's commit_ts must be 10:20 + delta (because it waited for the remote transaction to get committed). Now say which is better: in case-1 we have to make the remote apply wait at the beginning state without knowing what the local clock would be when it actually comes to commit; it may so happen that if we choose case-2, by the time the remote transaction finishes applying, the local clock is already beyond 10:20 and we do not even need to wait? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 3, 2024 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 3:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 2:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > But waiting after applying the operations and before applying the > > > > commit would mean that we need to wait with the locks held. That could > > > > be a recipe for deadlocks in the system. I see your point related to > > > > performance but as we are not expecting clock skew in normal cases, we > > > > shouldn't be too much bothered on the performance due to this. If > > > > there is clock skew, we expect users to fix it, this is just a > > > > worst-case aid for users. > > > > > > But if we make it wait at the very first operation that means we will > > > not suck more decoded data from the network and wouldn't that make the > > > sender wait for the network buffer to get sucked in by the receiver? > > > > > > > That would be true even if we wait just before applying the commit > > record considering the transaction is small and the wait time is > > large. > > What I am saying is that if we are not applying the whole transaction, > it means we are not receiving it either unless we plan to spill it to > a file. If we don't spill it to a file, the network buffer will fill > up very quickly. This issue wouldn't occur if we waited right before > the commit because, by that time, we would have already received all > the data from the network. > We would have received the transaction data but there could be other transactions that need to wait because the apply worker is waiting before the commit. So, the situation will be the same. We can even decide to spill the data to files if the decision is that we need to wait to avoid network buffer-fill situations. But note that the wait in apply worker has consequences that the subscriber won't be able to confirm the flush position and publisher won't be able to vacuum the dead rows and we won't be able to remove WAL as well. Last time when we discussed the delay_apply feature, we decided not to proceed because of such issues. This is the reason I proposed a cap on wait time. -- With Regards, Amit Kapila.
On Wed, Jul 3, 2024 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 3:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Wed, Jul 3, 2024 at 2:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > But waiting after applying the operations and before applying the > > > > > commit would mean that we need to wait with the locks held. That could > > > > > be a recipe for deadlocks in the system. I see your point related to > > > > > performance but as we are not expecting clock skew in normal cases, we > > > > > shouldn't be too much bothered on the performance due to this. If > > > > > there is clock skew, we expect users to fix it, this is just a > > > > > worst-case aid for users. > > > > > > > > But if we make it wait at the very first operation that means we will > > > > not suck more decoded data from the network and wouldn't that make the > > > > sender wait for the network buffer to get sucked in by the receiver? > > > > > > > > > > That would be true even if we wait just before applying the commit > > > record considering the transaction is small and the wait time is > > > large. > > > > What I am saying is that if we are not applying the whole transaction, > > it means we are not receiving it either unless we plan to spill it to > > a file. If we don't spill it to a file, the network buffer will fill > > up very quickly. This issue wouldn't occur if we waited right before > > the commit because, by that time, we would have already received all > > the data from the network. > > > > We would have received the transaction data but there could be other > transactions that need to wait because the apply worker is waiting > before the commit. Yeah, that's a valid point; can a parallel apply worker help here? > So, the situation will be the same. We can even > decide to spill the data to files if the decision is that we need to > wait to avoid network buffer-fill situations. But note that the wait > in apply worker has consequences that the subscriber won't be able to > confirm the flush position and publisher won't be able to vacuum the > dead rows and we won't be remove WAL as well. Last time when we > discussed the delay_apply feature, we decided not to proceed because > of such issues. This is the reason I proposed a cap on wait time. Yes, spilling to a file or a cap on the wait time should help, and as I said above maybe a parallel apply worker can also help. So I agree that the problem with network buffers arises in both cases, whether we wait before committing or before beginning. So keeping that in mind I don't have any strong objections against waiting at the beginning if it simplifies the design compared to waiting at the commit. However, one point to remember in favor of waiting before applying the commit is that if we decide to wait before beginning the transaction, we would end up waiting in many more cases compared to waiting before committing. Because in cases when transactions are large and the clock skew is small, the local clock would have already passed the remote commit_ts by the time we reach the commit. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 3, 2024 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 4:02 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > > > > based on the user-configured maximum clock skew tolerance and can be > > > > > independent of CDR. > > > > > > > > +1 > > > > > > > > > IIUC we would make the remote apply wait just > > > > > before committing if the remote commit timestamp is ahead of the local > > > > > clock by more than the maximum clock skew tolerance, is that correct? > > > > > > > > +1 on condition to wait. > > > > > > > > But I think we should make apply worker wait during begin > > > > (apply_handle_begin) instead of commit. It makes more sense to delay > > > > the entire operation to manage clock-skew rather than the commit > > > > alone. And only then CDR's timestamp based resolution which are much > > > > prior to commit-stage can benefit from this. Thoughts? > > > > > > But do we really need to wait at apply_handle_begin()? I mean if we > > > already know the commit_ts then we can perform the conflict resolution > > > no? > > > > I would like to highlight one point here that the resultant data may > > be different depending upon at what stage (begin or commit) we > > conclude to wait. Example: > > > > --max_clock_skew set to 0 i.e. no tolerance for clock skew. > > --Remote Update with commit_timestamp = 10.20AM. > > --Local clock (which is say 5 min behind) shows = 10.15AM. > > > > Case 1: Wait during Begin: > > When remote update arrives at local node, apply worker waits till > > local clock hits 'remote's commit_tts - max_clock_skew' i.e. till > > 10.20 AM. In the meantime (during the wait period of apply worker) if > > some local update on the same row has happened at say 10.18am (local > > clock), that will be applied first. Now when apply worker's wait is > > over, it will detect 'update_diffe'r conflict and as per > > 'last_update_win', remote_tuple will win as 10.20 is latest than > > 10.18. > > > > Case 2: Wait during Commit: > > When remote update arrives at local node, it finds no conflict and > > goes for commit. But before commit, it waits till the local clock hits > > 10.20 AM. In the meantime (during wait period of apply worker)) if > > some local update is trying to update the same row say at 10.18, it > > has to wait (due to locks taken by remote update on that row) and > > remote tuple will get committed first with commit timestamp of 10.20. > > Then local update will proceed and will overwrite remote tuple. > > > > So in case1, remote tuple is the final change while in case2, local > > tuple is the final change. > > Got it, but which case is correct, I think both. Because in case-1 > local commit's commit_ts is 10:18 and the remote commit's commit_ts is > 10:20 so remote apply wins. And case 2, the remote commit's commit_ts > is 10:20 whereas the local commit's commit_ts must be 10:20 + delta > (because it waited for the remote transaction to get committed). 
> > Now say which is better, in case-1 we have to make the remote apply to > wait at the beginning state without knowing what would be the local > clock when it actually comes to commit, it may so happen that if we > choose case-2 by the time the remote transaction finish applying the > local clock is beyond 10:20 and we do not even need to wait? Yes, agreed that the wait time could be somewhat shorter in case 2. But the wait during commit will make user operations on the same row wait, without the user having any clue about the concurrent blocking operation. I am not sure if that will be acceptable. thanks Shveta
On Wed, Jul 3, 2024 at 5:08 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 4:12 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 4:02 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > > > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > > > > > based on the user-configured maximum clock skew tolerance and can be > > > > > > independent of CDR. > > > > > > > > > > +1 > > > > > > > > > > > IIUC we would make the remote apply wait just > > > > > > before committing if the remote commit timestamp is ahead of the local > > > > > > clock by more than the maximum clock skew tolerance, is that correct? > > > > > > > > > > +1 on condition to wait. > > > > > > > > > > But I think we should make apply worker wait during begin > > > > > (apply_handle_begin) instead of commit. It makes more sense to delay > > > > > the entire operation to manage clock-skew rather than the commit > > > > > alone. And only then CDR's timestamp based resolution which are much > > > > > prior to commit-stage can benefit from this. Thoughts? > > > > > > > > But do we really need to wait at apply_handle_begin()? I mean if we > > > > already know the commit_ts then we can perform the conflict resolution > > > > no? > > > > > > I would like to highlight one point here that the resultant data may > > > be different depending upon at what stage (begin or commit) we > > > conclude to wait. Example: > > > > > > --max_clock_skew set to 0 i.e. no tolerance for clock skew. > > > --Remote Update with commit_timestamp = 10.20AM. > > > --Local clock (which is say 5 min behind) shows = 10.15AM. > > > > > > Case 1: Wait during Begin: > > > When remote update arrives at local node, apply worker waits till > > > local clock hits 'remote's commit_tts - max_clock_skew' i.e. till > > > 10.20 AM. In the meantime (during the wait period of apply worker) if > > > some local update on the same row has happened at say 10.18am (local > > > clock), that will be applied first. Now when apply worker's wait is > > > over, it will detect 'update_diffe'r conflict and as per > > > 'last_update_win', remote_tuple will win as 10.20 is latest than > > > 10.18. > > > > > > Case 2: Wait during Commit: > > > When remote update arrives at local node, it finds no conflict and > > > goes for commit. But before commit, it waits till the local clock hits > > > 10.20 AM. In the meantime (during wait period of apply worker)) if > > > some local update is trying to update the same row say at 10.18, it > > > has to wait (due to locks taken by remote update on that row) and > > > remote tuple will get committed first with commit timestamp of 10.20. > > > Then local update will proceed and will overwrite remote tuple. > > > > > > So in case1, remote tuple is the final change while in case2, local > > > tuple is the final change. > > > > Got it, but which case is correct, I think both. Because in case-1 > > local commit's commit_ts is 10:18 and the remote commit's commit_ts is > > 10:20 so remote apply wins. And case 2, the remote commit's commit_ts > > is 10:20 whereas the local commit's commit_ts must be 10:20 + delta > > (because it waited for the remote transaction to get committed). 
> > > > Now say which is better, in case-1 we have to make the remote apply to > > wait at the beginning state without knowing what would be the local > > clock when it actually comes to commit, it may so happen that if we > > choose case-2 by the time the remote transaction finish applying the > > local clock is beyond 10:20 and we do not even need to wait? > > yes, agree that wait time could be lesser to some extent in case 2. > But the wait during commit will make user operations on the same row > wait, without user having any clue on concurrent blocking operations. > I am not sure if it will be acceptable. I don't think user acceptance is a problem here, because even while applying the remote transaction (irrespective of whether we implement this wait feature) a user transaction might have to wait if it updates the same rows, right? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 3, 2024 at 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 11:29 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 11:00 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > > Yes, I also think it should be independent of CDR. IMHO, it should be > > > > based on the user-configured maximum clock skew tolerance and can be > > > > independent of CDR. > > > > > > +1 > > > > > > > IIUC we would make the remote apply wait just > > > > before committing if the remote commit timestamp is ahead of the local > > > > clock by more than the maximum clock skew tolerance, is that correct? > > > > > > +1 on condition to wait. > > > > > > But I think we should make apply worker wait during begin > > > (apply_handle_begin) instead of commit. It makes more sense to delay > > > the entire operation to manage clock-skew rather than the commit > > > alone. And only then CDR's timestamp based resolution which are much > > > prior to commit-stage can benefit from this. Thoughts? > > > > But do we really need to wait at apply_handle_begin()? I mean if we > > already know the commit_ts then we can perform the conflict resolution > > no? I mean we should wait before committing because we are > > considering this remote transaction to be in the future and we do not > > want to confirm the commit of this transaction to the remote node > > before the local clock reaches the record commit_ts to preserve the > > causal order. However, we can still perform conflict resolution > > beforehand since we already know the commit_ts. The conflict > > resolution function will be something like "out_version = > > CRF(version1_commit_ts, version2_commit_ts)," so the result should be > > the same regardless of when we apply it, correct? From a performance > > standpoint, wouldn't it be beneficial to perform as much work as > > possible in advance? By the time we apply all the operations, the > > local clock might already be in sync with the commit_ts of the remote > > transaction. Am I missing something? > > > > But waiting after applying the operations and before applying the > commit would mean that we need to wait with the locks held. That could > be a recipe for deadlocks in the system. I see your point related to > performance but as we are not expecting clock skew in normal cases, we > shouldn't be too much bothered on the performance due to this. If > there is clock skew, we expect users to fix it, this is just a > worst-case aid for users. > Please find the new patch set. patch004 is the new patch which attempts to implement: 1) Either wait or error out on clock skew as configured. Please note that currently wait is implemented during 'begin'. Once the ongoing discussion is concluded, it can be changed as needed. 2) last_update_wins resolver. Thanks Nisha for providing the resolver related changes. Next to be done: 1) parallel apply worker related changes as mentioned in [1] 2) cap on wait time due to clock skew 3) resolvers for delete_differ as conflict detection thread [2] has implemented detection for that. [1]: https://www.postgresql.org/message-id/CAFiTN-sf23K%3DsRsnxw-BKNJqg5P6JXcqXBBkx%3DEULX8QGSQYaw%40mail.gmail.com [2]: https://www.postgresql.org/message-id/OS0PR01MB571686E464A325F26CEFCCEF94DD2%40OS0PR01MB5716.jpnprd01.prod.outlook.com thanks Shveta
Attachment
- v3-0005-Configure-table-level-conflict-resolvers.patch
- v3-0001-Detect-and-log-conflicts-in-logical-replication.patch
- v3-0003-Implement-conflict-resolution-for-INSERT-UPDATE-a.patch
- v3-0002-DDL-command-to-configure-Global-Conflict-Resolver.patch
- v3-0004-Manage-Clock-skew-and-implement-last_update_wins.patch
On Mon, Jul 1, 2024 at 6:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Jul 1, 2024 at 1:35 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Setting resolvers at table-level and subscription-level sounds good to > > me. DDLs for setting resolvers at subscription-level would need the > > subscription name to be specified? > > > > Yes, it should be part of the ALTER/CREATE SUBSCRIPTION command. One > idea could be to have syntax as follows: > > ALTER SUBSCRIPTION name SET CONFLICT RESOLVER 'conflict_resolver' FOR > 'conflict_type'; > ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR 'conflict_type'; > > CREATE SUBSCRIPTION subscription_name CONNECTION 'conninfo' > PUBLICATION publication_name [, ...] CONFLICT RESOLVER > 'conflict_resolver' FOR 'conflict_type'; Looks good to me. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Jul 3, 2024 at 5:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Jul 3, 2024 at 4:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Jul 3, 2024 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > What I am saying is that if we are not applying the whole transaction, > > > it means we are not receiving it either unless we plan to spill it to > > > a file. If we don't spill it to a file, the network buffer will fill > > > up very quickly. This issue wouldn't occur if we waited right before > > > the commit because, by that time, we would have already received all > > > the data from the network. > > > > > > > We would have received the transaction data but there could be other > > transactions that need to wait because the apply worker is waiting > > before the commit. > > Yeah, that's a valid point, can parallel apply worker help here? > > So, the situation will be the same. We can even > > decide to spill the data to files if the decision is that we need to > > wait to avoid network buffer-fill situations. But note that the wait > > in apply worker has consequences that the subscriber won't be able to > > confirm the flush position and publisher won't be able to vacuum the > > dead rows and we won't be remove WAL as well. Last time when we > > discussed the delay_apply feature, we decided not to proceed because > > of such issues. This is the reason I proposed a cap on wait time. > > Yes, spilling to file or cap on the wait time should help, and as I > said above maybe a parallel apply worker can also help. > It is not clear to me how a parallel apply worker can help in this case. Can you elaborate on what you have in mind? -- With Regards, Amit Kapila.
On Thu, Jul 4, 2024 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > So, the situation will be the same. We can even > > > decide to spill the data to files if the decision is that we need to > > > wait to avoid network buffer-fill situations. But note that the wait > > > in apply worker has consequences that the subscriber won't be able to > > > confirm the flush position and publisher won't be able to vacuum the > > > dead rows and we won't be remove WAL as well. Last time when we > > > discussed the delay_apply feature, we decided not to proceed because > > > of such issues. This is the reason I proposed a cap on wait time. > > > > Yes, spilling to file or cap on the wait time should help, and as I > > said above maybe a parallel apply worker can also help. > > > > It is not clear to me how a parallel apply worker can help in this > case. Can you elaborate on what you have in mind? If we decide to wait at commit time, and before starting to apply we already see that the remote commit_ts is ahead, then if we apply such transactions using a parallel worker, wouldn't it solve the issue of network buffer congestion? The apply worker can then move ahead and fetch new transactions from the buffer, as our waiting transaction will not block it. I understand that if this transaction is going to wait at commit, then any future transaction that we fetch might also have to wait, because if the earlier-committed transaction is in the future, the transaction committed after it must also be in the future; so eventually that one will also go to another parallel worker, and we would soon consume all the parallel workers if the clock skew is large. So I won't say this will resolve the problem, and we would still have to fall back to spilling to disk, but that's just in the worst case when the clock skew is really huge. In most cases, where the skew is due to slight clock drift, by the time we apply a medium-to-large transaction the local clock should be able to catch up with the remote commit_ts, and we might not have to wait at all. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Please find the new patch set (v4). It implements the resolvers for conflict type 'delete_differ'. Supported resolutions for ‘delete_differ’ are: - ‘last_update_wins’: Apply the change with the latest timestamp (default). - 'remote_apply': Apply the remote delete. - 'keep_local': Skip the remote delete and continue. - 'error': The apply worker will error out and restart. The changes made in the patches are as follows: - Updated the conflict detection patch (patch0001) to the latest version from [1], which implements delete_differ conflict detection. - Patch0002 now supports resolver settings for delete_differ. - Patch0003 implements resolutions for delete_differ as well. - Patch0004 includes changes to support last_update_wins resolution for delete_differ. [1] https://www.postgresql.org/message-id/OS0PR01MB571686E464A325F26CEFCCEF94DD2%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- Thanks, Nisha
Attachment
- v4-0001-Detect-and-log-conflicts-in-logical-replication.patch
- v4-0002-DDL-command-to-configure-Global-Conflict-Resolver.patch
- v4-0003-Implement-conflict-resolution-for-INSERT-UPDATE-a.patch
- v4-0004-Manage-Clock-skew-and-implement-last_update_wins.patch
- v4-0005-Configure-table-level-conflict-resolvers.patch
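[A hypothetical sketch of how the delete_differ resolutions listed above might dispatch; the enum, function, and tie handling are invented for illustration, and the patch's actual structure may differ:

    #include "postgres.h"
    #include "utils/timestamp.h"  /* TimestampTz */

    typedef enum ConflictResolver
    {
        CR_LAST_UPDATE_WINS,
        CR_REMOTE_APPLY,
        CR_KEEP_LOCAL,
        CR_ERROR
    } ConflictResolver;

    /* Returns true if the remote DELETE should be applied. */
    static bool
    resolve_delete_differ(ConflictResolver r,
                          TimestampTz local_ts, TimestampTz remote_ts)
    {
        switch (r)
        {
            case CR_LAST_UPDATE_WINS:
                return remote_ts >= local_ts; /* later change wins */
            case CR_REMOTE_APPLY:
                return true;                  /* always apply remote delete */
            case CR_KEEP_LOCAL:
                return false;                 /* skip remote delete */
            case CR_ERROR:
                elog(ERROR, "delete_differ conflict detected");
        }
        return false;                         /* keep compiler quiet */
    }]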
On Fri, Jul 5, 2024 at 11:58 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Jul 4, 2024 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > So, the situation will be the same. We can even > > > > decide to spill the data to files if the decision is that we need to > > > > wait to avoid network buffer-fill situations. But note that the wait > > > > in apply worker has consequences that the subscriber won't be able to > > > > confirm the flush position and publisher won't be able to vacuum the > > > > dead rows and we won't be remove WAL as well. Last time when we > > > > discussed the delay_apply feature, we decided not to proceed because > > > > of such issues. This is the reason I proposed a cap on wait time. > > > > > > Yes, spilling to file or cap on the wait time should help, and as I > > > said above maybe a parallel apply worker can also help. > > > > > > > It is not clear to me how a parallel apply worker can help in this > > case. Can you elaborate on what you have in mind? > > If we decide to wait at commit time, and before starting to apply if > we already see a remote commit_ts clock is ahead, then if we apply > such transactions using the parallel worker, wouldn't it solve the > issue of the network buffer congestion? Now the apply worker can move > ahead and fetch new transactions from the buffer as our waiting > transaction will not block it. I understand that if this transaction > is going to wait at commit then any future transaction that we are > going to fetch might also going to wait again because if the previous > transaction committed before is in the future then the subsequent > transaction committed after this must also be in future so eventually > that will also go to some another parallel worker and soon we end up > consuming all the parallel worker if the clock skew is large. So I > won't say this will resolve the problem and we would still have to > fall back to the spilling to the disk but that's just in the worst > case when the clock skew is really huge. In most cases which is due > to slight clock drift by the time we apply the medium to large size > transaction, the local clock should be able to catch up the remote > commit_ts and we might not have to wait in most of the cases. > Yeah, this is possible but even if we go with the spilling logic at first, it should work for all cases. If we get some complaints then we can explore executing such transactions by parallel apply workers. Personally, I am of the opinion that clock synchronization should be handled outside the database system via network time protocols like NTP. Still, we can have some simple solution to inform users about the clock_skew. -- With Regards, Amit Kapila.
On Fri, Jul 5, 2024 at 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jul 5, 2024 at 11:58 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Jul 4, 2024 at 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > So, the situation will be the same. We can even > > > > > decide to spill the data to files if the decision is that we need to > > > > > wait to avoid network buffer-fill situations. But note that the wait > > > > > in apply worker has consequences that the subscriber won't be able to > > > > > confirm the flush position and publisher won't be able to vacuum the > > > > > dead rows and we won't be remove WAL as well. Last time when we > > > > > discussed the delay_apply feature, we decided not to proceed because > > > > > of such issues. This is the reason I proposed a cap on wait time. > > > > > > > > Yes, spilling to file or cap on the wait time should help, and as I > > > > said above maybe a parallel apply worker can also help. > > > > > > > > > > It is not clear to me how a parallel apply worker can help in this > > > case. Can you elaborate on what you have in mind? > > > > If we decide to wait at commit time, and before starting to apply if > > we already see a remote commit_ts clock is ahead, then if we apply > > such transactions using the parallel worker, wouldn't it solve the > > issue of the network buffer congestion? Now the apply worker can move > > ahead and fetch new transactions from the buffer as our waiting > > transaction will not block it. I understand that if this transaction > > is going to wait at commit then any future transaction that we are > > going to fetch might also going to wait again because if the previous > > transaction committed before is in the future then the subsequent > > transaction committed after this must also be in future so eventually > > that will also go to some another parallel worker and soon we end up > > consuming all the parallel worker if the clock skew is large. So I > > won't say this will resolve the problem and we would still have to > > fall back to the spilling to the disk but that's just in the worst > > case when the clock skew is really huge. In most cases which is due > > to slight clock drift by the time we apply the medium to large size > > transaction, the local clock should be able to catch up the remote > > commit_ts and we might not have to wait in most of the cases. > > > > Yeah, this is possible but even if go with the spilling logic at first > it should work for all cases. If we get some complaints then we can > explore executing such transactions by parallel apply workers. > Personally, I am of the opinion that clock synchronization should be > handled outside the database system via network time protocols like > NTP. Still, we can have some simple solution to inform users about the > clock_skew. Yeah, that makes sense; in the first version we can have a simple solution and we can further improve it based on the feedback. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Jul 1, 2024 at 1:17 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Thu, Jun 27, 2024 at 1:14 PM Nisha Moond <nisha.moond412@gmail.com> wrote: >> >> Please find the attached 'patch0003', which implements conflict >> resolutions according to the global resolver settings. >> >> Summary of Conflict Resolutions Implemented in 'patch0003': >> >> INSERT Conflicts: >> ------------------------ >> 1) Conflict Type: 'insert_exists' >> >> Supported Resolutions: >> a) 'remote_apply': Convert the INSERT to an UPDATE and apply. >> b) 'keep_local': Ignore the incoming (conflicting) INSERT and retain >> the local tuple. >> c) 'error': The apply worker will error out and restart. >> > > Hi Nisha, > > While testing the patch, when conflict resolution is configured and insert_exists is set to "remote_apply", I see this warning in the logs due to a resource not being closed: > > 2024-07-01 02:52:59.427 EDT [20304] LOG: conflict insert_exists detected on relation "public.test1" > 2024-07-01 02:52:59.427 EDT [20304] DETAIL: Key already exists. Applying resolution method "remote_apply" > 2024-07-01 02:52:59.427 EDT [20304] CONTEXT: processing remote data for replication origin "pg_16417" during message type "INSERT" for replication target relation "public.test1" in transaction 763, finished at 0/15E7F68 > 2024-07-01 02:52:59.427 EDT [20304] WARNING: resource was not closed: [138] (rel=base/5/16413, blockNum=0, flags=0x93800000, refcount=1 1) > 2024-07-01 02:52:59.427 EDT [20304] CONTEXT: processing remote data for replication origin "pg_16417" during message type "COMMIT" in transaction 763, finished at 0/15E7F68 > 2024-07-01 02:52:59.427 EDT [20304] WARNING: resource was not closed: TupleDesc 0x7f8c0439e448 (16402,-1) > 2024-07-01 02:52:59.427 EDT [20304] CONTEXT: processing remote data for replication origin "pg_16417" during message type "COMMIT" in transaction 763, finished at 0/15E7F68 > Thank you Ajin for reporting the issue. This is now fixed with the v4-0003 patch. -- Thanks, Nisha
Hi, I researched how to detect and resolve update_deleted and thought about one idea: maintain the xmin in the logical slot to preserve the dead row and support latest_timestamp_wins resolution for update_deleted to maintain data consistency. Here are details of the xmin idea and the resolution of update_deleted: 1. How to preserve the dead row so that we can detect the update_deleted conflict correctly. (In the following explanation, let's assume there is a multi-master setup with nodes A and B.) To preserve the dead row on node A, I think we could maintain the "xmin" in the logical replication slot on Node A to prevent VACUUM from removing the dead row in the user table. The walsender that acquires the slot is responsible for advancing the xmin. (Note that I am trying to explore the xmin idea as it could be more efficient than using commit_timestamp, and the logic could be simpler as we are already maintaining catalog_xmin in the logical slot and xmin in the physical slot.) - Strategy for advancing xmin: The xmin can be advanced if a) a transaction (xid:1000) has been flushed to the remote node (Node B in this case), *AND* b) on Node B, the local transactions that happened before applying the remote transaction (xid:1000) were also sent and flushed to Node A. - The implementation: condition a) can be achieved with existing code; the walsender can advance the xmin similar to the catalog_xmin. For condition b), we can add a subscription option (say 'feedback_slot'). The feedback_slot indicates the replication slot that will send changes to the origin (on Node B, the slot should be subBA). The apply worker will check the status (confirmed flush LSN) of the 'feedback_slot' and send feedback to the walsender about the WAL position that has been sent and flushed via the feedback_slot. For example, on Node B, we specify the replication slot (subBA) that is sending changes to Node A. The apply worker on Node B will send feedback (the WAL position that has been sent to Node A) to Node A regularly. Then Node A can use that position to advance the xmin. (Similar to hot_standby_feedback.) 2. The resolution for update_deleted. The current design doesn't support 'latest_timestamp_wins'. But this could be a problem if update_deleted is detected due to some very old dead row. Assume the update has the latest timestamp; if we skip the update because of these very old dead rows, the data would be inconsistent, because the latest update is missing. The ideal resolution should compare the timestamp of the UPDATE and the timestamp of the transaction that produced these dead rows. If the UPDATE is newer, then convert the UPDATE to INSERT; otherwise, skip the UPDATE. Best Regards, Hou zj
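[Restating the two advance conditions as code, a much-simplified and entirely hypothetical sketch; real slot xmin maintenance involves ProcArray interlocks and more, and every name below is invented:

    #include "postgres.h"
    #include "access/xlogdefs.h"  /* XLogRecPtr */

    /* Hypothetical snapshot of the feedback state described above. */
    typedef struct XminFeedback
    {
        XLogRecPtr candidate_lsn;         /* end LSN of the local txn (xid 1000) */
        XLogRecPtr remote_flush_lsn;      /* how far Node B has flushed our changes */
        XLogRecPtr required_feedback_lsn; /* Node B's local txns applied before
                                           * xid 1000, sent via the feedback_slot */
        XLogRecPtr feedback_flush_lsn;    /* confirmed flush of the feedback_slot */
    } XminFeedback;

    static bool
    can_advance_slot_xmin(const XminFeedback *fb)
    {
        /* a) our transaction has been flushed on the remote node, AND */
        /* b) the remote's prior local transactions have been flushed back to us */
        return fb->remote_flush_lsn >= fb->candidate_lsn &&
               fb->feedback_flush_lsn >= fb->required_feedback_lsn;
    }]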
On Monday, July 8, 2024 12:32 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > I researched about how to detect the resolve update_deleted and thought > about one idea: which is to maintain the xmin in logical slot to preserve > the dead row and support latest_timestamp_xmin resolution for > update_deleted to maintain data consistency. > > Here are details of the xmin idea and resolution of update_deleted: > > 1. how to preserve the dead row so that we can detect update_delete > conflict correctly. (In the following explanation, let's assume there is a > a multimeter setup with node A, B). > > To preserve the dead row on node A, I think we could maintain the "xmin" > in the logical replication slot on Node A to prevent the VACCUM from > removing the dead row in user table. The walsender that acquires the slot > is responsible to advance the xmin. (Node that I am trying to explore > xmin idea as it could be more efficient than using commit_timestamp, and the > logic could be simpler as we are already maintaining catalog_xmin in > logical slot and xmin in physical slot) > > - Strategy for advancing xmin: > > The xmin can be advanced if a) a transaction (xid:1000) has been flushed > to the remote node (Node B in this case). *AND* b) On Node B, the local > transactions that happened before applying the remote > transaction(xid:1000) were also sent and flushed to the Node A. > > - The implementation: > > condition a) can be achieved with existing codes, the walsender can > advance the xmin similar to the catalog_xmin. > > For condition b), we can add a subscription option (say 'feedback_slot'). > The feedback_slot indicates the replication slot that will send changes to > the origin (On Node B, the slot should be subBA). The apply worker will > check the status(confirmed flush lsn) of the 'feedback slot' and send > feedback to the walsender about the WAL position that has been sent and > flushed via the feedback_slot. The above are some initial thoughts on how to preserve the dead row for update_deleted conflict detection. After thinking more, I have identified a few additional cases that I missed analyzing in the design. One aspect that needs more thought is the possibility of multiple slots on each node. In this scenario, the 'feedback_slot' subscription option would need to be structured as a list. However, requiring users to specify all the slots may not be user-friendly. I will explore if this process can be automated. In addition, I will think more about the potential impact of re-using the existing 'xmin' of the slot, which may affect existing logic that relies on 'xmin'. I will analyze further and reply on these points. Best Regards, Hou zj
On Fri, Jul 5, 2024 at 5:12 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > Thank you Ajin for reporting the issue, This is now fixed with the > v4-0003 patch. Please find v5 patch-set. Changes are: 1) patch003: Added test cases for all resolvers (034_conflict_resolver.pl). 2) Patch004: a) Emit error while resolving conflict if conflict resolver is default 'last_update_wins' but track_commit_timestamp is not enabled. b) Emit Warning during create and alter subscription when 'detect_conflict' is ON but 'track_commit_timestamp' is not enabled. c) Restrict start of pa worker if either max-clock-skew is configured or conflict detection and resolution is enabled for a subscription. d) Implement clock-skew delay/error when changes are applied from a file (apply_spooled_messages). e) Implement clock-skew delay while applying prepared changes (two phase txns). The prepare-timestamp is to be considered as the base for clock-skew handling as well as for the last_update_wins resolver. <TODO: This needs to be analyzed and tested further to see if there is any side effect of taking prepare-timestamp as base.> Thanks Ajin for working on 1. Thanks Nisha for working on 2a,2b. thanks Shveta
Attachment
- v5-0002-DDL-command-to-configure-Global-Conflict-Resolver.patch
- v5-0003-Conflict-resolvers-for-insert-update-and-delete.patch
- v5-0004-Manage-Clock-skew-and-implement-last_update_wins.patch
- v5-0001-Detect-and-log-conflicts-in-logical-replication.patch
- v5-0005-Configure-table-level-conflict-resolvers.patch
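[For items 2a and 2b in shveta's list above, the guard presumably reduces to something like the following sketch; track_commit_timestamp is the real GUC flag, but the helper name and message are invented:

    #include "postgres.h"
    #include "access/commit_ts.h"  /* declares the track_commit_timestamp GUC */

    /* Hypothetical check run when a timestamp-based resolver is configured. */
    static void
    check_timestamp_resolver(const char *resolver)
    {
        if (strcmp(resolver, "last_update_wins") == 0 &&
            !track_commit_timestamp)
            ereport(ERROR,
                    (errmsg("conflict resolver \"%s\" requires \"track_commit_timestamp\" to be enabled",
                            resolver)));
    }]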
On Tue, Jul 9, 2024 at 3:09 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Jul 5, 2024 at 5:12 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Thank you Ajin for reporting the issue, This is now fixed with the > > v4-0003 patch. > > Please find v5 patch-set. Changes are: > > 1) patch003: > Added test cases for all resolvers (034_conflict_resolver.pl). > > 2) Patch004: > a) Emit error while resolving conflict if conflict resolver is default > 'last_update_wins' but track_commit_timetsamp is not enabled. > b) Emit Warning during create and alter subscription when > 'detect_conflict' is ON but 'track_commit_timetsamp' is not enabled. > c) Restrict start of pa worker if either max-clock-skew is configured > or conflict detection and resolution is enabled for a subscription. > d) Implement clock-skew delay/error when changes are applied from a > file (apply_spooled_messages). > e) Implement clock-skew delay while applying prepared changes (two > phase txns). The prepare-timestamp to be considered as base for > clock-skew handling as well as for last_update_win resolver. > <TODO: This needs to be analyzed and tested further to see if there is > any side effect of taking prepare-timestamp as base.> > > Thanks Ajin fo working on 1. > Thanks Nisha for working on 2a,2b. > Please find v6 patch-set. Changes are: 1) patch003: 1a) Improved log and restructured code around it. 1b) added test case for delete_differ. 2) patch004: 2a) Local and remote timestamps were logged incorrectly due to a bug, corrected that. 2b) Added tests for last_update_wins. 2c) Added a cap on wait time; introduced a new GUC for this. Apply worker will now error out without waiting if the computed wait exceeds this GUC's value. 2d) Restricted enabling two_phase and detect_conflict together for a subscription. This is because the time based resolvers may result in data divergence for two phase commit transactions if prepare-timestamp is used for comparison. Thanks Nisha for working on 1a to 2b. thanks Shveta
Attachment
- v6-0003-Conflict-resolvers-for-insert-update-and-delete.patch
- v6-0004-Manage-Clock-skew-and-implement-last_update_wins.patch
- v6-0001-Detect-and-log-conflicts-in-logical-replication.patch
- v6-0002-DDL-command-to-configure-Global-Conflict-Resolver.patch
- v6-0005-Configure-table-level-conflict-resolvers.patch
On Wed, Jul 17, 2024 at 4:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> Please find v6 patch-set. Changes are:
Please find v7 patch-set, the changes are:
Patch 0001 - Reflects v5 of Conflict Detection patch in [1].
Patch 0002:
a) Removed global CONFLICT RESOLVER syntax and logic.
b) Added new syntax for creating CONFLICT RESOLVERs at the subscription level.
Syntax for CREATE SUBSCRIPTION:
CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> CONFLICT RESOLVER
(conflict_type1 = resolver1, conflict_type2 = resolver2, conflict_type3 = resolver3, ...);
Syntax for ALTER SUBSCRIPTION:
ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
(conflict_type1 = resolver1, conflict_type2 = resolver2, conflict_type3 = resolver3, ...);
Patch 0003 - Supports subscription-level resolvers for conflict resolution.
Patch 0004 - Modified last_update_win related test cases to reflect the new syntax.
Patch 0005 - Dropped for the time being; will rebase and post in the next version.
Thanks to Shveta for the design discussions, and to Nisha for helping rebase the patch and for helping test and stabilize it by providing comments off-list.
[1] - https://www.postgresql.org/message-id/OS0PR01MB57166C2566E00676649CF48B94AC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Attachment
On Fri, Jul 26, 2024 at 9:50 AM Ajin Cherian <itsajin@gmail.com> wrote: > > Please find v7 patch-set, the changes are: Thanks Ajin for working on this. Please find a few comments: 1) parse_subscription_conflict_resolvers(): Here we loop in this function to find the given conflict type in the supported list and error out if the conflict-type is not valid. Also we call validate_conflict_type_and_resolver() which again validates the conflict-type. I would recommend looping over 'stmtresolvers' in the parse function, reading each type and resolver, and passing those to validate_conflict_type_and_resolver(), avoiding the double validation. 2) SetSubConflictResolver(): It works well, but it does not look apt that the 'resolvers' passed to this function by the caller is an array and this function knows the array range and traverses from CT_MIN to CT_MAX assuming this array maps directly to ConflictType. I think it would be better to have it passed as a list, and then SetSubConflictResolver() traverses the list without knowing its range. Similar to what we do in the alter-sub flow in and around UpdateSubConflictResolvers(). 3) When I execute 'alter subscription ..(detect_conflict=on)' for a subscription which *already* has detect_conflict as ON, it tries to reset resolvers to default and ends up in error. It should actually be a no-op in this particular situation and should not reset resolvers to default. postgres=# alter subscription sub1 set (detect_conflict=on); WARNING: Using default conflict resolvers ERROR: duplicate key value violates unique constraint "pg_subscription_conflict_sub_index" 4) Do we need the SUBSCRIPTIONCONFLICTOID cache? We are not using it anywhere. Shall we remove this and the corresponding index? 5) RemoveSubscriptionConflictBySubid(): --We can remove the extra blank line before table_open. --We can get rid of the curly braces around CatalogTupleDelete() as it is a single line in the loop. thanks Shveta
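[On point 1, the single-pass shape being suggested might look roughly like this; validate_conflict_type_and_resolver() is the function named above, but its signature and the rest are assumed for illustration:

    #include "postgres.h"
    #include "commands/defrem.h"   /* defGetString() */
    #include "nodes/parsenodes.h"  /* DefElem */
    #include "nodes/pg_list.h"     /* List, ListCell, foreach */

    /* assumed signature; errors out on an invalid type or resolver */
    extern void validate_conflict_type_and_resolver(const char *conflict_type,
                                                    const char *resolver);

    static void
    parse_subscription_conflict_resolvers(List *stmtresolvers)
    {
        ListCell   *lc;

        foreach(lc, stmtresolvers)
        {
            DefElem *defel = (DefElem *) lfirst(lc);

            /* validate each (conflict_type = resolver) pair exactly once */
            validate_conflict_type_and_resolver(defel->defname,
                                                defGetString(defel));
        }
    }]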
On Fri, Jul 26, 2024 at 9:50 AM Ajin Cherian <itsajin@gmail.com> wrote:

Comments on 0002:

1) I do not see any test case that sets a proper conflict type and conflict resolver; all tests either give an incorrect conflict type/conflict resolver, or the conflict resolver is ignored.

0003:

2) I was trying to think through this patch. Suppose we consider the case conflict_type -> update_differ, resolver -> remote_apply; my question is to confirm whether my understanding is correct. If this is set, and we have 2 nodes with 2-way logical replication set up, then when a conflict occurs node-1 will take the changes of node-2 and node-2 will take the changes of node-1? So I think to avoid such cases the user needs to set the resolvers more thoughtfully: on node-1 it may be set as "skip" and on node-2 as "remote_apply", so that if a conflict happens both nodes will have the value from node-1. But it would be more difficult to get a consistent value if we are setting up a mesh replication topology, right? There, I think, a more advanced timestamp-based option would work better IMHO.

I am doing a code-level review as well and will share my comments soon on 0003 and 0004.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 30, 2024 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Jul 26, 2024 at 9:50 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Comments on 0002:
>
> 1) I do not see any test case that sets a proper conflict type and
> conflict resolver; all tests either give an incorrect conflict
> type/conflict resolver, or the conflict resolver is ignored.
>
> 0003:
> 2) I was trying to think through this patch. Suppose we consider the
> case conflict_type -> update_differ, resolver -> remote_apply; my
> question is to confirm whether my understanding is correct. If this
> is set, and we have 2 nodes with 2-way logical replication set up,
> then when a conflict occurs node-1 will take the changes of node-2
> and node-2 will take the changes of node-1?

Yes, that's right.

> So I think to avoid such cases the user needs to set the resolvers
> more thoughtfully: on node-1 it may be set as "skip" and on node-2 as
> "remote_apply", so that if a conflict happens both nodes will have
> the value from node-1. But it would be more difficult to get a
> consistent value if we are setting up a mesh replication topology,
> right? There, I think, a more advanced timestamp-based option would
> work better IMHO.

Yes, that's correct. We can get data divergence with resolvers like 'remote_apply', 'keep_local' etc. And in a mesh replication topology, yes, it is difficult to get a consistent value with resolvers other than timestamp-based ones. Thus timestamp-based resolvers are needed and should be the default when implemented.

thanks
Shveta
On Tue, Jul 30, 2024 at 4:56 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Jul 30, 2024 at 4:04 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > Comments on 0002:
> >
> > 1) I do not see any test case that sets a proper conflict type and
> > conflict resolver; all tests either give an incorrect conflict
> > type/conflict resolver, or the conflict resolver is ignored.
> >
> > 0003:
> > 2) I was trying to think through this patch. Suppose we consider the
> > case conflict_type -> update_differ, resolver -> remote_apply; my
> > question is to confirm whether my understanding is correct. If this
> > is set, and we have 2 nodes with 2-way logical replication set up,
> > then when a conflict occurs node-1 will take the changes of node-2
> > and node-2 will take the changes of node-1?
>
> Yes, that's right.
>
> > So I think to avoid such cases the user needs to set the resolvers
> > more thoughtfully: on node-1 it may be set as "skip" and on node-2 as
> > "remote_apply", so that if a conflict happens both nodes will have
> > the value from node-1. But it would be more difficult to get a
> > consistent value if we are setting up a mesh replication topology,
> > right? There, I think, a more advanced timestamp-based option would
> > work better IMHO.
>
> Yes, that's correct. We can get data divergence with resolvers like
> 'remote_apply', 'keep_local' etc. And in a mesh replication topology,
> yes, it is difficult to get a consistent value with resolvers other
> than timestamp-based ones. Thus timestamp-based resolvers are needed
> and should be the default when implemented.
>

Thanks for the clarification.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 30, 2024 at 2:19 PM shveta malik <shveta.malik@gmail.com> wrote:
On Fri, Jul 26, 2024 at 9:50 AM Ajin Cherian <itsajin@gmail.com> wrote:
>>
> Please find v7 patch-set, the changes are:
>
> Thanks Ajin for working on this. Please find few comments:
>
> 1)
> parse_subscription_conflict_resolvers():
> Here we loop in this function to find the given conflict type in the
> supported list and error out if conflict-type is not valid. Also we
> call validate_conflict_type_and_resolver() which again validates
> conflict-type. I would recommend looping over 'stmtresolvers' in the
> parse function and then reading each type and resolver and passing
> them to validate_conflict_type_and_resolver(). Avoid double validation.
I have modified this as per comment.
> 2)
> SetSubConflictResolver():
> It works well, but it does not look apt that the 'resolvers' passed to
> this function by the caller is an array and this function knows the
> array range and traverses from CT_MIN to CT_MAX assuming this array
> maps directly to ConflictType. I think it would be better to have it
> passed as a list and then SetSubConflictResolver() traverse the list
> without knowing the range of it. Similar to what we do in the
> alter-sub flow in and around UpdateSubConflictResolvers().
I have kept the array since it requires that all conflict resolvers be set; if one is not provided by the user, the default needs to be used. However, I have modified SetSubConflictResolver() so that it takes in the size of the array and does not assume it.
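To make the explicit-length approach concrete, below is a minimal standalone C sketch — stub types and printf stand in for the real catalog code, so none of these definitions are the patch's actual ones. The point is only that the callee iterates to a count supplied by the caller instead of assuming the CT_MIN..CT_MAX range:

#include <stdio.h>

typedef enum ConflictType
{
	CT_INSERT_EXISTS,
	CT_UPDATE_DIFFER,
	CT_UPDATE_MISSING
} ConflictType;

typedef struct ConflictTypeResolver
{
	ConflictType conflict_type;
	const char *resolver;
} ConflictTypeResolver;

/* The caller passes the array length explicitly, so this function makes
 * no assumption that the array maps one-to-one onto ConflictType. */
static void
SetSubConflictResolver(unsigned int subid,
					   const ConflictTypeResolver *resolvers,
					   int resolvers_cnt)
{
	for (int i = 0; i < resolvers_cnt; i++)
		printf("sub %u: conflict type %d -> %s\n",
			   subid, (int) resolvers[i].conflict_type,
			   resolvers[i].resolver);
}

int
main(void)
{
	ConflictTypeResolver resolvers[] = {
		{CT_INSERT_EXISTS, "error"},
		{CT_UPDATE_DIFFER, "apply_remote"},
		{CT_UPDATE_MISSING, "skip"},
	};

	SetSubConflictResolver(16384, resolvers,
						   (int) (sizeof(resolvers) / sizeof(resolvers[0])));
	return 0;
}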
> 3)
> When I execute 'alter subscription ..(detect_conflict=on)' for a
> subscription which *already* has detect_conflict as ON, it tries to
> reset resolvers to default and ends up in error. It should actually be
> no-op in this particular situation and should not reset resolvers to
> default.
>
> postgres=# alter subscription sub1 set (detect_conflict=on);
> WARNING: Using default conflict resolvers
> ERROR: duplicate key value violates unique constraint
> "pg_subscription_conflict_sub_index"
Fixed.
> 4)
> Do we need SUBSCRIPTIONCONFLICTOID cache? We are not using it
> anywhere. Shall we remove this and the corresponding index?
We are using the index but not the cache, so removing the cache.
> 5)
> RemoveSubscriptionConflictBySubid():
> --We can remove extra blank line before table_open.
> --We can get rid of curly braces around CatalogTupleDelete() as it is
> a single line in loop.
Fixed.
On Tue, Jul 30, 2024 at 8:34 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Jul 26, 2024 at 9:50 AM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Comments on 0002:
>
> 1) I do not see any test case that sets a proper conflict type and
> conflict resolver; all tests either give an incorrect conflict
> type/conflict resolver, or the conflict resolver is ignored.
Fixed.
I've also fixed a cfbot error due to patch 0001. The rebase of the table resolver patch is still pending; I will try to target that in the next patch-set.
regards,
Ajin Cherian
Fujitsu Australia
Attachment
On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> The patches have been rebased on the latest pgHead following the merge
> of the conflict detection patch [1].

Thanks for working on patches.

Summarizing the issues which need some suggestions/thoughts.

1)
For subscription based resolvers, currently the syntax implemented is:

1a)
CREATE SUBSCRIPTION <subname>
CONNECTION <conninfo> PUBLICATION <pubname>
CONFLICT RESOLVER
(conflict_type1 = resolver1, conflict_type2 = resolver2,
conflict_type3 = resolver3, ...);

1b)
ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
(conflict_type1 = resolver1, conflict_type2 = resolver2,
conflict_type3 = resolver3, ...);

Earlier the syntax suggested in [1] was:
CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';

I think the currently implemented syntax is good as it has less
repetition, unless others think otherwise.

~~

2)
For subscription based resolvers, do we need a RESET command to reset
resolvers to default? Any one of below or both?

2a) reset all at once:
ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS

2b) reset one at a time:
ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';

The issue I see here is, to implement 1a and 1b, we have introduced
the 'RESOLVER' keyword. If we want to implement 2a, we will have to
introduce the 'RESOLVERS' keyword as well. But we can come up with
some alternative syntax if we plan to implement these. Thoughts?

~~

3) Regarding update_exists:

3a)
Currently update_exists resolver patch is kept separate. The reason
being, it performs resolution which will need deletion of multiple
rows. It will be good to discuss if we want to target this in the
first draft. Please see the example:

create table tab (a int primary key, b int unique, c int unique);

Pub: insert into tab values (1,1,1);
Sub:
insert into tab values (2,20,30);
insert into tab values (3,40,50);
insert into tab values (4,60,70);

Pub: update tab set a=2,b=40,c=70 where a=1;

The above 'update' on pub will result in 'update_exists' on sub and if
resolution is in favour of 'apply', then it will conflict with all the
three local rows of subscriber due to unique constraint present on all
three columns. Thus in order to resolve the conflict, it will have to
delete these 3 rows on sub:

2,20,30
3,40,50
4,60,70

and then update 1,1,1 to 2,40,70.

Just need opinion on if we shall target this in the initial draft.

3b)
If we plan to implement this, we need to work on optimal design where
we can find all the conflicting rows at once and delete those.
Currently the implementation has been done using recursion i.e. find
one conflicting row, then delete it and then next and so on i.e. we
call apply_handle_update_internal() recursively. On initial code
review, I feel it is doable to scan all indexes at once and get
conflicting-tuple-ids in one go and get rid of recursion. It can be
attempted once we decide on 3a.

~~

4)
Now for insert_exists and update_exists, we are doing a pre-scan of
all unique indexes to find conflict. Also there is post-scan to figure
out if the conflicting row is inserted meanwhile. This needs to be
reviewed for optimization. We need to avoid pre-scan wherever
possible. I think the only case for which it can be avoided is
'ERROR'. For the cases where resolver is in favor of remote-apply, we
need to check conflict beforehand to avoid rollback of already
inserted data. And for the case where resolver is in favor of skipping
the change, then too we should know beforehand about the conflict to
avoid heap-insertion and rollback. Thoughts?

~~

5)
Currently we only capture update_missing conflict i.e. we are not
distinguishing between the missing row and the deleted row. We had
discussed this in the past a couple of times. If we plan to target it
in draft 1, I can dig up all old emails and resume discussion on this.

~~

6)
Table-level resolvers. There was a suggestion earlier to implement
table-level resolvers. The patch has been implemented to some extent,
it can be completed and posted when we are done reviewing subscription
level resolvers.

~~

[1]: https://www.postgresql.org/message-id/CAA4eK1LhD%3DC5UwDeKxC_5jK4_ADtM7g%2BMoFW9qhziSxHbVVfeQ%40mail.gmail.com

For clock-skew and timestamp based resolution, if needed, I will post
another email for the design items where suggestions are needed.

thanks
Shveta
On Thu, Aug 22, 2024 at 3:44 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> For clock-skew and timestamp based resolution, if needed, I will post
> another email for the design items where suggestions are needed.
>

Please find issues which need some thoughts and approval for time-based resolution and clock-skew.

1)
Time based conflict resolution and two phase transactions:

Time based conflict resolution (last_update_wins) is the one resolution which will not result in data-divergence considering clock-skew is taken care of. But when it comes to two-phase transactions, it might not be the case. For a two-phase transaction, we do not have the commit timestamp when the changes are being applied. Thus for time-based comparison, initially it was decided to use prepare timestamp, but it may result in data-divergence. Please see the example at [1].

Example at [1] is a tricky situation, and thus in the initial draft, we decided to restrict usage of 2pc and CDR together. The plan is:

a) During Create subscription, if the user has given last_update_wins resolver for any conflict_type and 'two_phase' is also enabled, we ERROR out.
b) During Alter subscription, if the user tries to update resolver to 'last_update_wins' but 'two_phase' is enabled, we error out.

Another solution could be to save both prepare_ts and commit_ts. And when any txn comes for conflict resolution, we first check if prepare_ts is available, use that, else use commit_ts. Availability of prepare_ts would indicate it was a prepared txn, and thus even if it is committed, we should use prepare_ts for comparison for consistency. This will have some overhead of storing prepare_ts along with commit_ts. But if the number of prepared txns is reasonably small, this overhead should be less.

We currently plan to go with restricting 2pc and last_update_wins together, unless others have different opinions.

~~

2)
Parallel apply worker and conflict resolution:
As discussed in [2] (see last paragraph in [2]), for streaming of in-progress transactions by parallel worker, we do not have a commit-timestamp with each change and thus it makes sense to disable parallel apply worker with CDR. The plan is to not start a parallel apply worker if 'last_update_wins' is configured for any conflict_type.

~~

3)
Parallel apply worker and clock skew management:
Regarding clock-skew management as discussed in [3], we will wait for the local clock to come within tolerable range during 'begin' rather than before 'commit'. And this wait needs the commit-timestamp in the beginning, thus we plan to restrict starting a pa-worker even when clock-skew related GUCs are configured.

Earlier we had restricted both 2pc and parallel worker start when detect_conflict was enabled, but now since the detect_conflict parameter is removed, we will change the implementation to restrict all 3 above cases when last_update_wins is configured. When the changes are done, we will post the patch.

~~

4)
<not related to timestamp and clock skew>
Earlier when 'detect_conflict' was enabled, we were giving WARNING if 'track_commit_timestamp' was not enabled. This was during CREATE and ALTER subscription. Now with this parameter removed, this WARNING has also been removed. But I think we need to bring back this WARNING.

Currently the default resolver set may work without 'track_commit_timestamp', but when user gives CONFLICT RESOLVER in create-sub or alter-sub explicitly, making them configured to non-default values (or say any values, does not matter if few are defaults), we may still emit this warning to alert user:

2024-07-26 09:14:03.152 IST [195415] WARNING: conflict detection could be incomplete due to disabled track_commit_timestamp
2024-07-26 09:14:03.152 IST [195415] DETAIL: Conflicts update_differ and delete_differ cannot be detected, and the origin and commit timestamp for the local row will not be logged.

Thoughts?

If we emit this WARNING during each resolution, then it may flood our log files, thus it seems better to emit it during create or alter subscription instead of during resolution.

~~

[1]: Example of 2pc inconsistency:
---------------------------------------------------------
Two nodes, A and B, are subscribed to each other and have identical data. The last_update_wins strategy is configured. Both contain the data: '1, x, node'.

Timeline of Events:

9:00 AM on Node A: A transaction (txn1) is prepared to update the row to '1, x, nodeAAA'. We'll refer to this as change1 on Node A.

9:01 AM on Node B: An update occurs for the row, changing it to '1, x, nodeBBB'. This update is then sent to Node A. We'll call this change2 on Node B.

At 9:02 AM:
--Node A: Still holds '1, x, node' because txn1 is not yet committed.
--Node B: Holds '1, x, nodeBBB'.
--Node B receives the prepared transaction from Node A at 9:02 AM and raises an update_differ conflict.
--Since the local change occurred at 9:01 AM, which is later than the 9:00 AM prepare-timestamp from Node A, Node B retains its local change.

At 9:05 AM:
--Node A commits the prepared txn1.
--The apply worker on Node A has been waiting to apply the changes from Node B because the tuple was locked by txn1.
--Once the commit occurs, the apply worker proceeds with the update from Node B.
--When update_differ is triggered, since the 9:05 AM commit-timestamp from Node A is later than the 9:01 AM commit-timestamp from Node B, Node A's update wins.

Final Data on Nodes:
Node A: '1, x, nodeAAA'
Node B: '1, x, nodeBBB'

Despite the last_update_wins resolution, the nodes end up with different data. The data divergence happened because on node B we used change1's prepare_ts (9:00) for comparison, while on node A we used change1's commit_ts (9:05) for comparison.
---------------------------------------------------------

[2]: https://www.postgresql.org/message-id/CAFiTN-sf23K%3DsRsnxw-BKNJqg5P6JXcqXBBkx%3DEULX8QGSQYaw%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAA4eK1%2BhdMmwEEiMb4z6x7JgQbw1jU2XyP1U7dNObyUe4JQQWg%40mail.gmail.com

thanks
Shveta
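To make the prepare_ts/commit_ts alternative in (1) concrete, here is a minimal standalone C sketch of the selection rule, replaying the [1] timeline with stub types (timestamps as minutes past 9:00 AM); none of this is the patch's actual code. With the rule applied on both nodes, change1 is always compared at its 9:00 prepare time, so both nodes pick change2 and converge:

#include <stdbool.h>
#include <stdio.h>

/* Stub timestamp type: minutes past 9:00 AM in the example above. */
typedef int TimestampTz;

/* Prefer prepare_ts whenever the txn was prepared, even after it has
 * committed, so that every node compares the same timestamp. */
static TimestampTz
resolution_ts(bool was_prepared, TimestampTz prepare_ts, TimestampTz commit_ts)
{
	return was_prepared ? prepare_ts : commit_ts;
}

int
main(void)
{
	/* change1 (node A): prepared at 9:00, committed at 9:05. */
	TimestampTz change1 = resolution_ts(true, 0, 5);
	/* change2 (node B): plain commit at 9:01. */
	TimestampTz change2 = resolution_ts(false, 0, 1);

	/* Both nodes now pick the same winner: change2 (9:01) beats change1 (9:00). */
	printf("winner: %s\n", change2 > change1 ? "change2" : "change1");
	return 0;
}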
On Thu, Aug 22, 2024 at 8:15 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION <subname>
> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3, ...);
>
> 1b)
> ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3, ...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>

Hi Shveta,

I felt it would be better to keep the syntax similar to the existing INSERT ... ON CONFLICT [1].

I'd suggest a syntax like this:

... ON CONFLICT ['conflict_type'] DO { 'conflict_action' | DEFAULT }

~~~

e.g.

To configure conflict resolvers for the SUBSCRIPTION:

CREATE SUBSCRIPTION subname CONNECTION coninfo PUBLICATION pubname
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

Likewise, for ALTER:

ALTER SUBSCRIPTION <subname>
ON CONFLICT 'conflict_type1' DO 'conflict_action1',
ON CONFLICT 'conflict_type2' DO 'conflict_action2';

To RESET all at once:

ALTER SUBSCRIPTION <subname>
ON CONFLICT DO DEFAULT;

And, to RESET one at a time:

ALTER SUBSCRIPTION <subname>
ON CONFLICT 'conflict_type1' DO DEFAULT;

~~~

Although your list format "('conflict_type1' = 'conflict_action1', 'conflict_type2' = 'conflict_action2')" is clear and without repetition, I predict this terse style could end up being troublesome because it does not offer much flexibility for whatever the future might hold for CDR:

e.g. ability to handle the conflict with a user-defined resolver
e.g. ability to handle the conflict conditionally (e.g. with a WHERE clause...)
e.g. ability to handle all conflicts with a common resolver
etc.

~~~~

Advantages of my suggestion:
- Close to existing SQL syntax
- No loss of clarity by removing the word "RESOLVER"
- No requirement for new keyword/s
- The commands now read more like English
- Offers more flexibility for any unknown future requirements
- The setup (via create subscription) and the alter/reset all look the same.

======
[1] https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT

Kind Regards,
Peter Smith.
Fujitsu Australia
On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> The patches have been rebased on the latest pgHead following the merge
> of the conflict detection patch [1]. The detect_conflict option has
> been removed, and conflict detection is now enabled by default. This
> change required the following updates in resolver patches:
>
> patch-0001:
> - Removed dependency on the detect_conflict option. Now, default
> conflict resolvers are set on CREATE SUBSCRIPTION if no values are
> provided.
> - To keep the behavior unchanged, the default resolvers are now set as -
> insert_exists = error
> update_exists = error
> update_differ = apply_remote
> update_missing = skip
> delete_missing = skip
> delete_differ = apply_remote
> - Added documentation for conflict resolvers.
>
> patch-0002:
> - Removed dependency on the detect_conflict option.
> - Updated test cases in 034_conflict_resolver.pl to reflect new
> default resolvers and the removal of the detect_conflict option.
>
> patch-0003:
> - Implemented resolver for the update_exists conflict type. Supported
> resolvers are: apply_remote, keep_local, error.
>

Thanks Nisha for the patches. I was running some tests on update_exists and found a case wherein it misses logging one conflict out of 3.

create table tab (a int primary key, b int unique, c int unique);

Pub: insert into tab values (1,1,1);

Sub:
insert into tab values (2,20,30);
insert into tab values (3,40,50);
insert into tab values (4,60,70);

Pub: update tab set a=2,b=40,c=70 where a=1;

Here it logs the update_exists conflict and the resolution for Key (b)=(40) and Key (c)=(70), but misses logging the first one, which is with Key (a)=(2).

thanks
Shveta
On Mon, Aug 26, 2024 at 7:28 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 8:15 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > The patches have been rebased on the latest pgHead following the merge
> > > of the conflict detection patch [1].
> >
> > Thanks for working on patches.
> >
> > Summarizing the issues which need some suggestions/thoughts.
> >
> > 1)
> > For subscription based resolvers, currently the syntax implemented is:
> >
> > 1a)
> > CREATE SUBSCRIPTION <subname>
> > CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3, ...);
> >
> > 1b)
> > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3, ...);
> >
> > Earlier the syntax suggested in [1] was:
> > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
> >
> > I think the currently implemented syntax is good as it has less
> > repetition, unless others think otherwise.
> >
> > ~~
> >
> > 2)
> > For subscription based resolvers, do we need a RESET command to reset
> > resolvers to default? Any one of below or both?
> >
> > 2a) reset all at once:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
> >
> > 2b) reset one at a time:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
> >
> > The issue I see here is, to implement 1a and 1b, we have introduced
> > the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> > introduce the 'RESOLVERS' keyword as well. But we can come up with
> > some alternative syntax if we plan to implement these. Thoughts?
> >
>
> Hi Shveta,
>
> I felt it would be better to keep the syntax similar to the existing
> INSERT ... ON CONFLICT [1].
>
> I'd suggest a syntax like this:
>
> ... ON CONFLICT ['conflict_type'] DO { 'conflict_action' | DEFAULT }
>
> e.g.
>
> To configure conflict resolvers for the SUBSCRIPTION:
>
> CREATE SUBSCRIPTION subname CONNECTION coninfo PUBLICATION pubname
> ON CONFLICT 'conflict_type1' DO 'conflict_action1',
> ON CONFLICT 'conflict_type2' DO 'conflict_action2';
>
> Likewise, for ALTER:
>
> ALTER SUBSCRIPTION <subname>
> ON CONFLICT 'conflict_type1' DO 'conflict_action1',
> ON CONFLICT 'conflict_type2' DO 'conflict_action2';
>
> To RESET all at once:
>
> ALTER SUBSCRIPTION <subname>
> ON CONFLICT DO DEFAULT;
>
> And, to RESET one at a time:
>
> ALTER SUBSCRIPTION <subname>
> ON CONFLICT 'conflict_type1' DO DEFAULT;
>

Thanks for the suggestion. The idea looks good to me. But we first need to check the complexity involved in implementing it in gram.y. Initial analysis says that it will need something like the 'action' rule which we have for the ALTER TABLE command ([1]) to have these multiple subcommands implemented. For the INSERT case it is just a subclause, but for create/alter sub we will have it multiple times under one command. Let us review. Also, I would like to know the opinion of others on this.

[1]: https://www.postgresql.org/docs/current/sql-altertable.html

> Although your list format "('conflict_type1' = 'conflict_action1',
> 'conflict_type2' = 'conflict_action2')" is clear and without
> repetition, I predict this terse style could end up being troublesome
> because it does not offer much flexibility for whatever the future
> might hold for CDR.
>
> e.g. ability to handle the conflict with a user-defined resolver
> e.g. ability to handle the conflict conditionally (e.g. with a WHERE clause...)
> e.g. ability to handle all conflicts with a common resolver
> etc.
>
> Advantages of my suggestion:
> - Close to existing SQL syntax
> - No loss of clarity by removing the word "RESOLVER"
> - No requirement for new keyword/s
> - The commands now read more like English
> - Offers more flexibility for any unknown future requirements
> - The setup (via create subscription) and the alter/reset all look the same.
>
> ======
> [1] https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION <subname>
> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3, ...);
>
> 1b)
> ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3, ...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>
> ~~
>
> 3) Regarding update_exists:
>
> 3a)
> Currently update_exists resolver patch is kept separate. The reason
> being, it performs resolution which will need deletion of multiple
> rows. It will be good to discuss if we want to target this in the
> first draft. Please see the example:
>
> create table tab (a int primary key, b int unique, c int unique);
>
> Pub: insert into tab values (1,1,1);
> Sub:
> insert into tab values (2,20,30);
> insert into tab values (3,40,50);
> insert into tab values (4,60,70);
>
> Pub: update tab set a=2,b=40,c=70 where a=1;
>
> The above 'update' on pub will result in 'update_exists' on sub and if
> resolution is in favour of 'apply', then it will conflict with all the
> three local rows of subscriber due to unique constraint present on all
> three columns. Thus in order to resolve the conflict, it will have to
> delete these 3 rows on sub:
>
> 2,20,30
> 3,40,50
> 4,60,70
> and then update 1,1,1 to 2,40,70.
>
> Just need opinion on if we shall target this in the initial draft.
>
> 3b)
> If we plan to implement this, we need to work on optimal design where
> we can find all the conflicting rows at once and delete those.
> Currently the implementation has been done using recursion i.e. find
> one conflicting row, then delete it and then next and so on i.e. we
> call apply_handle_update_internal() recursively. On initial code
> review, I feel it is doable to scan all indexes at once and get
> conflicting-tuple-ids in one go and get rid of recursion. It can be
> attempted once we decide on 3a.
>
> ~~
>
> 4)
> Now for insert_exists and update_exists, we are doing a pre-scan of
> all unique indexes to find conflict. Also there is post-scan to figure
> out if the conflicting row is inserted meanwhile. This needs to be
> reviewed for optimization. We need to avoid pre-scan wherever
> possible. I think the only case for which it can be avoided is
> 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> need to check conflict beforehand to avoid rollback of already
> inserted data. And for the case where resolver is in favor of skipping
> the change, then too we should know beforehand about the conflict to
> avoid heap-insertion and rollback. Thoughts?
>

+1 to the idea of optimization, but it seems that when the resolver is set to ERROR, skipping the pre-scan only optimizes the case where no conflict exists. If a conflict is found, the apply worker will error out during the pre-scan, and no post-scan occurs, so there's no opportunity for optimization. However, if no conflict is present, we currently do both the pre-scan and the post-scan. Skipping the pre-scan in this scenario could be a worthwhile optimization, even if it only benefits the no-conflict case.

--
Thanks,
Nisha
On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1].
>
> Thanks for working on patches.
>
> Summarizing the issues which need some suggestions/thoughts.
>
> 1)
> For subscription based resolvers, currently the syntax implemented is:
>
> 1a)
> CREATE SUBSCRIPTION <subname>
> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3, ...);
>
> 1b)
> ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> (conflict_type1 = resolver1, conflict_type2 = resolver2,
> conflict_type3 = resolver3, ...);
>
> Earlier the syntax suggested in [1] was:
> CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
>
> I think the currently implemented syntax is good as it has less
> repetition, unless others think otherwise.
>
> ~~
>
> 2)
> For subscription based resolvers, do we need a RESET command to reset
> resolvers to default? Any one of below or both?
>
> 2a) reset all at once:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
>
> 2b) reset one at a time:
> ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
>
> The issue I see here is, to implement 1a and 1b, we have introduced
> the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> introduce the 'RESOLVERS' keyword as well. But we can come up with
> some alternative syntax if we plan to implement these. Thoughts?
>

It makes sense to have a RESET on the lines of (a) and (b). At this stage, we should do minimal in extending the syntax. How about RESET CONFLICT RESOLVER ALL for (a)?

> ~~
>
> 3) Regarding update_exists:
>
> 3a)
> Currently update_exists resolver patch is kept separate. The reason
> being, it performs resolution which will need deletion of multiple
> rows. It will be good to discuss if we want to target this in the
> first draft. Please see the example:
>
> create table tab (a int primary key, b int unique, c int unique);
>
> Pub: insert into tab values (1,1,1);
> Sub:
> insert into tab values (2,20,30);
> insert into tab values (3,40,50);
> insert into tab values (4,60,70);
>
> Pub: update tab set a=2,b=40,c=70 where a=1;
>
> The above 'update' on pub will result in 'update_exists' on sub and if
> resolution is in favour of 'apply', then it will conflict with all the
> three local rows of subscriber due to unique constraint present on all
> three columns. Thus in order to resolve the conflict, it will have to
> delete these 3 rows on sub:
>
> 2,20,30
> 3,40,50
> 4,60,70
> and then update 1,1,1 to 2,40,70.
>
> Just need opinion on if we shall target this in the initial draft.
>

This case looks a bit complicated. It seems there is no other alternative than to delete the multiple rows. It is better to create a separate top-up patch for this and we can discuss in detail about this once the basic patch is in better shape.

> 3b)
> If we plan to implement this, we need to work on optimal design where
> we can find all the conflicting rows at once and delete those.
> Currently the implementation has been done using recursion i.e. find
> one conflicting row, then delete it and then next and so on i.e. we
> call apply_handle_update_internal() recursively. On initial code
> review, I feel it is doable to scan all indexes at once and get
> conflicting-tuple-ids in one go and get rid of recursion. It can be
> attempted once we decide on 3a.
>

I suggest following the simplest strategy (even if that means calling the update function recursively) by adding comments on the optimal strategy. We can optimize it later as well.

> ~~
>
> 4)
> Now for insert_exists and update_exists, we are doing a pre-scan of
> all unique indexes to find conflict. Also there is post-scan to figure
> out if the conflicting row is inserted meanwhile. This needs to be
> reviewed for optimization. We need to avoid pre-scan wherever
> possible. I think the only case for which it can be avoided is
> 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> need to check conflict beforehand to avoid rollback of already
> inserted data. And for the case where resolver is in favor of skipping
> the change, then too we should know beforehand about the conflict to
> avoid heap-insertion and rollback. Thoughts?
>

It makes sense to skip the pre-scan wherever possible. Your analysis sounds reasonable to me.

> ~~
>
> 5)
> Currently we only capture update_missing conflict i.e. we are not
> distinguishing between the missing row and the deleted row. We had
> discussed this in the past a couple of times. If we plan to target it
> in draft 1, I can dig up all old emails and resume discussion on this.
>

This is a separate conflict detection project in itself. I am thinking about the solution to this problem. We will talk about this in a separate thread.

> ~~
>
> 6)
> Table-level resolvers. There was a suggestion earlier to implement
> table-level resolvers. The patch has been implemented to some extent,
> it can be completed and posted when we are done reviewing subscription
> level resolvers.
>

Yeah, it makes sense to do it after the subscription-level resolution patch is ready.

--
With Regards,
Amit Kapila.
On Mon, Aug 26, 2024 at 7:28 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 8:15 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
>
> Hi Shveta,
>
> I felt it would be better to keep the syntax similar to the existing
> INSERT ... ON CONFLICT [1].
>
> I'd suggest a syntax like this:
>
> ... ON CONFLICT ['conflict_type'] DO { 'conflict_action' | DEFAULT }
>
> e.g.
>
> To configure conflict resolvers for the SUBSCRIPTION:
>
> CREATE SUBSCRIPTION subname CONNECTION coninfo PUBLICATION pubname
> ON CONFLICT 'conflict_type1' DO 'conflict_action1',
> ON CONFLICT 'conflict_type2' DO 'conflict_action2';
>

One thing that looks odd to me about this is the resolution part of it. For example, ON CONFLICT 'insert_exists' DO 'keep_local'. The action part doesn't go well without being explicit that it is a resolution method. Another variant could be: ON CONFLICT 'insert_exists' USE RESOLUTION [METHOD] 'keep_local'.

I think we can keep all these syntax alternatives either in the form of comments or in the commit message and discuss more on these once we agree on the solutions to the key design issues pointed out by Shveta.

--
With Regards,
Amit Kapila.
On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > The patches have been rebased on the latest pgHead following the merge
> > > of the conflict detection patch [1].
> >
> > Thanks for working on patches.
> >
> > Summarizing the issues which need some suggestions/thoughts.
> >
> > 1)
> > For subscription based resolvers, currently the syntax implemented is:
> >
> > 1a)
> > CREATE SUBSCRIPTION <subname>
> > CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3, ...);
> >
> > 1b)
> > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3, ...);
> >
> > Earlier the syntax suggested in [1] was:
> > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
> >
> > I think the currently implemented syntax is good as it has less
> > repetition, unless others think otherwise.
> >
> > ~~
> >
> > 2)
> > For subscription based resolvers, do we need a RESET command to reset
> > resolvers to default? Any one of below or both?
> >
> > 2a) reset all at once:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
> >
> > 2b) reset one at a time:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
> >
> > The issue I see here is, to implement 1a and 1b, we have introduced
> > the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> > introduce the 'RESOLVERS' keyword as well. But we can come up with
> > some alternative syntax if we plan to implement these. Thoughts?
>
> It makes sense to have a RESET on the lines of (a) and (b). At this
> stage, we should do minimal in extending the syntax. How about RESET
> CONFLICT RESOLVER ALL for (a)?

Yes, the syntax looks good.

> > ~~
> >
> > 3) Regarding update_exists:
> >
> > 3a)
> > Currently update_exists resolver patch is kept separate. The reason
> > being, it performs resolution which will need deletion of multiple
> > rows. It will be good to discuss if we want to target this in the
> > first draft. Please see the example:
> >
> > create table tab (a int primary key, b int unique, c int unique);
> >
> > Pub: insert into tab values (1,1,1);
> > Sub:
> > insert into tab values (2,20,30);
> > insert into tab values (3,40,50);
> > insert into tab values (4,60,70);
> >
> > Pub: update tab set a=2,b=40,c=70 where a=1;
> >
> > The above 'update' on pub will result in 'update_exists' on sub and if
> > resolution is in favour of 'apply', then it will conflict with all the
> > three local rows of subscriber due to unique constraint present on all
> > three columns. Thus in order to resolve the conflict, it will have to
> > delete these 3 rows on sub:
> >
> > 2,20,30
> > 3,40,50
> > 4,60,70
> > and then update 1,1,1 to 2,40,70.
> >
> > Just need opinion on if we shall target this in the initial draft.
>
> This case looks a bit complicated. It seems there is no other
> alternative than to delete the multiple rows. It is better to create a
> separate top-up patch for this and we can discuss in detail about this
> once the basic patch is in better shape.

Agreed.

> > 3b)
> > If we plan to implement this, we need to work on optimal design where
> > we can find all the conflicting rows at once and delete those.
> > Currently the implementation has been done using recursion i.e. find
> > one conflicting row, then delete it and then next and so on i.e. we
> > call apply_handle_update_internal() recursively. On initial code
> > review, I feel it is doable to scan all indexes at once and get
> > conflicting-tuple-ids in one go and get rid of recursion. It can be
> > attempted once we decide on 3a.
>
> I suggest following the simplest strategy (even if that means calling
> the update function recursively) by adding comments on the optimal
> strategy. We can optimize it later as well.

Sure.

> > ~~
> >
> > 4)
> > Now for insert_exists and update_exists, we are doing a pre-scan of
> > all unique indexes to find conflict. Also there is post-scan to figure
> > out if the conflicting row is inserted meanwhile. This needs to be
> > reviewed for optimization. We need to avoid pre-scan wherever
> > possible. I think the only case for which it can be avoided is
> > 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> > need to check conflict beforehand to avoid rollback of already
> > inserted data. And for the case where resolver is in favor of skipping
> > the change, then too we should know beforehand about the conflict to
> > avoid heap-insertion and rollback. Thoughts?
>
> It makes sense to skip the pre-scan wherever possible. Your analysis
> sounds reasonable to me.
>
> > ~~
> >
> > 5)
> > Currently we only capture update_missing conflict i.e. we are not
> > distinguishing between the missing row and the deleted row. We had
> > discussed this in the past a couple of times. If we plan to target it
> > in draft 1, I can dig up all old emails and resume discussion on this.
>
> This is a separate conflict detection project in itself. I am thinking
> about the solution to this problem. We will talk about this in a
> separate thread.
>
> > ~~
> >
> > 6)
> > Table-level resolvers. There was a suggestion earlier to implement
> > table-level resolvers. The patch has been implemented to some extent,
> > it can be completed and posted when we are done reviewing subscription
> > level resolvers.
>
> Yeah, it makes sense to do it after the subscription-level resolution
> patch is ready.
>
> --
> With Regards,
> Amit Kapila.
On Mon, Aug 26, 2024 at 9:05 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > The patches have been rebased on the latest pgHead following the merge
> > of the conflict detection patch [1]. The detect_conflict option has
> > been removed, and conflict detection is now enabled by default. This
> > change required the following updates in resolver patches:
> >
> > patch-0001:
> > - Removed dependency on the detect_conflict option. Now, default
> > conflict resolvers are set on CREATE SUBSCRIPTION if no values are
> > provided.
> > - To keep the behavior unchanged, the default resolvers are now set as -
> > insert_exists = error
> > update_exists = error
> > update_differ = apply_remote
> > update_missing = skip
> > delete_missing = skip
> > delete_differ = apply_remote
> > - Added documentation for conflict resolvers.
> >
> > patch-0002:
> > - Removed dependency on the detect_conflict option.
> > - Updated test cases in 034_conflict_resolver.pl to reflect new
> > default resolvers and the removal of the detect_conflict option.
> >
> > patch-0003:
> > - Implemented resolver for the update_exists conflict type. Supported
> > resolvers are: apply_remote, keep_local, error.
> >
>
> Thanks Nisha for the patches. I was running some tests on update_exists
> and found a case wherein it misses logging one conflict out of 3.
>
> create table tab (a int primary key, b int unique, c int unique);
>
> Pub: insert into tab values (1,1,1);
>
> Sub:
> insert into tab values (2,20,30);
> insert into tab values (3,40,50);
> insert into tab values (4,60,70);
>
> Pub: update tab set a=2,b=40,c=70 where a=1;
>
> Here it logs the update_exists conflict and the resolution for Key
> (b)=(40) and Key (c)=(70), but misses logging the first one, which is
> with Key (a)=(2).
>

Fixed.

Thanks,
Nisha
On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > The patches have been rebased on the latest pgHead following the merge
> > > of the conflict detection patch [1].
> >
> > Thanks for working on patches.
> >
> > Summarizing the issues which need some suggestions/thoughts.
> >
> > 1)
> > For subscription based resolvers, currently the syntax implemented is:
> >
> > 1a)
> > CREATE SUBSCRIPTION <subname>
> > CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3, ...);
> >
> > 1b)
> > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER
> > (conflict_type1 = resolver1, conflict_type2 = resolver2,
> > conflict_type3 = resolver3, ...);
> >
> > Earlier the syntax suggested in [1] was:
> > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname>
> > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1',
> > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2';
> >
> > I think the currently implemented syntax is good as it has less
> > repetition, unless others think otherwise.
> >
> > ~~
> >
> > 2)
> > For subscription based resolvers, do we need a RESET command to reset
> > resolvers to default? Any one of below or both?
> >
> > 2a) reset all at once:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS
> >
> > 2b) reset one at a time:
> > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type';
> >
> > The issue I see here is, to implement 1a and 1b, we have introduced
> > the 'RESOLVER' keyword. If we want to implement 2a, we will have to
> > introduce the 'RESOLVERS' keyword as well. But we can come up with
> > some alternative syntax if we plan to implement these. Thoughts?
>
> It makes sense to have a RESET on the lines of (a) and (b). At this
> stage, we should do minimal in extending the syntax. How about RESET
> CONFLICT RESOLVER ALL for (a)?
>
> > ~~
> >
> > 3) Regarding update_exists:
> >
> > 3a)
> > Currently update_exists resolver patch is kept separate. The reason
> > being, it performs resolution which will need deletion of multiple
> > rows. It will be good to discuss if we want to target this in the
> > first draft. Please see the example:
> >
> > create table tab (a int primary key, b int unique, c int unique);
> >
> > Pub: insert into tab values (1,1,1);
> > Sub:
> > insert into tab values (2,20,30);
> > insert into tab values (3,40,50);
> > insert into tab values (4,60,70);
> >
> > Pub: update tab set a=2,b=40,c=70 where a=1;
> >
> > The above 'update' on pub will result in 'update_exists' on sub and if
> > resolution is in favour of 'apply', then it will conflict with all the
> > three local rows of subscriber due to unique constraint present on all
> > three columns. Thus in order to resolve the conflict, it will have to
> > delete these 3 rows on sub:
> >
> > 2,20,30
> > 3,40,50
> > 4,60,70
> > and then update 1,1,1 to 2,40,70.
> >
> > Just need opinion on if we shall target this in the initial draft.
>
> This case looks a bit complicated. It seems there is no other
> alternative than to delete the multiple rows. It is better to create a
> separate top-up patch for this and we can discuss in detail about this
> once the basic patch is in better shape.

v9 onwards the patch-0003 is a separate top-up patch implementing update_exists.

> > 3b)
> > If we plan to implement this, we need to work on optimal design where
> > we can find all the conflicting rows at once and delete those.
> > Currently the implementation has been done using recursion i.e. find
> > one conflicting row, then delete it and then next and so on i.e. we
> > call apply_handle_update_internal() recursively. On initial code
> > review, I feel it is doable to scan all indexes at once and get
> > conflicting-tuple-ids in one go and get rid of recursion. It can be
> > attempted once we decide on 3a.
>
> I suggest following the simplest strategy (even if that means calling
> the update function recursively) by adding comments on the optimal
> strategy. We can optimize it later as well.
>
> > ~~
> >
> > 4)
> > Now for insert_exists and update_exists, we are doing a pre-scan of
> > all unique indexes to find conflict. Also there is post-scan to figure
> > out if the conflicting row is inserted meanwhile. This needs to be
> > reviewed for optimization. We need to avoid pre-scan wherever
> > possible. I think the only case for which it can be avoided is
> > 'ERROR'. For the cases where resolver is in favor of remote-apply, we
> > need to check conflict beforehand to avoid rollback of already
> > inserted data. And for the case where resolver is in favor of skipping
> > the change, then too we should know beforehand about the conflict to
> > avoid heap-insertion and rollback. Thoughts?
>
> It makes sense to skip the pre-scan wherever possible. Your analysis
> sounds reasonable to me.
>

Done.

--
Thanks,
Nisha
On Fri, Aug 23, 2024 at 10:39 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Thu, Aug 22, 2024 at 3:44 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > For clock-skew and timestamp based resolution, if needed, I will post
> > another email for the design items where suggestions are needed.
> >
>
> Please find issues which need some thoughts and approval for
> time-based resolution and clock-skew.
>
> 1)
> Time based conflict resolution and two phase transactions:
>
> Time based conflict resolution (last_update_wins) is the one
> resolution which will not result in data-divergence considering
> clock-skew is taken care of. But when it comes to two-phase
> transactions, it might not be the case. For a two-phase transaction,
> we do not have the commit timestamp when the changes are being
> applied. Thus for time-based comparison, initially it was decided to
> use prepare timestamp, but it may result in data-divergence. Please
> see the example at [1].
>
> Example at [1] is a tricky situation, and thus in the initial draft,
> we decided to restrict usage of 2pc and CDR together. The plan is:
>
> a) During Create subscription, if the user has given last_update_wins
> resolver for any conflict_type and 'two_phase' is also enabled, we
> ERROR out.
> b) During Alter subscription, if the user tries to update resolver to
> 'last_update_wins' but 'two_phase' is enabled, we error out.
>
> Another solution could be to save both prepare_ts and commit_ts. And
> when any txn comes for conflict resolution, we first check if
> prepare_ts is available, use that, else use commit_ts. Availability of
> prepare_ts would indicate it was a prepared txn, and thus even if it
> is committed, we should use prepare_ts for comparison for consistency.
> This will have some overhead of storing prepare_ts along with
> commit_ts. But if the number of prepared txns is reasonably small,
> this overhead should be less.
>
> We currently plan to go with restricting 2pc and last_update_wins
> together, unless others have different opinions.
>
> ~~
>
> 2)
> Parallel apply worker and conflict resolution:
> As discussed in [2] (see last paragraph in [2]), for streaming of
> in-progress transactions by parallel worker, we do not have a
> commit-timestamp with each change and thus it makes sense to disable
> parallel apply worker with CDR. The plan is to not start a parallel
> apply worker if 'last_update_wins' is configured for any
> conflict_type.
>
> ~~
>
> 3)
> Parallel apply worker and clock skew management:
> Regarding clock-skew management as discussed in [3], we will wait for
> the local clock to come within tolerable range during 'begin' rather
> than before 'commit'. And this wait needs the commit-timestamp in the
> beginning, thus we plan to restrict starting a pa-worker even when
> clock-skew related GUCs are configured.
>
> Earlier we had restricted both 2pc and parallel worker start when
> detect_conflict was enabled, but now since the detect_conflict
> parameter is removed, we will change the implementation to restrict
> all 3 above cases when last_update_wins is configured. When the
> changes are done, we will post the patch.
>
> ~~
>
> 4)
> <not related to timestamp and clock skew>
> Earlier when 'detect_conflict' was enabled, we were giving WARNING if
> 'track_commit_timestamp' was not enabled. This was during CREATE and
> ALTER subscription. Now with this parameter removed, this WARNING has
> also been removed. But I think we need to bring back this WARNING.
>
> Currently the default resolver set may work without
> 'track_commit_timestamp', but when user gives CONFLICT RESOLVER in
> create-sub or alter-sub explicitly, making them configured to
> non-default values (or say any values, does not matter if few are
> defaults), we may still emit this warning to alert user:
>
> 2024-07-26 09:14:03.152 IST [195415] WARNING: conflict detection
> could be incomplete due to disabled track_commit_timestamp
> 2024-07-26 09:14:03.152 IST [195415] DETAIL: Conflicts update_differ
> and delete_differ cannot be detected, and the origin and commit
> timestamp for the local row will not be logged.
>
> Thoughts?
>
> If we emit this WARNING during each resolution, then it may flood our
> log files, thus it seems better to emit it during create or alter
> subscription instead of during resolution.
>

Done. v10 has implemented the suggested warning when a user gives CONFLICT RESOLVER in create-sub or alter-sub explicitly.

Thanks,
Nisha
On Tue, Aug 27, 2024 at 1:51 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Please find v10 patch-set. Changes are:
>
> 1) patch-001:
> - Corrected a patch application warning.
> - Added support for pg_dump.
> - As suggested in pt.4 of [1]: added a warning during CREATE and
> ALTER subscription when track_commit_timestamp is OFF.
>
> 2) patch-002 & patch-003:
> - Reduced code duplication in execReplication.c
> - As suggested in pt.4 of [2]: Optimized the pre-scan for
> insert_exists and update_exists cases when resolver is set to ERROR.
> - Fixed a bug reported by Shveta in [3]
>
> Thank You Ajin for working on pg_dump support changes.
>

Thank you for the patches. Few comments for pg_dump:

1)
If there are multiple subscriptions with different resolver configurations, pg_dump currently dumps resolvers in a different order for each subscription. It is not a problem, but it will be better to have them in the same order. We can have an order-by in pg_dump's code while querying resolvers.

2)
Currently pg_dump is dumping even the default resolvers configuration.
As an example if I have not changed default configuration for say
sub1, it still dumps all:

CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH (....)
CONFLICT RESOLVER (insert_exists = 'error', update_differ =
'apply_remote', update_exists = 'error', update_missing = 'skip',
delete_differ = 'apply_remote', delete_missing = 'skip');

I am not sure if we need to dump default resolvers. Would like to know
what others think on this.

3)
Why in 002_pg_dump.pl we have default resolvers set explicitly?

thanks
Shveta
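As a rough sketch of the order-by idea in comment 1, pg_dump's query against the resolver catalog could simply sort on the conflict type, so every subscription dumps its resolvers in the same order. The column names below are illustrative guesses, not the actual catalog definition from the patch:

SELECT confrtype, confrres
FROM pg_catalog.pg_subscription_conflict
WHERE confsubid = $1
ORDER BY confrtype;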
On Wed, Aug 28, 2024 at 2:27 PM shveta malik <shveta.malik@gmail.com> wrote:
> 2)
> Currently pg_dump is dumping even the default resolvers configuration.
> As an example if I have not changed default configuration for say
> sub1, it still dumps all:
> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH (....)
> CONFLICT RESOLVER (insert_exists = 'error', update_differ =
> 'apply_remote', update_exists = 'error', update_missing = 'skip',
> delete_differ = 'apply_remote', delete_missing = 'skip');
> I am not sure if we need to dump default resolvers. Would like to know
> what others think on this.
> 3)
> Why in 002_pg_dump.pl we have default resolvers set explicitly?
In 003_pg_dump.pl, default resolvers are not set explicitly; that is the regexp used to check the pg_dump-generated command for creating subscriptions. This is again connected to your 2nd question.
regards,
Ajin Cherian
Fujitsu Australia
On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: > >> 2) >> Currently pg_dump is dumping even the default resolvers configuration. >> As an example if I have not changed default configuration for say >> sub1, it still dumps all: >> >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH (....) >> CONFLICT RESOLVER (insert_exists = 'error', update_differ = >> 'apply_remote', update_exists = 'error', update_missing = 'skip', >> delete_differ = 'apply_remote', delete_missing = 'skip'); >> >> I am not sure if we need to dump default resolvers. Would like to know >> what others think on this. >> >> 3) >> Why in 002_pg_dump.pl we have default resolvers set explicitly? >> > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the regexp to check the pg_dump generated command for creating subscriptions. This is again connected to your 2nd question. Okay so we may not need this change if we plan to *not* dump defaults in pg_dump. Another point about 'defaults' is regarding insertion into the pg_subscription_conflict table. We currently do insert default resolvers into 'pg_subscription_conflict' even if the user has not explicitly configured them. I think it is okay to insert defaults there as the user will be able to know which resolver is picked for any conflict type. But again, I would like to know the thoughts of others. thanks Shveta
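PS: One advantage of storing defaults: the effective resolver for every conflict type stays visible with a plain catalog lookup, e.g. (a sketch using the catalog and column names from the patch):

SELECT s.subname, c.confrtype, c.confrres
FROM pg_catalog.pg_subscription s
JOIN pg_catalog.pg_subscription_conflict c ON c.confsubid = s.oid
ORDER BY s.subname, c.confrtype;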
> On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: > > The review is WIP. Please find a few comments on patch001. 1) logical-replication.sgml: + Additional logging is triggered for specific conflict_resolvers. Users can also configure conflict_types while creating the subscription. Refer to section CONFLICT RESOLVERS for details on conflict_types and conflict_resolvers. Can we please change it to: Additional logging is triggered in various conflict scenarios, each identified as a conflict type. Users have the option to configure a conflict resolver for each conflict type when creating a subscription. For more information on the conflict types detected and the supported conflict resolvers, refer to the section <CONFLICT RESOLVERS> 2) SetSubConflictResolver + for (type = 0; type < resolvers_cnt; type++) 'type' does not look like the correct name here. The variable does not state conflict_type, it is instead a resolver-array-index, so please rename accordingly. Maybe idx or res_idx? 3) CreateSubscription(): + if (stmt->resolvers) + check_conflict_detection(); 3a) We can have a comment saying warn users if prerequisites are not met. 3b) Also, I do not find the name 'check_conflict_detection' appropriate. One suggestion could be 'conf_detection_check_prerequisites' (similar to replorigin_check_prerequisites) 3c) We can move the below comment after check_conflict_detection() as it makes more sense there. /* * Parse and check conflict resolvers. Initialize with default values */ 4) Should we allow repetition/duplicates of 'conflict_type=..' in CREATE and ALTER SUB? As an example: ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER (insert_exists = 'apply_remote', insert_exists = 'error'); Such a repetition works for Create-Sub but gives some internal error for alter-sub. (ERROR: tuple already updated by self). Behaviour should be the same for both. And if we give an error, it should be a user-understandable one. But I would like to know the opinions of others. Shall it give an error, or should the last one be accepted as the valid configuration in case of repetition? 5) GetAndValidateSubsConflictResolverList(): + ConflictTypeResolver *CTR = NULL; We can change the name to a more appropriate one similar to other variables. It need not be in all capitals. thanks Shveta
On Wed, Aug 28, 2024 at 4:07 PM shveta malik <shveta.malik@gmail.com> wrote: > > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > The review is WIP. Please find a few comments on patch001. > More comments on patch001 in continuation of previous comments: 6) SetDefaultResolvers() can be called from parse_subscription_conflict_resolvers() itself. This will be similar to how parse_subscription_options() sets defaults internally. 7) parse_subscription_conflict_resolvers(): + if (!stmtresolvers) + return; I think we do not need the above, 'foreach' will take care of it. Since we do not have any logic after foreach, we should be good without the above check explicitly added. 8) I think SetSubConflictResolver() should be moved before replorigin_create(). We can insert resolver entries immediately after we insert subscription entries. 9) check_conflict_detection/conf_detection_check_prerequisites shall be moved to conflict.c file. 10) validate_conflict_type_and_resolver(): Please mention in header that: It returns an enum ConflictType corresponding to the conflict type string passed by the caller. 11) UpdateSubConflictResolvers(): 11a) Rename CTR similar to other variables. 11b) Please correct the header as we deal with multiple conflict-types in it instead of 1. Suggestion: Update the subscription's conflict resolvers in pg_subscription_conflict system catalog for the given conflict types. 12) SetSubConflictResolver(): 12a) I think we do not need 'replaces' during INSERT and thus this is not needed: + memset(replaces, false, sizeof(replaces)); 12b) Shouldn't below be outside of loop: + memset(nulls, false, sizeof(nulls)); 13) Shall we rename RemoveSubscriptionConflictBySubid with RemoveSubscriptionConflictResolvers()? 'BySubid' is not needed as we have Subscription in the name and we do not have any other variation of removal. 14) We shall rename pg_subscription_conflict_sub_index to pg_subscription_conflict_confsubid_confrtype_index to give more clarity that it is an index on subid and conftype, and SubscriptionConflictSubIndexId to SubscriptionConflictSubidTypeIndexId, and SUBSCRIPTIONCONFLICTSUBOID to SUBSCRIPTIONCONFLMAP 15) conflict.h: + See ConflictTypeResolverMap in conflcit.c to find out which all conflcit.c --> conflict.c 16) subscription.sql: 16a) add one more test case for 'fail' scenario where both conflict type and resolver are valid but resolver is not for that particular conflict type. 16b) --try setting resolvers for few types Change to below (similar to other comments) -- ok - valid conflict types and resolvers 16c) -- ok - valid conflict type and resolver maybe change to: -- ok - valid conflict types and resolvers thanks Shveta
On Fri, Aug 23, 2024 at 10:39 AM shveta malik <shveta.malik@gmail.com> wrote: > > Please find issues which need some thoughts and approval for > time-based resolution and clock-skew. > > 1) > Time based conflict resolution and two phase transactions: > > Time based conflict resolution (last_update_wins) is the one > resolution which will not result in data-divergence considering > clock-skew is taken care of. But when it comes to two-phase > transactions, it might not be the case. For two-phase transaction, we > do not have commit timestamp when the changes are being applied. Thus > for time-based comparison, initially it was decided to user prepare > timestamp but it may result in data-divergence. Please see the > example at [1]. > > Example at [1] is a tricky situation, and thus in the initial draft, > we decided to restrict usage of 2pc and CDR together. The plan is: > > a) During Create subscription, if the user has given last_update_wins > resolver for any conflict_type and 'two_phase' is also enabled, we > ERROR out. > b) During Alter subscription, if the user tries to update resolver to > 'last_update_wins' but 'two_phase' is enabled, we error out. > > Another solution could be to save both prepare_ts and commit_ts. And > when any txn comes for conflict resolution, we first check if > prepare_ts is available, use that else use commit_ts. Availability of > prepare_ts would indicate it was a prepared txn and thus even if it is > committed, we should use prepare_ts for comparison for consistency. > This will have some overhead of storing prepare_ts along with > commit_ts. But if the number of prepared txns are reasonably small, > this overhead should be less. > Yet another idea is that if the conflict is detected and the resolution strategy is last_update_wins then from that point we start writing all the changes to the file similar to what we do for streaming mode and only once commit_prepared arrives, we will read and apply changes. That will solve this problem. > We currently plan to go with restricting 2pc and last_update_wins > together, unless others have different opinions. > Sounds reasonable but we should add comments on the possible solution like the one I have mentioned so that we can extend it afterwards. > ~~ > > 2) > parallel apply worker and conflict-resolution: > As discussed in [2] (see last paragraph in [2]), for streaming of > in-progress transactions by parallel worker, we do not have > commit-timestamp with each change and thus it makes sense to disable > parallel apply worker with CDR. The plan is to not start parallel > apply worker if 'last_update_wins' is configured for any > conflict_type. > The other idea is that we can let the changes written to file if any conflict is detected and then at commit time let the remaining changes be applied by apply worker. This can introduce some complexity, so similar to two_pc we can extend this functionality later. > ~~ > > 3) > parallel apply worker and clock skew management: > Regarding clock-skew management as discussed in [3], we will wait for > the local clock to come within tolerable range during 'begin' rather > than before 'commit'. And this wait needs commit-timestamp in the > beginning, thus we plan to restrict starting pa-worker even when > clock-skew related GUCs are configured. 
> > Earlier we had restricted both 2pc and parallel worker worker start > when detect_conflict was enabled, but now since detect_conflict > parameter is removed, we will change the implementation to restrict > all 3 above cases when last_update_wins is configured. When the > changes are done, we will post the patch. > At this stage, we are not sure how we want to deal with clock skew. There is an argument that clock-skew should be handled outside the database, so we can probably have the clock-skew-related stuff in a separate patch. > ~~ > > 4) > <not related to timestamp and clock skew> > Earlier when 'detect_conflict' was enabled, we were giving WARNING if > 'track_commit_timestamp' was not enabled. This was during CREATE and > ALTER subscription. Now with this parameter removed, this WARNING has > also been removed. But I think we need to bring back this WARNING. > Currently default resolvers set may work without > 'track_commit_timestamp' but when user gives CONFLICT RESOLVER in > create-sub or alter-sub explicitly making them configured to > non-default values (or say any values, does not matter if few are > defaults), we may still emit this warning to alert user: > > 2024-07-26 09:14:03.152 IST [195415] WARNING: conflict detection > could be incomplete due to disabled track_commit_timestamp > 2024-07-26 09:14:03.152 IST [195415] DETAIL: Conflicts update_differ > and delete_differ cannot be detected, and the origin and commit > timestamp for the local row will not be logged. > > Thoughts? > > If we emit this WARNING during each resolution, then it may flood our > log files, thus it seems better to emit it during create or alter > subscription instead of during resolution. > Sounds reasonable. -- With Regards, Amit Kapila.
On Fri, Aug 23, 2024 at 10:39 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Aug 22, 2024 at 3:44 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > > > For clock-skew and timestamp based resolution, if needed, I will post > > another email for the design items where suggestions are needed. > > > > Please find issues which need some thoughts and approval for > time-based resolution and clock-skew. > > 1) > Time based conflict resolution and two phase transactions: > > Time based conflict resolution (last_update_wins) is the one > resolution which will not result in data-divergence considering > clock-skew is taken care of. But when it comes to two-phase > transactions, it might not be the case. For two-phase transaction, we > do not have commit timestamp when the changes are being applied. Thus > for time-based comparison, initially it was decided to user prepare > timestamp but it may result in data-divergence. Please see the > example at [1]. > > Example at [1] is a tricky situation, and thus in the initial draft, > we decided to restrict usage of 2pc and CDR together. The plan is: > > a) During Create subscription, if the user has given last_update_wins > resolver for any conflict_type and 'two_phase' is also enabled, we > ERROR out. > b) During Alter subscription, if the user tries to update resolver to > 'last_update_wins' but 'two_phase' is enabled, we error out. > > Another solution could be to save both prepare_ts and commit_ts. And > when any txn comes for conflict resolution, we first check if > prepare_ts is available, use that else use commit_ts. Availability of > prepare_ts would indicate it was a prepared txn and thus even if it is > committed, we should use prepare_ts for comparison for consistency. > This will have some overhead of storing prepare_ts along with > commit_ts. But if the number of prepared txns are reasonably small, > this overhead should be less. > > We currently plan to go with restricting 2pc and last_update_wins > together, unless others have different opinions. > Done. v11-004 implements the idea of restricting 2pc and last_update_wins together. > ~~ > > 2) > parallel apply worker and conflict-resolution: > As discussed in [2] (see last paragraph in [2]), for streaming of > in-progress transactions by parallel worker, we do not have > commit-timestamp with each change and thus it makes sense to disable > parallel apply worker with CDR. The plan is to not start parallel > apply worker if 'last_update_wins' is configured for any > conflict_type. > Done. > ~~ > > 3) > parallel apply worker and clock skew management: > Regarding clock-skew management as discussed in [3], we will wait for > the local clock to come within tolerable range during 'begin' rather > than before 'commit'. And this wait needs commit-timestamp in the > beginning, thus we plan to restrict starting pa-worker even when > clock-skew related GUCs are configured. > Done. v11 implements it. > Earlier we had restricted both 2pc and parallel worker worker start > when detect_conflict was enabled, but now since detect_conflict > parameter is removed, we will change the implementation to restrict > all 3 above cases when last_update_wins is configured. When the > changes are done, we will post the patch. > > ~~ > -- Thanks, Nisha
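PS: With the restriction in place, combining the two should now fail up front, roughly like this (a sketch; connection string and names are placeholders):

CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres host=localhost port=5432' PUBLICATION pub1 WITH (two_phase = on) CONFLICT RESOLVER (insert_exists = 'last_update_wins');
-- expected to ERROR out per (a); likewise for the ALTER SUBSCRIPTION case in (b)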
On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Aug 22, 2024 at 3:45 PM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > > > The patches have been rebased on the latest pgHead following the merge > > > of the conflict detection patch [1]. > > > > Thanks for working on patches. > > > > Summarizing the issues which need some suggestions/thoughts. > > > > 1) > > For subscription based resolvers, currently the syntax implemented is: > > > > 1a) > > CREATE SUBSCRIPTION <subname> > > CONNECTION <conninfo> PUBLICATION <pubname> > > CONFLICT RESOLVER > > (conflict_type1 = resolver1, conflict_type2 = resolver2, > > conflict_type3 = resolver3,...); > > > > 1b) > > ALTER SUBSCRIPTION <subname> CONFLICT RESOLVER > > (conflict_type1 = resolver1, conflict_type2 = resolver2, > > conflict_type3 = resolver3,...); > > > > Earlier the syntax suggested in [1] was: > > CREATE SUBSCRIPTION <subname> CONNECTION <conninfo> PUBLICATION <pubname> > > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1', > > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2'; > > > > I think the currently implemented syntax is good as it has less > > repetition, unless others think otherwise. > > > > ~~ > > > > 2) > > For subscription based resolvers, do we need a RESET command to reset > > resolvers to default? Any one of below or both? > > > > 2a) reset all at once: > > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVERS > > > > 2b) reset one at a time: > > ALTER SUBSCRIPTION <name> RESET CONFLICT RESOLVER for 'conflict_type'; > > > > The issue I see here is, to implement 1a and 1b, we have introduced > > the 'RESOLVER' keyword. If we want to implement 2a, we will have to > > introduce the 'RESOLVERS' keyword as well. But we can come up with > > some alternative syntax if we plan to implement these. Thoughts? > > > > It makes sense to have a RESET on the lines of (a) and (b). At this > stage, we should do minimal in extending the syntax. How about RESET > CONFLICT RESOLVER ALL for (a)? > Done, v11 implements the suggested RESET command. > > ~~ > > -- Thanks, Nisha
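PS: So the two implemented forms look like this (a sketch; the subscription and conflict-type names are placeholders, and the FOR variant takes the conflict type as a string constant):

ALTER SUBSCRIPTION sub1 RESET CONFLICT RESOLVER ALL;
ALTER SUBSCRIPTION sub1 RESET CONFLICT RESOLVER FOR 'insert_exists';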
On Wed, Aug 28, 2024 at 10:58 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > >> 2) > >> Currently pg_dump is dumping even the default resolvers configuration. > >> As an example if I have not changed default configuration for say > >> sub1, it still dumps all: > >> > >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH (....) > >> CONFLICT RESOLVER (insert_exists = 'error', update_differ = > >> 'apply_remote', update_exists = 'error', update_missing = 'skip', > >> delete_differ = 'apply_remote', delete_missing = 'skip'); > >> > >> I am not sure if we need to dump default resolvers. Would like to know > >> what others think on this. > >> Normally, we don't add defaults in the dumped command. For example, dumpSubscription won't dump the options where the default is unchanged. We shouldn't do it unless we have a reason for dumping defaults. > >> 3) > >> Why in 002_pg_dump.pl we have default resolvers set explicitly? > >> > > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the regexp to check the pg_dump generated command for creating subscriptions. This is again connected to your 2nd question. > > Okay so we may not need this change if we plan to *not* dump defaults > in pg_dump. > > Another point about 'defaults' is regarding insertion into the > pg_subscription_conflict table. We currently do insert default > resolvers into 'pg_subscription_conflict' even if the user has not > explicitly configured them. > I don't see any problem with it. BTW, if we don't do it, I think wherever we are referring to the resolvers for a conflict, we need some special handling for default and non-default. Am I missing something? -- With Regards, Amit Kapila.
On Fri, Aug 30, 2024 at 12:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 28, 2024 at 10:58 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > >> 2) > > >> Currently pg_dump is dumping even the default resolvers configuration. > > >> As an example if I have not changed default configuration for say > > >> sub1, it still dumps all: > > >> > > >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH (....) > > >> CONFLICT RESOLVER (insert_exists = 'error', update_differ = > > >> 'apply_remote', update_exists = 'error', update_missing = 'skip', > > >> delete_differ = 'apply_remote', delete_missing = 'skip'); > > >> > > >> I am not sure if we need to dump default resolvers. Would like to know > > >> what others think on this. > > >> > > Normally, we don't add defaults in the dumped command. For example, > dumpSubscription won't dump the options where the default is > unchanged. We shouldn't do it unless we have a reason for dumping > defaults. Agreed, we should not dump defaults. I had the same opinion. > > > >> 3) > > >> Why in 002_pg_dump.pl we have default resolvers set explicitly? > > >> > > > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the regexp to check the pg_dump generated commandfor creating subscriptions. This is again connected to your 2nd question. > > > > Okay so we may not need this change if we plan to *not *dump defaults > > in pg_dump. > > > > Another point about 'defaults' is regarding insertion into the > > pg_subscription_conflict table. We currently do insert default > > resolvers into 'pg_subscription_conflict' even if the user has not > > explicitly configured them. > > > > I don't see any problem with it. Yes, no problem > BTW, if we don't do it, I think > wherever we are referring the resolvers for a conflict, we need some > special handling for default and non-default. Yes, we will need special handling in such a case. Thus we shall go with inserting defaults. > Am I missing something? No, I just wanted to know others' opinions, so I asked. thanks Shveta
On Wed, Aug 28, 2024 at 4:07 PM shveta malik <shveta.malik@gmail.com> wrote: > > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > The review is WIP. Please find a few comments on patch001. > > 1) > logical-repliction.sgmlL > > + Additional logging is triggered for specific conflict_resolvers. > Users can also configure conflict_types while creating the > subscription. Refer to section CONFLICT RESOLVERS for details on > conflict_types and conflict_resolvers. > > Can we please change it to: > > Additional logging is triggered in various conflict scenarios, each > identified as a conflict type. Users have the option to configure a > conflict resolver for each conflict type when creating a subscription. > For more information on the conflict types detected and the supported > conflict resolvers, refer to the section <CONFLICT RESOLVERS> > > 2) > SetSubConflictResolver > > + for (type = 0; type < resolvers_cnt; type++) > > 'type' does not look like the correct name here. The variable does not > state conflict_type, it is instead a resolver-array-index, so please > rename accordingly. Maybe idx or res_idx? > > 3) > CreateSubscription(): > > + if (stmt->resolvers) > + check_conflict_detection(); > > 3a) We can have a comment saying warn users if prerequisites are not met. > > 3b) Also, I do not find the name 'check_conflict_detection' > appropriate. One suggestion could be > 'conf_detection_check_prerequisites' (similar to > replorigin_check_prerequisites) > > 3c) We can move the below comment after check_conflict_detection() as > it makes more sense there. > /* > * Parse and check conflict resolvers. Initialize with default values > */ > > 4) > Should we allow repetition/duplicates of 'conflict_type=..' in CREATE > and ALTER SUB? As an example: > ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER (insert_exists = > 'apply_remote', insert_exists = 'error'); > > Such a repetition works for Create-Sub but gives some internal error > for alter-sub. (ERROR: tuple already updated by self). Behaviour > should be the same for both. And if we give an error, it should be > some user understandable one. But I would like to know the opinions of > others. Shall it give an error or the last one should be accepted as > valid configuration in case of repetition? > I have tried the below statement to check existing behavior: create subscription sub1 connection 'dbname=postgres' publication pub1 with (streaming = on, streaming=off); ERROR: conflicting or redundant options LINE 1: ...=postgres' publication pub1 with (streaming = on, streaming=... So duplicate options are not allowed. If we see any challenges to follow same for resolvers then we can discuss but it seems better to follow the existing behavior of other subscription options. Also, the behavior for CREATE/ALTER should be the same. -- With Regards, Amit Kapila.
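PS: Following that precedent, the resolver case would presumably end up behaving the same way once aligned (a sketch of the expected outcome, not tested):

ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER (insert_exists = 'apply_remote', insert_exists = 'error');
ERROR:  conflicting or redundant options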
On Fri, 30 Aug 2024 at 11:01, Nisha Moond <nisha.moond412@gmail.com> wrote: > > Here is the v11 patch-set. Changes are: 1) This command crashes: ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR NULL; #0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116 #1 0x000055c67270600a in ResetConflictResolver (subid=16404, conflict_type=0x0) at conflict.c:744 #2 0x000055c67247e0c3 in AlterSubscription (pstate=0x55c6748ff9d0, stmt=0x55c67497dfe0, isTopLevel=true) at subscriptioncmds.c:1664 + | ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR conflict_type + { + AlterSubscriptionStmt *n = + makeNode(AlterSubscriptionStmt); + + n->kind = ALTER_SUBSCRIPTION_RESET_CONFLICT_RESOLVER; + n->subname = $3; + n->conflict_type = $8; + $$ = (Node *) n; + } + ; +conflict_type: + Sconst { $$ = $1; } + | NULL_P { $$ = NULL; } ; May be conflict_type should be changed to: +conflict_type: + Sconst { $$ = $1; } ; 2) Conflict resolver is not shown in describe command: postgres=# \dRs+ List of subscriptions Name | Owner | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit | Conninfo | Skip LSN ------+---------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+---------------------------------- --------+---------- sub1 | vignesh | t | {pub1} | f | off | d | f | any | t | f | f | off | dbname=postgres host=localhost po rt=5432 | 0/0 sub2 | vignesh | t | {pub1} | f | off | d | f | any | t | f | f | off | dbname=postgres host=localhost po rt=5432 | 0/0 (2 rows) 3) Tab completion is not handled to include Conflict resolver: postgres=# alter subscription sub1 ADD PUBLICATION CONNECTION DISABLE DROP PUBLICATION ENABLE OWNER TO REFRESH PUBLICATION RENAME TO SET SKIP ( Regards, Vignesh
On Fri, 30 Aug 2024 at 11:01, Nisha Moond <nisha.moond412@gmail.com> wrote: > > Here is the v11 patch-set. Changes are: > 1) Updated conflict type names in accordance with the recent commit[1] as - > update_differ --> update_origin_differs > delete_differ --> delete_origin_differs > > 2) patch-001: > - Implemented the RESET command to restore the default resolvers as > suggested in pt.2a & 2b in [2] Few comments on 0001 patch: 1) Currently create subscription has WITH option before conflict resolver, I felt WITH option can be after CONNECTION, PUBLICATION and CONFLICT RESOLVER option and WITH option at the end: CreateSubscriptionStmt: - CREATE SUBSCRIPTION name CONNECTION Sconst PUBLICATION name_list opt_definition + CREATE SUBSCRIPTION name CONNECTION Sconst PUBLICATION name_list opt_definition opt_resolver_definition { CreateSubscriptionStmt *n = makeNode(CreateSubscriptionStmt); @@ -10696,6 +10702,7 @@ CreateSubscriptionStmt: n->conninfo = $5; n->publication = $7; n->options = $8; + n->resolvers = $9; $$ = (Node *) n; 2) Case sensitive: 2.a) Should conflict type be case insensitive: CREATE SUBSCRIPTION sub3 CONNECTION 'dbname=postgres host=localhost port=5432' PUBLICATION pub1 with (copy_data= true) CONFLICT RESOLVER ("INSERT_EXISTS" = 'error'); ERROR: INSERT_EXISTS is not a valid conflict type In few other places it is not case sensitive: create publication pub1 with ( PUBLISH= 'INSERT,UPDATE,delete'); set log_min_MESSAGES TO warning ; 2.b) Similarly in case of conflict resolver too: CREATE SUBSCRIPTION sub3 CONNECTION 'dbname=postgres host=localhost port=5432' PUBLICATION pub1 with (copy_data= true) CONFLICT RESOLVER ("insert_exists" = 'erroR'); ERROR: erroR is not a valid conflict resolver 3) Since there is only one key used to search, we can remove nkeys variable and directly specify as 1: +RemoveSubscriptionConflictBySubid(Oid subid) +{ + Relation rel; + HeapTuple tup; + TableScanDesc scan; + ScanKeyData skey[1]; + int nkeys = 0; + + rel = table_open(SubscriptionConflictId, RowExclusiveLock); + + /* + * Search using the subid, this should return all conflict resolvers for + * this sub + */ + ScanKeyInit(&skey[nkeys++], + Anum_pg_subscription_conflict_confsubid, + BTEqualStrategyNumber, + F_OIDEQ, + ObjectIdGetDatum(subid)); + + scan = table_beginscan_catalog(rel, nkeys, skey); 4) Currently we are including CONFLICT RESOLVER even if a subscription with default CONFLICT RESOLVER is created, we can add the CONFLICT RESOLVER option only for non-default subscription option: + /* add conflict resolvers, if any */ + if (fout->remoteVersion >= 180000) + { + PQExpBuffer InQry = createPQExpBuffer(); + PGresult *res; + int i_confrtype; + int i_confrres; + + /* get the conflict types and their resolvers from the catalog */ + appendPQExpBuffer(InQry, + "SELECT confrtype, confrres " + "FROM pg_catalog.pg_subscription_conflict" + " WHERE confsubid = %u;\n", subinfo->dobj.catId.oid); + res = ExecuteSqlQuery(fout, InQry->data, PGRES_TUPLES_OK); + + i_confrtype = PQfnumber(res, "confrtype"); + i_confrres = PQfnumber(res, "confrres"); + + if (PQntuples(res) > 0) + { + appendPQExpBufferStr(query, ") CONFLICT RESOLVER ("); 5) Should remote_apply be apply_remote here as this is what is specified in code: + <varlistentry id="sql-createsubscription-params-with-conflict_resolver-remote-apply"> + <term><literal>remote_apply</literal> (<type>enum</type>)</term> + <listitem> + <para> 6) I think this should be "It is the default resolver for update_origin_differs" 6.a) + <varlistentry 
id="sql-createsubscription-params-with-conflict_resolver-remote-apply"> + <term><literal>remote_apply</literal> (<type>enum</type>)</term> + <listitem> + <para> + This resolver applies the remote change. It can be used for + <literal>insert_exists</literal>, <literal>update_exists</literal>, + <literal>update_differ</literal> and <literal>delete_differ</literal>. + It is the default resolver for <literal>insert_exists</literal> and + <literal>update_exists</literal>. + </para> + </listitem> + </varlistentry> 6.b) + <varlistentry id="sql-createsubscription-params-with-conflict_type-update-differ"> + <term><literal>update_differ</literal> (<type>enum</type>)</term> + <listitem> + <para> + This conflict occurs when updating a row that was previously 6.c) + <varlistentry id="sql-createsubscription-params-with-conflict_resolver-remote-apply"> + <term><literal>remote_apply</literal> (<type>enum</type>)</term> + <listitem> + <para> + This resolver applies the remote change. It can be used for + <literal>insert_exists</literal>, <literal>update_exists</literal>, + <literal>update_differ</literal> and <literal>delete_differ</literal>. + It is the default resolver for <literal>insert_exists</literal> and + <literal>update_exists</literal>. 6.d) + <varlistentry id="sql-createsubscription-params-with-conflict_resolver-remote-apply"> + <term><literal>remote_apply</literal> (<type>enum</type>)</term> + <listitem> + <para> + This resolver applies the remote change. It can be used for + <literal>insert_exists</literal>, <literal>update_exists</literal>, + <literal>update_differ</literal> and <literal>delete_differ</literal>. Similarly this change should be done in other places too. 7) 7.a) Should delete_differ be changed to delete_origin_differs as that is what is specified in the subscription commands: +check_conflict_detection(void) +{ + if (!track_commit_timestamp) + ereport(WARNING, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("conflict detection and resolution could be incomplete due to disabled track_commit_timestamp"), + errdetail("Conflicts update_differ and delete_differ cannot be detected, " + "and the origin and commit timestamp for the local row will not be logged.")); 7.b) similarly here too: + <varlistentry id="sql-createsubscription-params-with-conflict_type-delete-differ"> + <term><literal>delete_differ</literal> (<type>enum</type>)</term> + <listitem> + <para> + This conflict occurs when deleting a row that was previously modified + by another origin. Note that this conflict can only be detected when + <link linkend="guc-track-commit-timestamp"><varname>track_commit_timestamp</varname></link> + is enabled on the subscriber. Currently, the delete is always applied + regardless of the origin of the local row. + </para> + </listitem> + </varlistentry> Similarly this change should be done in other places too. 8) ConflictTypeResolver should be added to typedefs.list to resolve the pgindent issues: 8.a) +static void +parse_subscription_conflict_resolvers(List *stmtresolvers, + ConflictTypeResolver * resolvers) 8.b) Similarly FormData_pg_subscription_conflict should also be added: } FormData_pg_subscription_conflict; /* ---------------- * Form_pg_subscription_conflict corresponds to a pointer to a row with * the format of pg_subscription_conflict relation. * ---------------- */ typedef FormData_pg_subscription_conflict * Form_pg_subscription_conflict; Regards, Vignesh
On Thu, Aug 29, 2024 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Aug 23, 2024 at 10:39 AM shveta malik <shveta.malik@gmail.com> wrote: > > > > Please find issues which need some thoughts and approval for > > time-based resolution and clock-skew. > > > > 1) > > Time based conflict resolution and two phase transactions: > > > > Time based conflict resolution (last_update_wins) is the one > > resolution which will not result in data-divergence considering > > clock-skew is taken care of. But when it comes to two-phase > > transactions, it might not be the case. For two-phase transaction, we > > do not have commit timestamp when the changes are being applied. Thus > > for time-based comparison, initially it was decided to user prepare > > timestamp but it may result in data-divergence. Please see the > > example at [1]. > > > > Example at [1] is a tricky situation, and thus in the initial draft, > > we decided to restrict usage of 2pc and CDR together. The plan is: > > > > a) During Create subscription, if the user has given last_update_wins > > resolver for any conflict_type and 'two_phase' is also enabled, we > > ERROR out. > > b) During Alter subscription, if the user tries to update resolver to > > 'last_update_wins' but 'two_phase' is enabled, we error out. > > > > Another solution could be to save both prepare_ts and commit_ts. And > > when any txn comes for conflict resolution, we first check if > > prepare_ts is available, use that else use commit_ts. Availability of > > prepare_ts would indicate it was a prepared txn and thus even if it is > > committed, we should use prepare_ts for comparison for consistency. > > This will have some overhead of storing prepare_ts along with > > commit_ts. But if the number of prepared txns are reasonably small, > > this overhead should be less. > > > > Yet another idea is that if the conflict is detected and the > resolution strategy is last_update_wins then from that point we start > writing all the changes to the file similar to what we do for > streaming mode and only once commit_prepared arrives, we will read and > apply changes. That will solve this problem. > > > We currently plan to go with restricting 2pc and last_update_wins > > together, unless others have different opinions. > > > > Sounds reasonable but we should add comments on the possible solution > like the one I have mentioned so that we can extend it afterwards. > Done, v12-004 patch has the comments for the possible solution. > > ~~ > > > > 2) > > parallel apply worker and conflict-resolution: > > As discussed in [2] (see last paragraph in [2]), for streaming of > > in-progress transactions by parallel worker, we do not have > > commit-timestamp with each change and thus it makes sense to disable > > parallel apply worker with CDR. The plan is to not start parallel > > apply worker if 'last_update_wins' is configured for any > > conflict_type. > > > > The other idea is that we can let the changes written to file if any > conflict is detected and then at commit time let the remaining changes > be applied by apply worker. This can introduce some complexity, so > similar to two_pc we can extend this functionality later. > v12-004 patch has the comments to extend it later. > > ~~ > > > > 3) > > parallel apply worker and clock skew management: > > Regarding clock-skew management as discussed in [3], we will wait for > > the local clock to come within tolerable range during 'begin' rather > > than before 'commit'. 
And this wait needs commit-timestamp in the > > beginning, thus we plan to restrict starting pa-worker even when > > clock-skew related GUCs are configured. > > > > Earlier we had restricted both 2pc and parallel worker worker start > > when detect_conflict was enabled, but now since detect_conflict > > parameter is removed, we will change the implementation to restrict > > all 3 above cases when last_update_wins is configured. When the > > changes are done, we will post the patch. > > > > At this stage, we are not sure how we want to deal with clock skew. > There is an argument that clock-skew should be handled outside the > database, so we can probably have the clock-skew-related stuff in a > separate patch. > Separated the clock-skew related code in v12-005 patch. -- Thanks, Nisha
On Fri, Sep 6, 2024 at 2:05 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > Thank you for your feedback, Shveta. I've addressed both sets of comments you provided. Thanks for the patches. I am reviewing v12-patch001, it is WIP. But please find first set of comments: 1) src/sgml/logical-replication.sgml: + Users have the option to configure a conflict_resolver Full stop for previous line is missing. 2) + For more information on the conflict_types detected and the supported conflict_resolvers, refer to section CONFLICT RESOLVERS. We may change to : For more information on the supported conflict_types and conflict_resolvers, refer to section CONFLICT RESOLVERS. 3) src/backend/commands/subscriptioncmds.c: Line removed. This change is not needed. static void CheckAlterSubOption(Subscription *sub, const char *option, bool slot_needs_update, bool isTopLevel); - 4) Let's stick to the same comments format as the rest of the file i.e. first letter in caps. + /* first initialise the resolvers with default values */ first --> First initialise --> initialize Same for below comments: + /* validate the conflict type and resolver */ + /* update the corresponding resolver for the given conflict type */ Please verify the rest of the file for the same. 5) Please add below in header of parse_subscription_conflict_resolvers (similar to parse_subscription_options): * This function will report an error if mutually exclusive options are specified. 6) + * Warn users if prerequisites are not met. + * Initialize with default values. + */ + if (stmt->resolvers) + conf_detection_check_prerequisites(); + Would it be better to move the above call inside parse_subscription_conflict_resolvers(), then we will have all resolver related stuff at one place? Irrespective of whether we move it or not, please remove 'Initialize with default values.' from above as that is now not done here. thanks Shveta
On Fri, Sep 6, 2024 at 2:05 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Thu, Aug 29, 2024 at 2:50 PM shveta malik <shveta.malik@gmail.com> wrote: >> >> On Wed, Aug 28, 2024 at 4:07 PM shveta malik <shveta.malik@gmail.com> wrote: >> > >> > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian <itsajin@gmail.com> wrote: >> > > > >> > >> > The review is WIP. Please find a few comments on patch001. >> > >> >> More comments on ptach001 in continuation of previous comments: >> > > Thank you for your feedback, Shveta. I've addressed both sets of comments you provided. Thanks for the patches. I tested the v12-0001 patch, and here are my comments: 1) An unexpected error occurs when attempting to alter the resolver for multiple conflict_type(s) in ALTER SUB...CONFLICT RESOLVER command. See below examples : postgres=# alter subscription sub2 CONFLICT RESOLVER (update_exists=keep_local, delete_missing=error, update_origin_differs=error); ERROR: unrecognized node type: 1633972341 postgres=# alter subscription sub2 CONFLICT RESOLVER ( update_origin_differs=error, update_exists=error); ERROR: unrecognized node type: 1633972341 postgres=# alter subscription sub2 CONFLICT RESOLVER ( delete_origin_differs=error, delete_missing=error); ERROR: unrecognized node type: 1701602660 postgres=# alter subscription sub2 CONFLICT RESOLVER (update_exists=keep_local, delete_missing=error); ALTER SUBSCRIPTION -- It appears that the error occurs only when at least two conflict types belong to the same category, either UPDATE or DELETE. 2) Given the above issue, it would be beneficial to add a test in subscription.sql to cover cases where all valid conflict types are set with appropriate resolvers in both the ALTER and CREATE commands. Thanks, Nisha
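PS: For (2), the test could simply set all valid conflict types in one statement, e.g. (a sketch using the conflict-type and resolver names from patch-001; any valid combination would do):

ALTER SUBSCRIPTION sub2 CONFLICT RESOLVER (insert_exists = 'error', update_origin_differs = 'apply_remote', update_exists = 'keep_local', update_missing = 'skip', delete_origin_differs = 'apply_remote', delete_missing = 'error');

which would also exercise the same-category case that currently fails.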
On Mon, Sep 9, 2024 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Sep 6, 2024 at 2:05 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > Thank you for your feedback, Shveta. I've addressed both sets of comments you provided. > > Thanks for the patches. I am reviewing v12-patch001, it is WIP. But > please find first set of comments: > It will be good if we can use parse_subscription_conflict_resolvers() from both CREATE and ALTER flow instead of writing different functions for both the flows. Please review once to see this feasibility. thanks Shveta
On Thu, 12 Sept 2024 at 14:03, Ajin Cherian <itsajin@gmail.com> wrote: > > On Tue, Sep 3, 2024 at 7:42 PM vignesh C <vignesh21@gmail.com> wrote: > > On Fri, 30 Aug 2024 at 11:01, Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Here is the v11 patch-set. Changes are: > > 1) This command crashes: > ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR NULL; > #0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116 > #1 0x000055c67270600a in ResetConflictResolver (subid=16404, > conflict_type=0x0) at conflict.c:744 > #2 0x000055c67247e0c3 in AlterSubscription (pstate=0x55c6748ff9d0, > stmt=0x55c67497dfe0, isTopLevel=true) at subscriptioncmds.c:1664 > > + | ALTER SUBSCRIPTION name RESET CONFLICT > RESOLVER FOR conflict_type > + { > + AlterSubscriptionStmt *n = > + makeNode(AlterSubscriptionStmt); > + > + n->kind = > ALTER_SUBSCRIPTION_RESET_CONFLICT_RESOLVER; > + n->subname = $3; > + n->conflict_type = $8; > + $$ = (Node *) n; > + } > + ; > +conflict_type: > + Sconst > { $$ = $1; } > + | NULL_P > { $$ = NULL; } > ; > > May be conflict_type should be changed to: > +conflict_type: > + Sconst > { $$ = $1; } > ; > > > Fixed. Few comments: 1) Tab completion missing for: a) ALTER SUBSCRIPTION name CONFLICT RESOLVER b) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER ALL c) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR 2) Documentation missing for: a) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER ALL b) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR 3) This reset is not required here, if valid was false it would have thrown an error and exited: a) + if (!valid) + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%s is not a valid conflict type", conflict_type)); + + /* Reset */ + valid = false; b) Similarly here too: + if (!valid) + ereport(ERROR, + errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("%s is not a valid conflict resolver", conflict_resolver)); + + /* Reset */ + valid = false; 4) How about adding CT_MAX inside the enum itself as the last enum value: typedef enum { /* The row to be inserted violates unique constraint */ CT_INSERT_EXISTS, /* The row to be updated was modified by a different origin */ CT_UPDATE_ORIGIN_DIFFERS, /* The updated row value violates unique constraint */ CT_UPDATE_EXISTS, /* The row to be updated is missing */ CT_UPDATE_MISSING, /* The row to be deleted was modified by a different origin */ CT_DELETE_ORIGIN_DIFFERS, /* The row to be deleted is missing */ CT_DELETE_MISSING, /* * Other conflicts, such as exclusion constraint violations, involve more * complex rules than simple equality checks. These conflicts are left for * future improvements. */ } ConflictType; #define CONFLICT_NUM_TYPES (CT_DELETE_MISSING + 1) /* Min and max conflict type */ #define CT_MIN CT_INSERT_EXISTS #define CT_MAX CT_DELETE_MISSING and the for loop can be changed to: for (type = 0; type < CT_MAX; type++) This way CT_MIN can be removed and CT_MAX need not be changed every time a new enum is added. Also the following +1 can be removed from the variables: ConflictTypeResolver conflictResolvers[CT_MAX + 1]; 5) Similar thing can be done with ConflictResolver enum too. 
i.e remove CR_MIN and add CR_MAX as the last element of enum typedef enum ConflictResolver { /* Apply the remote change */ CR_APPLY_REMOTE = 1, /* Keep the local change */ CR_KEEP_LOCAL, /* Apply the remote change; skip if it can not be applied */ CR_APPLY_OR_SKIP, /* Apply the remote change; emit error if it can not be applied */ CR_APPLY_OR_ERROR, /* Skip applying the change */ CR_SKIP, /* Error out */ CR_ERROR, } ConflictResolver; /* Min and max conflict resolver */ #define CR_MIN CR_APPLY_REMOTE #define CR_MAX CR_ERROR 6) Except scansup.h inclusion, other inclusions added are not required in subscriptioncmds.c file. 7)The inclusions "access/heaptoast.h", "access/table.h", "access/tableam.h", "catalog/dependency.h", "catalog/pg_subscription.h", "catalog/pg_subscription_conflict.h" and "catalog/pg_inherits.h" are not required in conflict.c file. 8) Can we change this to use the new foreach_ptr implementations added: + foreach(lc, stmtresolvers) + { + DefElem *defel = (DefElem *) lfirst(lc); + ConflictType type; + char *resolver; to use foreach_ptr like: foreach_ptr(DefElem, defel, stmtresolvers) { + ConflictType type; + char *resolver; .... } Regards, Vignesh
On Thu, 12 Sept 2024 at 14:03, Ajin Cherian <itsajin@gmail.com> wrote: > > On Tue, Sep 3, 2024 at 7:42 PM vignesh C <vignesh21@gmail.com> wrote: > > On Fri, 30 Aug 2024 at 11:01, Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Here is the v11 patch-set. Changes are: > > 1) This command crashes: > ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR NULL; > #0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116 > #1 0x000055c67270600a in ResetConflictResolver (subid=16404, > conflict_type=0x0) at conflict.c:744 > #2 0x000055c67247e0c3 in AlterSubscription (pstate=0x55c6748ff9d0, > stmt=0x55c67497dfe0, isTopLevel=true) at subscriptioncmds.c:1664 > > + | ALTER SUBSCRIPTION name RESET CONFLICT > RESOLVER FOR conflict_type > + { > + AlterSubscriptionStmt *n = > + makeNode(AlterSubscriptionStmt); > + > + n->kind = > ALTER_SUBSCRIPTION_RESET_CONFLICT_RESOLVER; > + n->subname = $3; > + n->conflict_type = $8; > + $$ = (Node *) n; > + } > + ; > +conflict_type: > + Sconst > { $$ = $1; } > + | NULL_P > { $$ = NULL; } > ; > > May be conflict_type should be changed to: > +conflict_type: > + Sconst > { $$ = $1; } > ; > > > Fixed. > Few comments: 1) This should be in (fout->remoteVersion >= 180000) check to support dumping backward compatible server objects, else dump with older version will fail: + /* Populate conflict type fields using the new query */ + confQuery = createPQExpBuffer(); + appendPQExpBuffer(confQuery, + "SELECT confrtype, confrres FROM pg_catalog.pg_subscription_conflict " + "WHERE confsubid = %u;", subinfo[i].dobj.catId.oid); + confRes = ExecuteSqlQuery(fout, confQuery->data, PGRES_TUPLES_OK); + + ntuples = PQntuples(confRes); + for (j = 0; j < ntuples; j++) 2) Can we check and throw an error before the warning is logged in this case as it seems strange to throw a warning first and then an error for the same track_commit_timestamp configuration: postgres=# create subscription sub1 connection ... publication pub1 conflict resolver (insert_exists = 'last_update_wins'); WARNING: conflict detection and resolution could be incomplete due to disabled track_commit_timestamp DETAIL: Conflicts update_origin_differs and delete_origin_differs cannot be detected, and the origin and commit timestamp for the local row will not be logged. ERROR: resolver last_update_wins requires "track_commit_timestamp" to be enabled HINT: Make sure the configuration parameter "track_commit_timestamp" is set. Regards, Vignesh
On Thu, 12 Sept 2024 at 14:03, Ajin Cherian <itsajin@gmail.com> wrote: > > On Tue, Sep 3, 2024 at 7:42 PM vignesh C <vignesh21@gmail.com> wrote: > > On Fri, 30 Aug 2024 at 11:01, Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > Here is the v11 patch-set. Changes are: > I was reviewing the CONFLICT RESOLVER (insert_exists='apply_remote') and found that one conflict remains unresolved in the following scenario: Pub: CREATE TABLE circles(c1 CIRCLE, c2 text, EXCLUDE USING gist (c1 WITH &&)); CREATE PUBLICATION pub1 for table circles; Sub: CREATE TABLE circles(c1 CIRCLE, c2 text, EXCLUDE USING gist (c1 WITH &&)) insert into circles values('<(0,0), 5>', 'sub'); CREATE SUBSCRIPTION ... PUBLICATION pub1 CONFLICT RESOLVER (insert_exists='apply_remote'); The following conflict is not detected and resolved with remote tuple data: Pub: INSERT INTO circles VALUES('<(0,0), 5>', 'pub'); 2024-09-19 17:32:36.637 IST [31463] 31463 LOG: conflict detected on relation "public.t1": conflict=insert_exists, Resolution=apply_remote. 2024-09-19 17:32:36.637 IST [31463] 31463 DETAIL: Key already exists in unique index "t1_pkey", modified in transaction 742, applying the remote changes. Key (c1)=(1); existing local tuple (1, sub); remote tuple (1, pub). 2024-09-19 17:32:36.637 IST [31463] 31463 CONTEXT: processing remote data for replication origin "pg_16398" during message type "INSERT" for replication target relation "public.t1" in transaction 744, finished at 0/1528E88 ........ 2024-09-19 17:32:44.653 IST [31463] 31463 ERROR: conflicting key value violates exclusion constraint "circles_c1_excl" 2024-09-19 17:32:44.653 IST [31463] 31463 DETAIL: Key (c1)=(<(0,0),5>) conflicts with existing key (c1)=(<(0,0),5>). ........ Regards, Vignesh
Hello!
Sorry for the noise; just in case, I want to note that [1] needs to be addressed before any real usage of conflict resolution.
On Thu, Sep 19, 2024 at 5:43 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > I was reviewing the CONFLICT RESOLVER (insert_exists='apply_remote') > and found that one conflict remains unresolved in the following > scenario: Thanks for the review and testing. > Pub: > CREATE TABLE circles(c1 CIRCLE, c2 text, EXCLUDE USING gist (c1 WITH &&)); > CREATE PUBLICATION pub1 for table circles; > > Sub: > CREATE TABLE circles(c1 CIRCLE, c2 text, EXCLUDE USING gist (c1 WITH &&)) > insert into circles values('<(0,0), 5>', 'sub'); > CREATE SUBSCRIPTION ... PUBLICATION pub1 CONFLICT RESOLVER > (insert_exists='apply_remote'); > > The following conflict is not detected and resolved with remote tuple data: > Pub: > INSERT INTO circles VALUES('<(0,0), 5>', 'pub'); > > 2024-09-19 17:32:36.637 IST [31463] 31463 LOG: conflict detected on > relation "public.t1": conflict=insert_exists, Resolution=apply_remote. > 2024-09-19 17:32:36.637 IST [31463] 31463 DETAIL: Key already > exists in unique index "t1_pkey", modified in transaction 742, > applying the remote changes. > Key (c1)=(1); existing local tuple (1, sub); remote tuple (1, pub). > 2024-09-19 17:32:36.637 IST [31463] 31463 CONTEXT: processing > remote data for replication origin "pg_16398" during message type > "INSERT" for replication target relation "public.t1" in transaction > 744, finished at 0/1528E88 > ........ > 2024-09-19 17:32:44.653 IST [31463] 31463 ERROR: conflicting key > value violates exclusion constraint "circles_c1_excl" > 2024-09-19 17:32:44.653 IST [31463] 31463 DETAIL: Key > (c1)=(<(0,0),5>) conflicts with existing key (c1)=(<(0,0),5>). > ........ We don't support conflict detection for exclusion constraints yet. Please see the similar issue raised in the conflict-detection thread and the responses at [1] and [2]. Also see the docs at [3]. [1]: https://www.postgresql.org/message-id/TYAPR01MB569224262F44875973FAF344F5B22%40TYAPR01MB5692.jpnprd01.prod.outlook.com [2]: https://www.postgresql.org/message-id/CAA4eK1KwqAUGDV3trUZf4hkrUYO3yzwjmBqYtoyFAPMFXpHy3g%40mail.gmail.com [3]: https://www.postgresql.org/docs/devel/logical-replication-conflicts.html <See this in doc: Note that there are other conflict scenarios, such as exclusion constraint violations. Currently, we do not provide additional details for them in the log.> thanks Shveta
On Fri, Sep 20, 2024 at 8:40 AM Nisha Moond <nisha.moond412@gmail.com> wrote: > > On Wed, Sep 18, 2024 at 10:46 AM vignesh C <vignesh21@gmail.com> wrote: > > > > On Thu, 12 Sept 2024 at 14:03, Ajin Cherian <itsajin@gmail.com> wrote: > > > > > > On Tue, Sep 3, 2024 at 7:42 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > On Fri, 30 Aug 2024 at 11:01, Nisha Moond <nisha.moond412@gmail.com> wrote: > > > > > > > > Here is the v11 patch-set. Changes are: > > > > > > 1) This command crashes: > > > ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR NULL; > > > #0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:116 > > > #1 0x000055c67270600a in ResetConflictResolver (subid=16404, > > > conflict_type=0x0) at conflict.c:744 > > > #2 0x000055c67247e0c3 in AlterSubscription (pstate=0x55c6748ff9d0, > > > stmt=0x55c67497dfe0, isTopLevel=true) at subscriptioncmds.c:1664 > > > > > > + | ALTER SUBSCRIPTION name RESET CONFLICT > > > RESOLVER FOR conflict_type > > > + { > > > + AlterSubscriptionStmt *n = > > > + makeNode(AlterSubscriptionStmt); > > > + > > > + n->kind = > > > ALTER_SUBSCRIPTION_RESET_CONFLICT_RESOLVER; > > > + n->subname = $3; > > > + n->conflict_type = $8; > > > + $$ = (Node *) n; > > > + } > > > + ; > > > +conflict_type: > > > + Sconst > > > { $$ = $1; } > > > + | NULL_P > > > { $$ = NULL; } > > > ; > > > > > > May be conflict_type should be changed to: > > > +conflict_type: > > > + Sconst > > > { $$ = $1; } > > > ; > > > > > > > > > Fixed. > > > > > > > Few comments: > > 1) This should be in (fout->remoteVersion >= 180000) check to support > > dumping backward compatible server objects, else dump with older > > version will fail: > > + /* Populate conflict type fields using the new query */ > > + confQuery = createPQExpBuffer(); > > + appendPQExpBuffer(confQuery, > > + "SELECT confrtype, > > confrres FROM pg_catalog.pg_subscription_conflict " > > + "WHERE confsubid = > > %u;", subinfo[i].dobj.catId.oid); > > + confRes = ExecuteSqlQuery(fout, confQuery->data, > > PGRES_TUPLES_OK); > > + > > + ntuples = PQntuples(confRes); > > + for (j = 0; j < ntuples; j++) > > > > 2) Can we check and throw an error before the warning is logged in > > this case as it seems strange to throw a warning first and then an > > error for the same track_commit_timestamp configuration: > > postgres=# create subscription sub1 connection ... publication pub1 > > conflict resolver (insert_exists = 'last_update_wins'); > > WARNING: conflict detection and resolution could be incomplete due to > > disabled track_commit_timestamp > > DETAIL: Conflicts update_origin_differs and delete_origin_differs > > cannot be detected, and the origin and commit timestamp for the local > > row will not be logged. > > ERROR: resolver last_update_wins requires "track_commit_timestamp" to > > be enabled > > HINT: Make sure the configuration parameter "track_commit_timestamp" is set. > > > > Thanks for the review. > Here is the v14 patch-set fixing review comments in [1] and [2]. Clarification: The fixes for mentioned comments from Vignesh - [1] & [2] are fixed in patch-001. Thank you Ajin for providing the changes. > New in patches: > 1) Added partition table tests in 034_conflict_resolver.pl in 002 and > 003 patches. > 2) 003 has a bug fix for update_exists conflict resolution on > partitioned tables. 
> > [1]: https://www.postgresql.org/message-id/CALDaNm3es1JqU8Qcv5Yw%3D7Ts2dOvaV8a_boxPSdofB%2BDTx1oFg%40mail.gmail.com > [2]: https://www.postgresql.org/message-id/CALDaNm18HuAcNsEC47J6qLRC7rMD2Q9_wT_hFtcc4UWqsfkgjA%40mail.gmail.com > > Thanks, > Nisha
On Fri, Sep 13, 2024 at 10:20 PM vignesh C <vignesh21@gmail.com> wrote:
> Few comments:
> 1) Tab completion missing for:
> a) ALTER SUBSCRIPTION name CONFLICT RESOLVER
> b) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER ALL
> c) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR
Added.
2) Documentation missing for:
a) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER ALL
b) ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR
Added.
3) This reset is not required here; if valid was false, it would have
thrown an error and exited:
a)
+ if (!valid)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("%s is not a valid conflict
type", conflict_type));
+
+ /* Reset */
+ valid = false;
b)
Similarly here too:
+ if (!valid)
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("%s is not a valid conflict
resolver", conflict_resolver));
+
+ /* Reset */
+ valid = false;
Actually, the reset is for when valid becomes true. I think it is required here.
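(For illustration, a minimal sketch of the flag-reuse pattern being discussed; the array and count names here are hypothetical, while the two error messages are the ones quoted above. The same 'valid' flag guards two consecutive lookups, so it must be cleared after the first one succeeds.)

    bool        valid = false;

    /* First lookup: is the given conflict type known? */
    for (int i = 0; i < n_conflict_types; i++)  /* hypothetical names */
    {
        if (strcmp(conflict_type, conflict_type_names[i]) == 0)
        {
            valid = true;
            break;
        }
    }

    if (!valid)
        ereport(ERROR,
                errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("%s is not a valid conflict type", conflict_type));

    /* Reset: the same flag now guards the resolver-name lookup. */
    valid = false;

    for (int i = 0; i < n_conflict_resolvers; i++)
    {
        if (strcmp(conflict_resolver, resolver_names[i]) == 0)
        {
            valid = true;
            break;
        }
    }

    if (!valid)
        ereport(ERROR,
                errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                errmsg("%s is not a valid conflict resolver", conflict_resolver));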
4) How about adding CT_MAX inside the enum itself as the last enum value:
typedef enum
{
/* The row to be inserted violates unique constraint */
CT_INSERT_EXISTS,
/* The row to be updated was modified by a different origin */
CT_UPDATE_ORIGIN_DIFFERS,
/* The updated row value violates unique constraint */
CT_UPDATE_EXISTS,
/* The row to be updated is missing */
CT_UPDATE_MISSING,
/* The row to be deleted was modified by a different origin */
CT_DELETE_ORIGIN_DIFFERS,
/* The row to be deleted is missing */
CT_DELETE_MISSING,
/*
* Other conflicts, such as exclusion constraint violations, involve more
* complex rules than simple equality checks. These conflicts are left for
* future improvements.
*/
} ConflictType;
#define CONFLICT_NUM_TYPES (CT_DELETE_MISSING + 1)
/* Min and max conflict type */
#define CT_MIN CT_INSERT_EXISTS
#define CT_MAX CT_DELETE_MISSING
and the for loop can be changed to:
for (type = 0; type < CT_MAX; type++)
This way CT_MIN can be removed and CT_MAX need not be changed every
time a new enum value is added.
Also the following +1 can be removed from the variables:
ConflictTypeResolver conflictResolvers[CT_MAX + 1];
I tried changing this, but the enums are used in switch cases and this throws a compiler warning that CT_MAX is not handled in the switch. However, I have changed the use of (CT_MAX + 1) and instead used CONFLICT_NUM_TYPES in those places.
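(A standalone illustration of the trade-off, using demo names rather than the patch's: a sentinel as the last enum member makes iteration self-maintaining, but any switch over the enum then draws a -Wswitch warning unless the sentinel is listed too, which is why the patch keeps the separate CONFLICT_NUM_TYPES macro.)

    #include <stdio.h>

    typedef enum
    {
        DEMO_INSERT_EXISTS,
        DEMO_UPDATE_EXISTS,
        DEMO_DELETE_MISSING,
        DEMO_SENTINEL       /* hypothetical sentinel; not a real conflict type */
    } DemoConflictType;

    static const char *
    demo_conflict_name(DemoConflictType type)
    {
        switch (type)
        {
            case DEMO_INSERT_EXISTS:
                return "insert_exists";
            case DEMO_UPDATE_EXISTS:
                return "update_exists";
            case DEMO_DELETE_MISSING:
                return "delete_missing";
            case DEMO_SENTINEL:     /* needed only to silence -Wswitch */
                break;
        }
        return "unknown";
    }

    int
    main(void)
    {
        /* The loop bound needs no update when a new value is added
         * before the sentinel. */
        for (int type = 0; type < DEMO_SENTINEL; type++)
            printf("%s\n", demo_conflict_name((DemoConflictType) type));
        return 0;
    }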
5) Similar thing can be done with ConflictResolver enum too. i.e
remove CR_MIN and add CR_MAX as the last element of enum
typedef enum ConflictResolver
{
/* Apply the remote change */
CR_APPLY_REMOTE = 1,
/* Keep the local change */
CR_KEEP_LOCAL,
/* Apply the remote change; skip if it can not be applied */
CR_APPLY_OR_SKIP,
/* Apply the remote change; emit error if it can not be applied */
CR_APPLY_OR_ERROR,
/* Skip applying the change */
CR_SKIP,
/* Error out */
CR_ERROR,
} ConflictResolver;
/* Min and max conflict resolver */
#define CR_MIN CR_APPLY_REMOTE
#define CR_MAX CR_ERROR
Same as the previous comment.
6) Except for the scansup.h inclusion, the other inclusions added to
subscriptioncmds.c are not required.
7)The inclusions "access/heaptoast.h", "access/table.h",
"access/tableam.h", "catalog/dependency.h",
"catalog/pg_subscription.h", "catalog/pg_subscription_conflict.h" and
"catalog/pg_inherits.h" are not required in conflict.c file.
Removed.
8) Can we change this to use the new foreach_ptr implementations added:
+ foreach(lc, stmtresolvers)
+ {
+ DefElem *defel = (DefElem *) lfirst(lc);
+ ConflictType type;
+ char *resolver;
to use foreach_ptr like:
foreach_ptr(DefElem, defel, stmtresolvers)
{
+ ConflictType type;
+ char *resolver;
....
}
Changed accordingly.
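(For clarity, the converted loop in full might look like the sketch below; foreach_ptr is the list-iteration macro from pg_list.h, and the body reuses the helper calls quoted elsewhere in this thread.)

    foreach_ptr(DefElem, defel, stmtresolvers)
    {
        ConflictType type;
        char       *resolver;

        resolver = defGetString(defel);
        type = validate_conflict_type_and_resolver(defel->defname, resolver);

        /* Update the corresponding resolver for the given conflict type. */
        resolvers[type].resolver = downcase_truncate_identifier(resolver,
                                                                strlen(resolver),
                                                                false);
    }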
regards,
Ajin Cherian
Fujitsu Australia
On Fri, Sep 20, 2024 at 8:40 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
...
>
> Thanks for the review.
> Here is the v14 patch-set fixing review comments in [1] and [2].

Just noticed that there are failures for the new 034_conflict_resolver.pl
test on CFbot. From the initial review it seems to be a test issue and
not a bug. We will fix these along with the next version of patch-sets.

Thanks,
Nisha
On Fri, Sep 20, 2024 at 8:40 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Thanks for the review.
> Here is the v14 patch-set fixing review comments in [1] and [2].
>

Thanks for the patches. I am reviewing patch001, it is WIP, but please
find an initial set of comments:

1) Please see these 2 errors:

postgres=# create subscription sub2 connection '....' publication pub1
CONFLICT RESOLVER(insert_exists = 'error') WITH (two_phase=true,
streaming=ON, streaming=OFF);
ERROR: conflicting or redundant options
LINE 1: ...ists='error') WITH (two_phase=true, streaming=ON, streaming=...
                                                             ^

postgres=# create subscription sub2 connection '....' publication pub1
CONFLICT RESOLVER(insert_exists = 'error', insert_exists = 'error')
WITH (two_phase=true);
ERROR: duplicate conflict type "insert_exists" found

When we give duplicate options in 'WITH', we get the error 'conflicting
or redundant options' with the position pointed out, while in the case
of CONFLICT RESOLVER it is different. Can we review to see if we can
have a similar error in CONFLICT RESOLVER as that of WITH? Perhaps we
need to call 'errorConflictingDefElem' from the resolver flow.

2)
+static void
+parse_subscription_conflict_resolvers(List *stmtresolvers,
+                                      ConflictTypeResolver *resolvers)
+{
+       ListCell   *lc;
+       List       *SeenTypes = NIL;
+
+
Remove redundant blank line.

3) parse_subscription_conflict_resolvers():
+       if (stmtresolvers)
+               conf_detection_check_prerequisites();
+
+}
Remove redundant blank line.

4) parse_subscription_conflict_resolvers():
+               resolver = defGetString(defel);
+               type = validate_conflict_type_and_resolver(defel->defname,
+                                                          defGetString(defel));
Shall we use 'resolver' as the arg to the validate function instead of
doing defGetString again?

5) parse_subscription_conflict_resolvers():
+               /* Update the corresponding resolver for the given conflict type. */
+               resolvers[type].resolver = downcase_truncate_identifier(resolver,
strlen(resolver), false);
Shouldn't we do this before validate_conflict_type_and_resolver()
itself, like we do it in GetAndValidateSubsConflictResolverList()? And
do we need downcase_truncate_identifier on defel->defname as well
before we do validate_conflict_type_and_resolver()?

6) GetAndValidateSubsConflictResolverList() and
parse_subscription_conflict_resolvers() are similar but yet have so
many differences, which I pointed out above. It is not a good idea to
maintain 2 such functions. We should have a common parsing function
for both Create and Alter Sub. Can you please review the possibility
of that?

~~

conflict.c:

7)
+
+
+/*
+ * Set default values for CONFLICT RESOLVERS for each conflict type
+ */
+void
+SetDefaultResolvers(ConflictTypeResolver * conflictResolvers)
Remove redundant blank line.

8) * Set default values for CONFLICT RESOLVERS for each conflict type
Is it better to change to:
Set default resolver for each conflict type

9) validate_conflict_type_and_resolver():
Since it is called from another file as well, shall we rename it to
ValidateConflictTypeAndResolver()?

10)
+       return type;
+
+}
Remove the redundant blank line after 'return'.

thanks
Shveta
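(A sketch of what comment 1 suggests: duplicate detection in the resolver-parsing loop reported through errorConflictingDefElem(), so the message and error position match the WITH behaviour. The 'seen_types' list is illustrative, and a ParseState would have to be threaded through to the parsing function; passing NULL merely drops the position from the message.)

    List   *seen_types = NIL;

    foreach_ptr(DefElem, defel, stmtresolvers)
    {
        ConflictType type;

        type = validate_conflict_type_and_resolver(defel->defname,
                                                   defGetString(defel));

        /* Same conflict type given twice: report like a redundant WITH
         * option, including the error position when pstate is available. */
        if (list_member_int(seen_types, (int) type))
            errorConflictingDefElem(defel, pstate);

        seen_types = lappend_int(seen_types, (int) type);
    }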
On Thu, Sep 26, 2024 at 2:57 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Sep 20, 2024 at 8:40 AM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > Thanks for the review.
> > Here is the v14 patch-set fixing review comments in [1] and [2].
> >
>
> Thanks for the patches. I am reviewing patch001, it is WIP, but please
> find initial set of comments:
>

Please find the next set of comments on patch001:

11) conflict.c
#include "access/tableam.h" (existing)
#include "replication/logicalproto.h" (added by patch002)
The above 2 are not needed; the code compiles without them. I think the
first one has become redundant due to the inclusion of other header
files which indirectly include it.

12) create_subscription.sgml:
+ apply_remote (enum)
+ This resolver applies the remote change. It can be used for
insert_exists, update_exists, update_origin_differs and
delete_origin_differs. It is the default resolver for insert_exists
and update_exists.
Wrong info; it is the default for update_origin_differs and
delete_origin_differs.

13) alter_subscription.sgml:
Synopsis:
+ ALTER SUBSCRIPTION name RESET CONFLICT RESOLVER FOR (conflict_type)
We don't support parentheses in the syntax, so please correct the doc:
postgres=# ALTER SUBSCRIPTION sub1 RESET CONFLICT RESOLVER FOR ('insert_exists');
ERROR: syntax error at or near "("

14) alter_subscription.sgml:
+ CONFLICT RESOLVER ( conflict_type [= conflict_resolver] [, ... ] )
+ This clause alters either the default conflict resolvers or those set
by CREATE SUBSCRIPTION. Refer to section CONFLICT RESOLVERS for the
details on supported conflict_types and conflict_resolvers.
+ conflict_type
+ The conflict type being reset to its default resolver setting. For
details on conflict types and their default resolvers, refer to
section CONFLICT RESOLVERS.

a) These details seem problematic. Shouldn't we have RESET as a
heading, similar to SKIP, and then try explaining both ALL and
conflict_type under that? Above, it seems we are trying to explain the
conflict_type of the 'CONFLICT RESOLVER ( conflict_type [=
conflict_resolver]' subcommand while giving details of the RESET
subcommand.

b) OTOH, 'CONFLICT RESOLVER ( conflict_type [= conflict_resolver]'
should have its own explanation of the conflict_type and
conflict_resolver parameters.

15) logical-replication.sgml:
Existing:
+ Additional logging is triggered in various conflict scenarios, each
identified as a conflict type, and the conflict statistics are
collected (displayed in the pg_stat_subscription_stats view). Users
have the option to configure a conflict_resolver for each
conflict_type when creating a subscription. For more information on
the supported conflict_types detected and conflict_resolvers, refer to
section CONFLICT RESOLVERS.

Suggestion:
Additional logging is triggered for various conflict scenarios, each
categorized by a specific conflict type, with conflict statistics
being gathered and displayed in the pg_stat_subscription_stats view.
Users can configure a conflict_resolver for each conflict_type when
creating a subscription. For more details on the supported conflict
types and corresponding conflict resolvers, refer to the section on
<CONFLICT RESOLVERS>.

thanks
Shveta
On Fri, Sep 27, 2024 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > >
> > > Thanks for the review.
> > > Here is the v14 patch-set fixing review comments in [1] and [2].
> > >
> >
> > Thanks for the patches. I am reviewing patch001, it is WIP, but please
> > find initial set of comments:
> >

Please find the next set of comments.

16) In pg_dump.h, there is a lot of duplication of structures from
conflict.h. We can avoid that by making the below changes:
--In SubscriptionInfo(), we can have a list of ConflictTypeResolver
structures and fill the elements of the list in getSubscriptions()
simply from the output of pg_subscription_conflict.
--Then in dumpSubscription() we can traverse the list to verify whether
the resolver is the default one; if so, skip the dump. We can create a
new function to return whether the resolver is default or not.
--We can get rid of enum ConflictType, enum ConflictResolver,
ConflictResolverNames, and ConflictTypeDefaultResolvers from pg_dump.h.

17) In describe.c, we can have an 'order by' in the query so that the
order is not changed every time we update a resolver. Please see this:

For sub1, \dRs was showing the below as output for Conflict Resolvers:
insert_exists = error, update_origin_differs = apply_remote,
update_exists = error, update_missing = skip, delete_origin_differs =
apply_remote, delete_missing = skip

Once I update a resolver, the order gets changed:
postgres=# ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER (insert_exists='apply_remote');
ALTER SUBSCRIPTION

\dRs:
update_origin_differs = apply_remote, update_exists = error,
update_missing = skip, delete_origin_differs = apply_remote,
delete_missing = skip, insert_exists = apply_remote

18) Similarly, after making change 16, it will be good for pg_dump too
if we maintain the order, and thus we can have an 'order by' in
pg_dump's query as well.

thanks
Shveta
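(To make comments 17 and 18 concrete: a sketch of pinning the output order in pg_dump's query, reusing the query text quoted earlier in the thread. Ordering by confrtype assumes that column sorts in the conflict types' natural order; describe.c's query could get the same treatment.)

    appendPQExpBuffer(confQuery,
                      "SELECT confrtype, confrres "
                      "FROM pg_catalog.pg_subscription_conflict "
                      "WHERE confsubid = %u "
                      "ORDER BY confrtype;",
                      subinfo[i].dobj.catId.oid);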
On Fri, Sep 27, 2024 at 1:00 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v14-0001.
>
> This is a WIP, but here are my comments for all the SGML parts.
>
> (There will be some overlap here with comments already posted by Shveta)
>
> ======
> 1. file modes after applying the patch
>
> mode change 100644 => 100755 doc/src/sgml/ref/alter_subscription.sgml
> mode change 100644 => 100755 doc/src/sgml/ref/create_subscription.sgml
>
> What's going on here? Why are those SGMLs changed to executable?
>
> ======
> Commit message
>
> 2.
> nit - a missing period in the first sentence
> nit - typo /reseting/resetting/
>
> ======
> doc/src/sgml/logical-replication.sgml
>
> 3.
> - <title>Conflicts</title>
> + <title>Conflicts and conflict resolution</title>
>
> nit - change the capitalisation to "and Conflict Resolution" to match
> other titles.
>
> ~~~
>
> 4.
> + Additional logging is triggered in various conflict scenarios, each identified as a
> + conflict type, and the conflict statistics are collected (displayed in the
> + <link linkend="monitoring-pg-stat-subscription-stats"><structname>pg_stat_subscription_stats</structname></link> view).
> + Users have the option to configure a <literal>conflict_resolver</literal> for each
> + <literal>conflict_type</literal> when creating a subscription.
> + For more information on the supported <literal>conflict_types</literal> detected and
> + <literal>conflict_resolvers</literal>, refer to section
> + <link linkend="sql-createsubscription-params-with-conflict-resolver"><literal>CONFLICT RESOLVERS</literal></link>.
> +
>
> nit - "Additional logging is triggered" sounds strange. I reworded
> this in the nits attachment. Please see if you approve.
> nit - The "conflict_type" and "conflict_resolver" you are referring to
> here are syntax elements of the CREATE SUBSCRIPTION, so here I think
> they should just be called (without the underscores) "conflict type"
> and "conflict resolver".
> nit - IMO this would be better split into multiple paragraphs.
> nit - There is no such section called "CONFLICT RESOLVERS". I reworded
> this link text.
>
> ======
> doc/src/sgml/monitoring.sgml
>
> 5.
> The changes here all render with the link including the type "(enum)"
> displayed, which I thought unnecessary/strange.
>
> For example:
> See insert_exists (enum) for details about this conflict.
>
> IIUC there is no problem here, but maybe the other end of the link
> needed to define xreflabels. I have made the necessary modifications
> in the create_subscription.sgml.
>
> ======
> doc/src/sgml/ref/alter_subscription.sgml
>
> 6.
> +ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> CONFLICT RESOLVER ( <replaceable class="parameter">conflict_type</replaceable> [= <replaceable class="parameter">conflict_resolver</replaceable>] [, ...] )
>
> This syntax seems wrong to me.
>
> Currently, it says:
> ALTER SUBSCRIPTION name CONFLICT RESOLVER ( conflict_type [= conflict_resolver] [, ...] )
>
> But, shouldn't that say:
> ALTER SUBSCRIPTION name CONFLICT RESOLVER ( conflict_type = conflict_resolver [, ...] )
>
> ~~~
>
> 7.
> +ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RESET CONFLICT RESOLVER FOR (<replaceable class="parameter">conflict_type</replaceable>)
>
> I can see that this matches the implementation, but I was wondering
> why don't you permit resetting multiple conflict_types at the same
> time. e.g. what if I want to reset some but not ALL?
>
> ~~~
>
> nit - there are some minor whitespace indent problems in the SGML
>
> ~~~
>
> 8.
> + <varlistentry id="sql-altersubscription-params-conflict-resolver">
> +  <term><literal>CONFLICT RESOLVER ( <replaceable class="parameter">conflict_type</replaceable> [= <replaceable class="parameter">conflict_resolver</replaceable>] [, ... ] )</literal></term>
> +  <listitem>
> +   <para>
> +    This clause alters either the default conflict resolvers or those set by <xref linkend="sql-createsubscription"/>.
> +    Refer to section <link linkend="sql-createsubscription-params-with-conflict-resolver"><literal>CONFLICT RESOLVERS</literal></link>
> +    for the details on supported <literal>conflict_types</literal> and <literal>conflict_resolvers</literal>.
> +   </para>
> +  </listitem>
> + </varlistentry>
> +
> + <varlistentry id="sql-altersubscription-params-conflict-type">
> +  <term><replaceable class="parameter">conflict_type</replaceable></term>
> +  <listitem>
> +   <para>
> +    The conflict type being reset to its default resolver setting.
> +    For details on conflict types and their default resolvers, refer to section <link linkend="sql-createsubscription-params-with-conflict-resolver"><literal>CONFLICT RESOLVERS</literal></link>
> +   </para>
> +  </listitem>
> + </varlistentry>
> + </variablelist>
>
> This section seems problematic:
> e.g. the syntax seems wrong, same as before.
>
> ~
> There are other nits.
> (I've given a rough fix in the nits attachment. Please see it and make
> it better.)
>
> nit - why do you care if it is "either the default conflict resolvers
> or those set...". Why not just say "current resolver"
> nit - it does not mention the 'conflict_resolver' type in the normal way
> nit - there is no actual section called "CONFLICT RESOLVERS"
> nit - the part that says "The conflict type being reset to its default
> resolver setting." is bogus for this form of the ALTER statement.
>
> ~~~
>
> 9.
> There is no description for the "RESET CONFLICT RESOLVER ALL"
>
> ~~~
>
> 10.
> There is no description for the "RESET CONFLICT RESOLVER FOR (conflict_type)"
>
> ======
> doc/src/sgml/ref/create_subscription.sgml
>
> 11. General - Order
>
> + <varlistentry id="sql-createsubscription-params-with-conflict-resolver">
> +  <term><literal>CONFLICT RESOLVER ( <replaceable class="parameter">conflict_type</replaceable> = <replaceable
>
> nit - IMO this entire new entry about "CONFLICT RESOLVER" should
> appear on the page *above* the "WITH" section, because that is the
> order that it is defined in the CREATE SUBSCRIPTION syntax.
>
> ~~~
>
> 12. General - whitespace
>
> nit - Much of this new section seems to have a slightly wrong
> indentation in the SGML. Mostly it is out by 1 or 2 spaces.
>
> ~~~
>
> 13. General - ordering of conflict_type.
>
> nit - Instead of just some apparent random order, let's put each
> insert/update/delete conflict type in alphabetical order, so at least
> users can find them where they would expect to find them.

This ordering was decided while implementing the 'conflict-detection
and logging' patch and thus perhaps should be maintained as same. The
ordering is insert, update and delete (different variants of these).
Please see a comment on it in [1] (comment #2).

[1]: https://www.postgresql.org/message-id/TYAPR01MB569224262F44875973FAF344F5B22%40TYAPR01MB5692.jpnprd01.prod.outlook.com

> ~~~
>
> 14. General - ordering of conflict_resolver
>
> nit - ditto. Let's name these in alphabetical order. IMO it makes more
> sense than the current random ordering.

I feel the ordering of resolvers should be the same as that of conflict
types, i.e. resolvers of insert variants first, then update variants,
then delete variants. But I would like to know what others think on
this.

> ~~~
>
> 15.
> + <para>
> +  This optional clause specifies options for conflict resolvers for different conflict_types.
> + </para>
>
> nit - IMO we don't need the words "options for" here.
>
> ~~~
>
> 16.
> + <para>
> +  The <replaceable class="parameter">conflict_type</replaceable> and their default behaviour are listed below.
>
> nit - sounded strange to me. reworded it slightly.
>
> ~~~
>
> 17.
> + <varlistentry id="sql-createsubscription-params-with-conflict_type-insert-exists">
>
> nit - Here, and for all other conflict types, add "xreflabel". See my
> review comment #5 for the reason why.
>
> ~~~
>
> 18.
> + <para>
> +  The <replaceable class="parameter">conflict_resolver</replaceable> and their behaviour
> +  are listed below. Users can use any of the following resolvers for automatic conflict
> +  resolution.
> +  <variablelist>
>
> nit - reworded this too, to be like the previous review comment.
>
> ~~~
>
> 19. General - readability.
>
> 19a.
> IMO the information about what are the default resolvers for each
> conflict type, and what resolvers are allowed for each conflict type,
> should ideally be documented in a tabular form.
>
> Maybe all information is already present in the current document, but
> it is certainly hard to easily see it.
>
> As an example, I have added a table in this section. Maybe it is not
> the best placement for this table, but I gave it mostly to show how
> you can present the same information so it is easier to read.
>
> ~
> 19b.
> Bug. In doing this exercise I discovered there are 2 resolvers
> ("error" and "apply_remote") that both claim to be defaults for the
> same conflict types.
>
> They both say:
>
> + It is the default resolver for <literal>insert_exists</literal> and
> + <literal>update_exists</literal>.
>
> Anyway, this demonstrates that the current information was hard to read.
>
> I can tell from the code implementation what the document was supposed
> to say, but I will leave it to the patch authors to fix this one.
> (e.g. "apply_remote" says the wrong defaults)
>
> ======
> Kind Regards,
> Peter Smith.
> Fujitsu Australia
On Mon, Sep 30, 2024 at 2:27 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Sep 27, 2024 at 1:00 PM Peter Smith <smithpb2250@gmail.com> wrote:
...
> > 13. General - ordering of conflict_type.
> >
> > nit - Instead of just some apparent random order, let's put each
> > insert/update/delete conflict type in alphabetical order, so at least
> > users can find them where they would expect to find them.
>
> This ordering was decided while implementing the 'conflict-detection
> and logging' patch and thus perhaps should be maintained as same. The
> ordering is insert, update and delete (different variants of these).
> Please see a comment on it in [1] (comment #2).
>
> [1]: https://www.postgresql.org/message-id/TYAPR01MB569224262F44875973FAF344F5B22%40TYAPR01MB5692.jpnprd01.prod.outlook.com
>

+1 for order insert/update/delete.

My issue was only about the order *within* each of those variants.
e.g. I think it should be alphabetical:

CURRENT
insert_exists
update_origin_differs
update_exists
update_missing
delete_origin_differs
delete_missing

SUGGESTED
insert_exists
update_exists
update_missing
update_origin_differs
delete_missing
delete_origin_differs

> > ~~~
> >
> > 14. General - ordering of conflict_resolver
> >
> > nit - ditto. Let's name these in alphabetical order. IMO it makes more
> > sense than the current random ordering.
>
> I feel ordering of resolvers should be same as that of conflict
> types, i.e. resolvers of insert variants first, then update variants,
> then delete variants. But would like to know what others think on
> this.
>

Resolvers in v14 were documented in this random order:
error
skip
apply_remote
keep_local
apply_or_skip
apply_or_error

Some of these are resolvers for different conflicts. How can you order
these as "resolvers for insert" followed by "resolvers for update"
followed by "resolvers for delete" without it all still appearing in
random order?

======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Sep 30, 2024 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> +1 for order insert/update/delete.
>
> My issue was only about the order *within* each of those variants.
> e.g. I think it should be alphabetical:
...

Okay, got it now. I have no strong opinion here. I am okay with both.
But since it was originally added by the other thread, it will be good
to know the respective author's opinion as well.

> Resolvers in v14 were documented in this random order:
> error
> skip
> apply_remote
> keep_local
> apply_or_skip
> apply_or_error
>

Yes, these should be changed.

> Some of these are resolvers for different conflicts. How can you order
> these as "resolvers for insert" followed by "resolvers for update"
> followed by "resolvers for delete" without it all still appearing in
> random order?

I was thinking of ordering them like this:

apply_remote:   applicable to insert_exists, update_exists,
                update_origin_differ, delete_origin_differ
keep_local:     applicable to insert_exists, update_exists,
                update_origin_differ, delete_origin_differ
apply_or_skip:  applicable to update_missing
apply_or_error: applicable to update_missing
skip:           applicable to update_missing and delete_missing
error:          applicable to all.

i.e. in order of how they are applicable to conflict_types starting
from insert_exists till delete_origin_differ (i.e. reading
ConflictTypeResolverMap, from left to right and then top to bottom).
Except I have kept 'error' at the end instead of keeping it after
'keep_local' as the former makes more sense there.

thanks
Shveta
On Fri, Sep 27, 2024 at 2:33 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Fri, Sep 27, 2024 at 10:44 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > > Thanks for the review.
> > > > Here is the v14 patch-set fixing review comments in [1] and [2].
> > >
> > > Thanks for the patches. I am reviewing patch001, it is WIP, but please
> > > find initial set of comments:
> >

Please find the next set of comments:

1) parse_subscription_conflict_resolvers()
Shall we free the 'SeenTypes' list at the end?

2) General logic comment:
I think SetSubConflictResolver should also accept a list, similar to
UpdateSubConflictResolvers(), instead of an array. Then we can even try
merging these 2 functions later (once we do this change, it will be
clearer). For SetSubConflictResolver to accept a list,
SetDefaultResolvers should give a list as output instead of an array,
as it currently does.

3) Existing logic:
case ALTER_SUBSCRIPTION_RESET_ALL_CONFLICT_RESOLVERS:
    {
        ConflictTypeResolver conflictResolvers[CONFLICT_NUM_TYPES];

        /* Remove the existing conflict resolvers. */
        RemoveSubscriptionConflictResolvers(subid);

        /*
         * Create list of conflict resolvers and set them in the
         * catalog.
         */
        SetDefaultResolvers(conflictResolvers);
        SetSubConflictResolver(subid, conflictResolvers, CONFLICT_NUM_TYPES);
    }

Suggestion: If we fix comment #2 and make SetSubConflictResolver and
SetDefaultResolvers deal with a list, then here we can get rid of
RemoveSubscriptionConflictResolvers(); we can simply make a default
list using SetDefaultResolvers and call UpdateSubConflictResolvers().
No need for 2 separate calls for delete and insert/set.

4) Shall ResetConflictResolver() also call UpdateSubConflictResolvers
internally? It will get rid of a lot of code duplication.
ResetConflictResolver()'s new approach could be:
a) validate the conflict type and get its enum value. To do this job,
make a sub-function validate_conflict_type() which will be called both
from here and from validate_conflict_type_and_resolver().
b) get the default resolver for the given conflict-type enum and then
get the resolver string for that, to help step c.
c) create a list of a single ConflictTypeResolver element and call
UpdateSubConflictResolvers, as sketched below.

5) typedefs.list
ConflictResolver is missed?

6) subscriptioncmds.c
/* Get the list of conflict types and resolvers and validate them. */
conflict_resolvers = GetAndValidateSubsConflictResolverList(stmt->resolvers);
No full stop is needed in a one-line comment. But since it is >80
chars, it is good to split it into multiple lines, and then the full
stop can be retained.

7) Shall we move the call to conf_detection_check_prerequisites() to
GetAndValidateSubsConflictResolverList(), similar to how we do it for
parse_subscription_conflict_resolvers()? (I still prefer that
GetAndValidateSubsConflictResolverList and
parse_subscription_conflict_resolvers should be merged in the first
place. The array-to-list conversion suggested in comment #2 will make
these two functions more similar, and then we can review merging them.)

8) Shall parse_subscription_conflict_resolvers() be moved to conflict.c
as well? Or since it is subscription options' parsing, is it more
suited in the current file? Thoughts?

9) Existing:
/*
 * Parsing function for conflict resolvers in CREATE SUBSCRIPTION command.
 * This function will report an error if mutually exclusive or duplicate
 * options are specified.
 */
Suggestion:
/*
 * Parsing function for conflict resolvers in CREATE SUBSCRIPTION command.
 *
 * In addition to parsing and validating the resolvers' configuration,
 * this function also reports an error if mutually exclusive options are
 * specified.
 */

10) Test comments (subscription.sql):
------
a)
-- fail - invalid conflict resolvers
CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub CONFLICT RESOLVER
(insert_exists = foo) WITH (connect = false);
-- fail - invalid conflict types
CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub CONFLICT RESOLVER
(foo = 'keep_local') WITH (connect = false);
We should swap the order of these 2 tests to make it similar to the
ALTER tests.

b)
-- fail - invalid conflict resolvers
resolvers --> resolver
-- fail - invalid conflict types
types --> type
-- fail - duplicate conflict types
types --> type

c)
-- creating subscription should create default conflict resolvers
Suggestion:
-- creating subscription with no explicit conflict resolvers should
configure default conflict resolvers

d)
-- ok - valid conflict type and resolvers
type --> types

e)
-- fail - altering with duplicate conflict types
types --> type
------

thanks
Shveta
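(A rough sketch of the shape comment 4 proposes. The helpers validate_conflict_type() and GetDefaultResolverName(), the ConflictTypeResolver field names, and the UpdateSubConflictResolvers() signature are all assumptions here, not the patch's actual definitions.)

    static void
    ResetConflictResolver(Oid subid, char *conflict_type)
    {
        ConflictTypeResolver *ctr = palloc(sizeof(ConflictTypeResolver));

        /* (a) validate the conflict type name and get its enum value */
        ConflictType type = validate_conflict_type(conflict_type);

        /* (b) look up the default resolver for that conflict type */
        ctr->conflict_type = conflict_type;
        ctr->resolver = GetDefaultResolverName(type);   /* assumed helper */

        /* (c) one-element list through the common update path, so RESET
         * shares code with ALTER ... CONFLICT RESOLVER */
        UpdateSubConflictResolvers(list_make1(ctr), subid);
    }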
On Mon, Sep 30, 2024 at 4:29 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> I was thinking of ordering them like this:
>
> apply_remote:   applicable to insert_exists, update_exists,
>                 update_origin_differ, delete_origin_differ
> keep_local:     applicable to insert_exists, update_exists,
>                 update_origin_differ, delete_origin_differ
> apply_or_skip:  applicable to update_missing
> apply_or_error: applicable to update_missing
> skip:           applicable to update_missing and delete_missing
> error:          applicable to all.
>
> i.e. in order of how they are applicable to conflict_types starting
> from insert_exists till delete_origin_differ (i.e. reading
> ConflictTypeResolverMap, from left to right and then top to bottom).
> Except I have kept 'error' at the end instead of keeping it after
> 'keep_local' as the former makes more sense there.
>

This proves my point because, without your complicated explanation to
accompany it, the final order (below) just looks random to me:
apply_remote
keep_local
apply_or_skip
apply_or_error
skip
error

Unless there is some compelling reason to do it differently, I still
prefer A-Z (the KISS principle).

======
Kind Regards,
Peter Smith.
Fujitsu Australia
On Mon, Sep 30, 2024 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> This proves my point because, without your complicated explanation to
> accompany it, the final order (below) just looks random to me:
> apply_remote
> keep_local
> apply_or_skip
> apply_or_error
> skip
> error
>
> Unless there is some compelling reason to do it differently, I still
> prefer A-Z (the KISS principle).
>

The "applicable to conflict_types" note against each resolver (which
will be mentioned in the doc too) is a pretty good reason in itself to
keep the resolvers in the suggested order. To me, it seems more logical
than placing 'apply_or_error', which only applies to the
'update_missing' conflict_type, at the top, while 'error', which
applies to all conflict_types, is placed in the middle. But I
understand that preferences may vary, so I'll leave this to the
discretion of others.

thanks
Shveta
On Tue, Oct 1, 2024 at 9:48 AM shveta malik <shveta.malik@gmail.com> wrote:
>

I have started reviewing patch002, it is WIP, but please find the
initial set of comments:

1) ExecSimpleRelationInsert():
+ /* Check for conflict and return to caller for resolution if found */
+ if (resolver != CR_ERROR &&
+     has_conflicting_tuple(estate, resultRelInfo, &(*conflictslot),
+                           CT_INSERT_EXISTS, resolver, slot, subid,
+                           apply_remote))
Why are we calling has_conflicting_tuple only if the resolver is not
'ERROR'? Is it for optimization, to avoid the pre-scan for ERROR cases?
If so, please add a comment.

2) has_conflicting_tuple():
+ /*
+  * Return if any conflict is found other than one with 'ERROR'
+  * resolver configured. In case of 'ERROR' resolver, emit error here;
+  * otherwise return to caller for resolutions.
+  */
+ if (FindConflictTuple(resultRelInfo, estate, uniqueidx, slot,
+                       &(*conflictslot)))
has_conflicting_tuple() is called only from ExecSimpleRelationInsert()
when the resolver of 'insert_exists' is not 'ERROR', so why do we have
the above comment in has_conflicting_tuple()?

3) Since has_conflicting_tuple() is only called for the insert_exists
conflict, better to name it 'has_insert_conflicting_tuple' or
'find_insert_conflicting_tuple'. My preference is the second one,
similar to FindConflictTuple().

4) We can have an Assert in has_conflicting_tuple() that conflict_type
is only insert_exists.

5) has_conflicting_tuple():
+     }
+     return false;
+}
We can have a blank line before returning.

6) Existing has_conflicting_tuple header comment:
+/*
+ * Check all the unique indexes for conflicts and return true if found.
+ * If the configured resolver is in favour of apply, give the conflicted
+ * tuple information in conflictslot.
+ */
Suggestion:
/*
 * Check the unique indexes for conflicts. Return true on finding the
 * first conflict itself.
 *
 * If the configured resolver is in favour of apply, give the conflicted
 * tuple information in conflictslot.
 */
<A change in the first line and then a blank line.>

7) Can we please rearrange the 'has_conflicting_tuple' arguments: first
non-pointers, then single pointers, and then double pointers.
Oid subid, ConflictType type, ConflictResolver resolver, bool
apply_remote, ResultRelInfo *resultRelInfo, EState *estate,
TupleTableSlot *slot, TupleTableSlot **conflictslot

8) Now since we are doing a pre-scan of indexes before the actual
table-insert, this existing comment needs some change. Also, we need to
mention why we are scanning again when we have done the pre-scan
already.
/*
 * Checks the conflict indexes to fetch the conflicting local tuple
 * and reports the conflict. We perform this check here, instead of
 * performing an additional index scan before the actual insertion and
 * reporting the conflict if any conflicting tuples are found. This is
 * to avoid the overhead of executing the extra scan for each INSERT
 * operation, ....
 */

thanks
Shveta
On Tue, Oct 1, 2024 at 9:54 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Oct 1, 2024 at 9:48 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I have started reviewing patch002, it is WIP, but please find initial
> > set of comments:
>

Please find the second set of comments for patch002:

9) can_create_full_tuple():
+ for (i = 0; i < newtup->ncols; i++)
+ {
+     if (newtup->colstatus[i] == LOGICALREP_COLUMN_UNCHANGED)
+         return false;
+ }
Why are we comparing it with 'LOGICALREP_COLUMN_UNCHANGED'? I assume
toast values come as LOGICALREP_COLUMN_UNCHANGED. In any case, please
add comments.

10) There are some alignment changes in
GetAndValidateSubsConflictResolverList() and the next few functions in
the same file which belong to patch-001. Please move these changes to
patch001.

11) + * Find the resolver of the conflict type set under the given subscription.
Suggestion: Find the resolver for the given conflict type and subscription.

12) + #include "replication/logicalproto.h"
The code compiles even without the above new inclusion.

13) ReportApplyConflict:
+ errmsg("conflict detected on relation \"%s.%s\": conflict=%s, Resolution=%s.",
We can have 'resolution' instead of 'Resolution', similar to the
lower-case 'conflict'.

14) errdetail_apply_conflict:
The CT_UPDATE_MISSING logs should be improved. As an example:
LOG: conflict detected on relation "public.t1": conflict=update_missing, Resolution=apply_or_skip.
DETAIL: Could not find the row to be updated, Convert UPDATE to INSERT
and applying the remote changes.
Suggestion:
Could not find the row to be updated, thus converting the UPDATE to an
INSERT and applying the remote changes.
Similarly for the other lines:
Could not find the row to be updated, and the UPDATE cannot be
converted to an INSERT, thus skipping the remote changes.
Could not find the row to be updated, and the UPDATE cannot be
converted to an INSERT, thus raising the error.

15) errdetail_apply_conflict:
Can we pull out the sentence 'Could not find the row to be updated', as
it is common for all the cases of 'CT_UPDATE_MISSING', and then append
the rest of the string to it case-wise?

16)
+ConflictResolver
+GetConflictResolver(Relation localrel, ConflictType type, bool *apply_remote,
+                    LogicalRepTupleData *newtup, Oid subid)
Can we please change the order of args to:
Oid subid, ConflictType type, Relation localrel, LogicalRepTupleData
*newtup, bool *apply_remote
Since we are getting resolvers for 'subid' and 'type', I have kept
those as the initial args and the OUT argument as the last one.

17) apply_handle_insert_internal:
+ /*
+  * If a conflict is detected and resolver is in favor of applying the
+  * remote changes, update the conflicting tuple by converting the remote
+  * INSERT to an UPDATE.
+  */
+ if (conflictslot)
The comment conveys 2 conditions while the code checks only one
condition, so it is slightly misleading. Perhaps change the comment to:
/*
 * If a conflict is detected, update the conflicting tuple by converting
 * the remote INSERT to an UPDATE. Note that conflictslot will have the
 * conflicting tuple only if the resolver is in favor of applying the
 * changes; otherwise it will be NULL.
 */
<Rephrase if needed>

18) apply_handle_update_internal():
* Report the conflict and configured resolver if the tuple was
Remove the extra space after 'conflict'.

thanks
Shveta
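(As an aside on comment 9, a commented sketch of the function shape quoted there; the assumption shveta states holds in the logical replication protocol, where unchanged TOAST columns are streamed with status LOGICALREP_COLUMN_UNCHANGED and carry no data.)

    #include "replication/logicalproto.h"

    /*
     * Sketch: an UPDATE can be converted to an INSERT only when the remote
     * tuple carries a value for every column. Unchanged TOAST columns
     * arrive as LOGICALREP_COLUMN_UNCHANGED (status only, no data), so a
     * full tuple cannot be constructed when any column has that status.
     */
    static bool
    can_create_full_tuple(LogicalRepTupleData *newtup)
    {
        for (int i = 0; i < newtup->ncols; i++)
        {
            if (newtup->colstatus[i] == LOGICALREP_COLUMN_UNCHANGED)
                return false;
        }

        return true;
    }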
On Fri, Sep 27, 2024 at 1:00 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v14-0001.
> ~~~
> 7.
> +ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable>
> RESET CONFLICT RESOLVER FOR (<replaceable
> class="parameter">conflict_type</replaceable>)
>
> I can see that this matches the implementation, but I was wondering
> why don't you permit resetting multiple conflict_types at the same
> time. e.g. what if I want to reset some but not ALL?
>

Thank you for your input. The RESET command was not part of the initial
design, and our current implementation for resetting ALL or a specific
'conflict_type' effectively serves its purpose. Allowing the option to
reset two or more conflict types in one command may complicate the
implementation. However, if others also feel the same, we can implement
it. Let's wait for others' feedback.

--
Thanks,
Nisha
On Mon, Sep 30, 2024 at 11:59 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 30, 2024 at 11:04 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > +1 for order insert/update/delete.
> >
> > My issue was only about the order *within* each of those variants.
> > e.g. I think it should be alphabetical:
> >
> > CURRENT
> > insert_exists
> > update_origin_differs
> > update_exists
> > update_missing
> > delete_origin_differs
> > delete_missing
> >
> > SUGGESTED
> > insert_exists
> > update_exists
> > update_missing
> > update_origin_differs
> > delete_missing
> > delete_origin_differs
>
> Okay, got it now. I have no strong opinion here. I am okay with both.
> But since it was originally added by the other thread, it will be good
> to know the respective author's opinion as well.
>

v15 has the above "SUGGESTED" order of conflict_type. We can update it
if the original thread's author or others have different preferences.

Thanks,
Nisha
On Tue, Oct 1, 2024 at 9:48 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 30, 2024 at 2:55 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Unless there is some compelling reason to do it differently, I still
> > prefer A-Z (the KISS principle).
>
> The "applicable to conflict_types" against each resolver (which will
> be mentioned in doc too) is a pretty good reason in itself to keep the
> resolvers in the suggested order. But I understand that preferences
> may vary, so I'll leave this to the discretion of others.
>

In v15, I maintained the original order of conflict_resolver, which to
me seems reasonable from a user perspective:
error
skip
apply_remote
keep_local
apply_or_error
apply_or_skip

I will hold this order until we receive feedback from others, and we
can finalize the new order if necessary.

Thanks,
Nisha
On Tue, Oct 8, 2024 at 3:12 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>

I have not started reviewing v15 yet, but here are a few comments for
v14-patch003:

1) In apply_handle_update_internal(), I see that
FindReplTupleInLocalRel() used to lock the row to be updated in
exclusive mode, but now, since we are avoiding this call in recursion,
from the second call onwards the tuple to be updated or deleted will
not be locked in exclusive mode. You are perhaps locking it
(conflictslot passed as localslot in recursion) somewhere in
FindConflictTuple in shared mode. Since we are going to delete/update
this tuple, shouldn't it be locked in exclusive mode?

2) Also, for the multiple-key-violations case, it would be good to
verify the behavior that when, say, the last update fails for some
reason, all the deleted rows are reverted back. It seems so, but please
test once by forcing the last operation to fail.

thanks
Shveta
On Wed, Oct 9, 2024 at 8:58 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Oct 8, 2024 at 3:12 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
> >

Please find a few comments on v14-patch004:

patch004:
1)
GetConflictResolver currently errors out when the resolver is
last_update_wins and track_commit_timestamp is disabled. It means every
conflict resolution with this resolver will keep on erroring out. I am
not sure if we should emit ERROR here. We do emit ERROR when someone
tries to configure last_update_wins but track_commit_timestamp is
disabled; I think that should suffice. The one in GetConflictResolver
can be converted to a WARNING at most.

What could be the side-effect if we do not emit an error here? In such
a case, the local timestamp will be 0 and the remote change will always
win. Is that right? If so, then if needed, we can emit a warning saying
something like: 'track_commit_timestamp is disabled and thus the remote
change is applied always.'

Thoughts?

2)
execReplication.c: There are some optimizations in this file (moving
duplicate code to has_conflicting_tuple); I think these optimizations
are applicable even to patch003 (or patch002 as well?) and thus can be
moved there. Please review once.

thanks
Shveta
On Wednesday, October 9, 2024 2:34 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Oct 9, 2024 at 8:58 AM shveta malik <shveta.malik@gmail.com>
> wrote:
> >
> > On Tue, Oct 8, 2024 at 3:12 PM Nisha Moond
> > <nisha.moond412@gmail.com> wrote:
> > >
>
> Please find few comments on v14-patch004:
>
> patch004:
> 1)
> GetConflictResolver currently errors out when the resolver is last_update_wins
> and track_commit_timestamp is disabled. It means every conflict resolution
> with this resolver will keep on erroring out. I am not sure if we should emit
> ERROR here. We do emit ERROR when someone tries to configure
> last_update_wins but track_commit_timestamp is disabled. I think that should
> suffice. The one in GetConflictResolver can be converted to WARNING max.
>
> What could be the side-effect if we do not emit error here? In such a case, the
> local timestamp will be 0 and remote change will always win.
> Is that right? If so, then if needed, we can emit a warning saying something like:
> 'track_commit_timestamp is disabled and thus remote change is applied
> always.'
>
> Thoughts?

I think simply reporting a warning and applying remote changes without
further action could lead to data inconsistencies between nodes.
Considering the potential challenges and time required to recover from
these inconsistencies, I prefer to keep reporting errors, in which case
users have an opportunity to resolve the issue by enabling
track_commit_timestamp.

Best Regards,
Hou zj
Hello hackers,
Hi, I'm Diego. I work for Percona, have recently started working on PostgreSQL, and would like to contribute to the project going forward.
I have been following this thread since the beginning, but due to my limited knowledge of the overall code structure, my first review of the provided patches focused more on validating the logic and general flow.
I have been testing the provided patches, and so far the only issue I have hit is the one already reported about DirtySnapshot scans over a B-tree with parallel updates, which may skip or fail to find some records (see the sketch below for the pattern involved).
That said, I'd like to know whether it is worthwhile to pull in the fix proposed in [0] and validate/update the code to address the issue, or whether other, better solutions are being discussed?
Thanks for your attention,
Diego
[0]: https://www.postgresql.org/message-id/flat/CANtu0oiziTBM8+WDtkktMZv0rhGBroYGWwqSQW+MzOWpmk-XEw@mail.gmail.com#74f5f05594bb6f10b1d882a1ebce377c
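As context for the issue mentioned above, here is a simplified sketch of the dirty-snapshot index-scan pattern in question, loosely modeled on the shape of RelationFindReplTupleByIndex in execReplication.c. It is an illustration, not the actual code: the setup of rel, idxrel, skey, skey_attoff, and outslot is omitted, and only the index AM calls and InitDirtySnapshot() are real PostgreSQL APIs.

    /*
     * A dirty snapshot lets the scan see in-progress tuples, which the
     * apply worker needs in order to spot concurrent changes.  However,
     * a concurrent UPDATE can move the row to a different index entry
     * while the B-tree is being walked, so a single pass may find
     * neither the old nor the new version of the tuple -- the problem
     * discussed in [0].
     */
    SnapshotData snap;
    IndexScanDesc scan;
    bool          found = false;

    InitDirtySnapshot(snap);
    scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
    index_rescan(scan, skey, skey_attoff, NULL, 0);

    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
    {
        /* lock the tuple and re-check; callers retry on concurrent update */
        found = true;
        break;
    }

    index_endscan(scan);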
On Mon, Oct 21, 2024 at 2:04 AM shveta malik <shveta.malik@gmail.com> wrote:
> On Fri, Oct 18, 2024 at 4:30 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Wednesday, October 9, 2024 2:34 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Wed, Oct 9, 2024 at 8:58 AM shveta malik <shveta.malik@gmail.com>
> > > wrote:
> > > >
> > > > On Tue, Oct 8, 2024 at 3:12 PM Nisha Moond
> > > > <nisha.moond412@gmail.com> wrote:
> > > > >
> > > >
> > >
> > > Please find few comments on v14-patch004:
> > >
> > > patch004:
> > > 1)
> > > GetConflictResolver currently errors out when the resolver is last_update_wins
> > > and track_commit_timestamp is disabled. It means every conflict resolution
> > > with this resolver will keep on erroring out. I am not sure if we should emit
> > > ERROR here. We do emit ERROR when someone tries to configure
> > > last_update_wins but track_commit_timestamp is disabled. I think that should
> > > suffice. The one in GetConflictResolver can be converted to WARNING max.
> > >
> > > What could be the side-effect if we do not emit error here? In such a case, the
> > > local timestamp will be 0 and remote change will always win.
> > > Is that right? If so, then if needed, we can emit a warning saying something like:
> > > 'track_commit_timestamp is disabled and thus remote change is applied
> > > always.'
> > >
> > > Thoughts?
> >
> > I think simply reporting a warning and applying remote changes without further
> > action could lead to data inconsistencies between nodes. Considering the
> > potential challenges and time required to recover from these inconsistencies, I
> > prefer to keep reporting errors, in which case users have an opportunity to
> > resolve the issue by enabling track_commit_timestamp.
> >
> Okay, makes sense. We should raise ERROR then.
>
> thanks
> Shveta