Thread: Commit Timestamp and LSN Inversion issue

Commit Timestamp and LSN Inversion issue

From
shveta malik
Date:
Hello hackers,
(Cc people involved in the earlier discussion)

I would like to discuss the $Subject.

While discussing Logical Replication's Conflict Detection and
Resolution (CDR) design in [1] , it came to  our notice that the
commit LSN and timestamp may not correlate perfectly i.e. commits may
happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
because, during the commit process, the timestamp (xactStopTimestamp)
is captured slightly earlier than when space is reserved in the WAL.

 ~~

 Reproducibility of conflict-resolution problem due to the timestamp inversion
------------------------------------------------
It was suggested that timestamp inversion *may* impact the time-based
resolutions such as last_update_wins (targeted to be implemented in
[1]) as we may end up making wrong decisions if timestamps and LSNs
are not correctly ordered. And thus we tried some tests but failed to
find any practical scenario where it could be a problem.

Basically, the proposed conflict resolution is a row-level resolution,
and to cause the row value to be inconsistent, we need to modify the
same row in concurrent transactions and commit the changes
concurrently. But this doesn't seem possible because concurrent
updates on the same row are disallowed (e.g., the later update will be
blocked due to the row lock).  See [2] for the details.

We tried to give some thoughts on multi table cases as well e.g.,
update table A with foreign key and update the table B that table A
refers to. But update on table A will block the update on table B as
well, so we could not reproduce data-divergence due to the
LSN/timestamp mismatch issue there.

 ~~

Idea proposed to fix the timestamp inversion issue
------------------------------------------------
There was a suggestion in [3] to acquire the timestamp while reserving
the space (because that happens in LSN order). The clock would need to
be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
main problem why it's being done outside the critical section, because
gettimeofday() may be quite expensive. There's a concept of hybrid
clock, combining "time" and logical counter, which might be useful
independently of CDR.

On further analyzing this idea, we found that CLOCK_MONOTONIC can be
accepted only by clock_gettime() which has more precision than
gettimeofday() and thus is equally or more expensive theoretically (we
plan to test it and post the results). It does not look like a good
idea to call any of these when holding spinlock to reserve the wal
position.  As for the suggested solution "hybrid clock", it might not
help here because the logical counter is only used to order the
transactions with the same timestamp. The problem here is how to get
the timestamp along with wal position
reservation(ReserveXLogInsertLocation).

 ~~

 We can explore further but as we are not able to see any real-time
scenario where this could actually be problem, it may or may not be
worth to spend time on this. Thoughts?


[1]:
(See: "How is this going to deal with the fact that commit LSN and
timestamps may not correlate perfectly?").
https://www.postgresql.org/message-id/CAJpy0uBWBEveM8LO2b7wNZ47raZ9tVJw3D2_WXd8-b6LSqP6HA%40mail.gmail.com

[2]:
https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com

[3]:
(See: "The clock would need to be monotonic (easy enough with
CLOCK_MONOTONIC").
https://www.postgresql.org/message-id/a3a70a19-a35e-426c-8646-0898cdc207c8%40enterprisedb.com

thanks
Shveta



Re: Commit Timestamp and LSN Inversion issue

From
Aleksander Alekseev
Date:
Hi Shveta,

> While discussing Logical Replication's Conflict Detection and
> Resolution (CDR) design in [1] , it came to  our notice that the
> commit LSN and timestamp may not correlate perfectly i.e. commits may
> happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
> because, during the commit process, the timestamp (xactStopTimestamp)
> is captured slightly earlier than when space is reserved in the WAL.
>  [...]
> There was a suggestion in [3] to acquire the timestamp while reserving
> the space (because that happens in LSN order). The clock would need to
> be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
> main problem why it's being done outside the critical section, because
> gettimeofday() may be quite expensive. There's a concept of hybrid
> clock, combining "time" and logical counter, which might be useful
> independently of CDR.

I don't think you can rely on a system clock for conflict resolution.
In a corner case a DBA can move the clock forward or backward between
recordings of Ts1 and Ts2. On top of that there is no guarantee that
2+ servers have synchronised clocks. It seems to me that what you are
proposing will just hide the problem instead of solving it in the
general case.

This is the reason why {latest|earliest}_timestamp_wins strategies you
are proposing to use for CDR are poor strategies. In practice they
work as random_key_wins which is not extremely useful (what it does is
basically _deleting_ random data, not solving conflicts). On top of
that strategies like this don't take into account checks and
constraints the database may have, including high-level constraints
that may not be explicitly defined in the DBMS but may exist in the
application logic.

Unfortunately I can't reference a particular article or white paper on
the subject but I know that "last write wins" was widely criticized
back in the 2010s when people were building distributed systems on
commodity hardware. In this time period I worked on several projects
as a backend software engineer and I can assure you that LWW is not
something you want.

IMO the right approach to the problem would be defining procedures for
conflict resolution that may not only semi-randomly choose between two
tuples but also implement a user-defined logic. Similarly to INSERT
INTO ... ON CONFLICT ... semantics, or similar approaches from
long-lived and well-explored distributed system, e.g. Riak.
Alternatively / additionally we could support CRDTs in Postgres.

-- 
Best regards,
Aleksander Alekseev



Re: Commit Timestamp and LSN Inversion issue

From
Amit Kapila
Date:
On Wed, Sep 4, 2024 at 2:05 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> > While discussing Logical Replication's Conflict Detection and
> > Resolution (CDR) design in [1] , it came to  our notice that the
> > commit LSN and timestamp may not correlate perfectly i.e. commits may
> > happen with LSN1 < LSN2 but with Ts1 > Ts2. This issue may arise
> > because, during the commit process, the timestamp (xactStopTimestamp)
> > is captured slightly earlier than when space is reserved in the WAL.
> >  [...]
> > There was a suggestion in [3] to acquire the timestamp while reserving
> > the space (because that happens in LSN order). The clock would need to
> > be monotonic (easy enough with CLOCK_MONOTONIC), but also cheap. The
> > main problem why it's being done outside the critical section, because
> > gettimeofday() may be quite expensive. There's a concept of hybrid
> > clock, combining "time" and logical counter, which might be useful
> > independently of CDR.
>
> I don't think you can rely on a system clock for conflict resolution.
> In a corner case a DBA can move the clock forward or backward between
> recordings of Ts1 and Ts2. On top of that there is no guarantee that
> 2+ servers have synchronised clocks. It seems to me that what you are
> proposing will just hide the problem instead of solving it in the
> general case.
>

It is possible that we can't rely on the system clock for conflict
resolution but that is not the specific point of this thread. As
mentioned in the subject of this thread, the problem is "Commit
Timestamp and LSN Inversion issue". The LSN value and timestamp for a
commit are not generated atomically, so two different transactions can
have them in different order.

Your point as far as I can understand is that in the first place, it
is not a good idea to have a strategy like "last_write_wins" which
relies on the system clock. So, even if LSN->Timestamp ordering has
any problem, it won't matter to us. Now, we can discuss whether
"last_write_wins" is a poor strategy or not but if possible, for the
sake of the point of this thread, let's assume that users using the
resolution feature ("last_write_wins") ensure that clocks are synced
or they won't enable this feature and then see if we can think of any
problem w.r.t the current code.

--
With Regards,
Amit Kapila.



Re: Commit Timestamp and LSN Inversion issue

From
Aleksander Alekseev
Date:
Hi Amit,

> > I don't think you can rely on a system clock for conflict resolution.
> > In a corner case a DBA can move the clock forward or backward between
> > recordings of Ts1 and Ts2. On top of that there is no guarantee that
> > 2+ servers have synchronised clocks. It seems to me that what you are
> > proposing will just hide the problem instead of solving it in the
> > general case.
> >
>
> It is possible that we can't rely on the system clock for conflict
> resolution but that is not the specific point of this thread. As
> mentioned in the subject of this thread, the problem is "Commit
> Timestamp and LSN Inversion issue". The LSN value and timestamp for a
> commit are not generated atomically, so two different transactions can
> have them in different order.

Hm.... Then I'm having difficulties understanding why this is a
problem and why it was necessary to mention CDR in this context in the
first place.

OK, let's forget about CDR completely. Who is affected by the current
behavior and why would it be beneficial changing it?

-- 
Best regards,
Aleksander Alekseev



Re: Commit Timestamp and LSN Inversion issue

From
Amit Kapila
Date:
On Wed, Sep 4, 2024 at 6:35 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> > > I don't think you can rely on a system clock for conflict resolution.
> > > In a corner case a DBA can move the clock forward or backward between
> > > recordings of Ts1 and Ts2. On top of that there is no guarantee that
> > > 2+ servers have synchronised clocks. It seems to me that what you are
> > > proposing will just hide the problem instead of solving it in the
> > > general case.
> > >
> >
> > It is possible that we can't rely on the system clock for conflict
> > resolution but that is not the specific point of this thread. As
> > mentioned in the subject of this thread, the problem is "Commit
> > Timestamp and LSN Inversion issue". The LSN value and timestamp for a
> > commit are not generated atomically, so two different transactions can
> > have them in different order.
>
> Hm.... Then I'm having difficulties understanding why this is a
> problem

This is a potential problem pointed out during discussion of CDR [1]
(Please read the point starting from "How is this going to deal .."
and response by Shveta). The point of this thread is that though it
appears to be a problem but practically there is no scenario where it
can impact even when we implement "last_write_wins" startegy as
explained in the initial email. If you or someone sees a problem due
to LSN<->timestamp inversion then we need to explore the solution for
it.

>
> and why it was necessary to mention CDR in this context in the
> first place.
>
> OK, let's forget about CDR completely. Who is affected by the current
> behavior and why would it be beneficial changing it?
>

We can't forget CDR completely as this could only be a potential
problem in that context. Right now, we don't have any built-in
resolution strategies, so this can't impact but if this is a problem
then we need to have a solution for it before considering a solution
like "last_write_wins" strategy.

Now, instead of discussing LSN<->timestamp inversion issue, you
started to discuss "last_write_wins" strategy itself which we have
discussed to some extent in the thread [2]. BTW, we are planning to
start a separate thread as well just to discuss the clock skew problem
w.r.t resolution strategies like "last_write_wins" strategy. So, we
can discuss clock skew in that thread and keep the focus of this
thread LSN<->timestamp inversion problem.

[1] - https://www.postgresql.org/message-id/CAJpy0uBWBEveM8LO2b7wNZ47raZ9tVJw3D2_WXd8-b6LSqP6HA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com

--
With Regards,
Amit Kapila.