On Mon, Nov 11, 2024 at 9:05 PM Tomas Vondra <tomas@vondra.me> wrote:
>
> Alternatively, we could simply stop relying on the timestamps recorded
> in the commit, and instead derive "monotonic" commit timestamps after
> the fact. For example, we could track timestamps for some subset of
> commits, and then approximate the rest using LSN.
>
> AFAIK the inversion can happen only for concurrent commits, and there's
> can't be that many of those. So if we record the (LSN, timestamp) for
> every 1MB of WAL, we approximate timestamps for commits in that 1MB by
> linear approximation.
>
> Of course, there's a lot of questions and details to solve - e.g. how
> often would it need to happen, when exactly would it happen, etc. And
> also how would that integrate with the logical decoding - it's easy to
> just get the timestamp from the WAL record, this would require more work
> to actually calculate it. It's only a very rough idea.
>
I think for logical decoding it would be probably easy because it
reads all the WAL. So, it can remember the commit time of the previous
commit, and if any future commit has a commit timestamp lower than
that it can fix it by incrementing it. But outside logical decoding,
it would be tricky because we may need a separate process to fix up
commit timestamps by using linear approximation. IIUC, the bigger
challenge is that such a solution would require us to read the WAL on
a continuous basis and keep fixing commit timestamps or we need to
read the extra WAL before using or relying on commit timestamp. This
sounds to be a somewhat complex and costlier solution though the cost
is outside the hot-code path but still, it matters as it leads to
extra read I/O.
--
With Regards,
Amit Kapila.