Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Mihail Nikalayeu
Subject Re: Adding REPACK [concurrently]
Date
Msg-id CADzfLwXpyzGGVB+nbSAMBWDBhzTyn6F2hRrAqcyJd3c6gT2EOQ@mail.gmail.com
In response to Re: Adding REPACK [concurrently]  (Antonin Houska <ah@cybertec.at>)
Responses Re: Adding REPACK [concurrently]
List pgsql-hackers
Hello, Antonin!

> When posting version 12 of the patch [1] I raised a concern that the MVCC
> safety is too expensive when it comes to logical decoding. Therefore, I
> abandoned the concept for now, and v13 [2] uses plain heap_insert(). Once we
> implement the MVCC safety, we simply rewrite the tuple like v12 did - that's
> the simplest way to preserve fields like xmin, cmin, ...

Thanks for the explanation.

I was looking into the catalog-related logical decoding features, and
they look like overkill for the repack case.
We don't need CID tracking, or even a snapshot for each commit, if we're
okay with passing xmin/xmax as arguments.

What do you think about the following approach for replaying:
* use the extracted XID as the value for xmin/xmax.
* use SnapshotSelf to find the tuple for update/delete operations.

SnapshotSelf seems like a good fit here:
* it sees the last "existing" version.
* any XID set as xmin/xmax in the repacked version is already
committed - so each update/insert is effectively "committed" once
written.
* it works with multiple updates of the same tuple within a single
transaction - SnapshotSelf sees the last version.
* all updates are ordered and replayed sequentially - so the last
version is always the one we want.
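
To make the replay idea above concrete, here is a toy model in Python. It is
only a sketch under loud assumptions: ToyHeap, TupleVersion, find_live and
replay are hypothetical names, not PostgreSQL code, and the visibility check
is a heavy simplification of SnapshotSelf (every XID written during replay is
treated as already committed, so the visible version is simply the one with
no xmax set).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INVALID_XID = 0  # toy marker for "no xmax set"; an assumption of this sketch


@dataclass
class TupleVersion:
    key: int
    value: Optional[str]
    xmin: int
    xmax: int = INVALID_XID


class ToyHeap:
    """Hypothetical stand-in for the new (repacked) heap."""

    def __init__(self) -> None:
        self.versions: List[TupleVersion] = []

    def find_live(self, key: int) -> Optional[TupleVersion]:
        # Simplified SnapshotSelf-like lookup: all XIDs stored during replay
        # count as committed, so the visible version is the one without xmax.
        live = [v for v in self.versions
                if v.key == key and v.xmax == INVALID_XID]
        assert len(live) <= 1, "sequential replay keeps at most one live version"
        return live[0] if live else None

    def insert(self, xid: int, key: int, value: str) -> None:
        # The extracted XID is written directly as xmin.
        self.versions.append(TupleVersion(key, value, xmin=xid))

    def delete(self, xid: int, key: int) -> None:
        old = self.find_live(key)
        if old is None:
            raise LookupError(f"no live version for key {key}")
        # The extracted XID is written directly as xmax.
        old.xmax = xid

    def update(self, xid: int, key: int, value: str) -> None:
        # An update ends the last existing version and adds a new one,
        # both stamped with the same extracted XID.
        self.delete(xid, key)
        self.insert(xid, key, value)


Change = Tuple[int, str, int, Optional[str]]  # (xid, op, key, value)


def replay(heap: ToyHeap, changes: List[Change]) -> None:
    # Changes are applied strictly in order, one at a time.
    for xid, op, key, value in changes:
        if op == "insert":
            heap.insert(xid, key, value)
        elif op == "update":
            heap.update(xid, key, value)
        elif op == "delete":
            heap.delete(xid, key)
        else:
            raise ValueError(f"unknown operation {op!r}")
```

In this model, two updates of the same key by one XID work without any CID
tracking: the second update finds the version written by the first one,
because the lookup does not care which command created it.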

If I'm not missing anything, this looks like something worth including
in the patch set.
If so, I can try implementing a test version.

Best regards,
Mikhail


