Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers

From Mihail Nikalayeu
Subject Re: Adding REPACK [concurrently]
Date
Msg-id CADzfLwXp4c-MJx7yVDxAGNNxPbX4o9dqyivxavtHvmUsdXYqBQ@mail.gmail.com
Whole thread Raw
In response to Re: Adding REPACK [concurrently]  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
Hello, Antonin!

On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <ah@cybertec.at> wrote:
> Worker makes more sense to me - the initial implementation is in 0005.

Comments for 0005, so far:

---
> export_initial_snapshot

Hm, should we use ExportSnapshot instead? And ImportSnapshort to import it.

---
> get_initial_snapshot

Should we check if a worker is still alive while waiting? Also is
"process_concurrent_changes".

And AFAIU RegisterDynamicBackgroundWorker does not guarantee new
workers to be started (in case of some fork-related issues).

---
> Assert(res = SHM_MQ_DETACHED);

==

---
> /* Wait a bit before we retry reading WAL. */
> (void) WaitLatch(MyLatch,
>              WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
>              1000L,
>              WAIT_EVENT_REPACK_WORKER_MAIN);

Looks like we need ResetLatch(MyLatch); here.

---
> * - decoding_ctx - logical decoding context, to capture concurrent data

Need to be removed together with parameters.

---
> hpm_context = AllocSetContextCreate(TopMemoryContext,
>                            "ProcessParallelMessages",
>                            ALLOCSET_DEFAULT_SIZES);

"ProcessRepacklMessages"

---
> if (XLogRecPtrIsInvalid(lsn_upto))
> {
>    SpinLockAcquire(&shared->mutex);
>    lsn_upto = shared->lsn_upto;
>    /* 'done' should be set at the same time as 'lsn_upto' */
>    done = shared->done;
>    SpinLockRelease(&shared->mutex);
>
>    /* Check if the work happens to be complete. */
>    continue;
> }

May be moved to the start of the loop to avoid duplication.

---
> SpinLockAcquire(&shared->mutex);
> valid = shared->sfs_valid;
> SpinLockRelease(&shared->mutex);

Better to remember last_exported here to avoid any races/misses.

---
> shared->lsn_upto = InvalidXLogRecPtr;

I think it is better to clear it once it is read (after removing duplication).

---
> bool       done;

bool exit_after_lsn_upto?

---
> bool       sfs_valid;

Do we really need it? I think it is better to leave only last_exported
and in process_concurrent_changes wait add argument
(last_processed_file) and wait for last_exported to become higher.

---
What if we reverse roles of leader-worker?

Leader gets a snapshot, transfers it to workers (multiple probably for
parallel scan) using already ready mechanics - workers are processing
the scan of the table in parallel. Leader decodes the WAL.

Also, workers may be assigned with a list of indexes they need to build.

Feels like it reuses more from current infrastructure and also needs
less different synchronization logic. But I'm not sure about the
indexes phase - maybe it is not so easy to do.

---
Also, should we add some kind of back pressure between building
indexes/new heap and num of WAL we have?
But probably it is out of scope of the patch.

---
To build N indexes we need to scan table N times. What is about
building multiple indexes during a single heap scan?

--
Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm
after the switch.

Best regards,
Mikhail.



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: enhance wraparound warnings
Next
From: surya poondla
Date:
Subject: Re: [PATCH] psql: tab completion for ALTER ROLE ... IN DATABASE ...