Re: Adding REPACK [concurrently] - Mailing list pgsql-hackers
| From | Mihail Nikalayeu |
|---|---|
| Subject | Re: Adding REPACK [concurrently] |
| Date | |
| Msg-id | CADzfLwXp4c-MJx7yVDxAGNNxPbX4o9dqyivxavtHvmUsdXYqBQ@mail.gmail.com Whole thread Raw |
| In response to | Re: Adding REPACK [concurrently] (Antonin Houska <ah@cybertec.at>) |
| List | pgsql-hackers |
Hello, Antonin!
On Tue, Dec 9, 2025 at 7:52 PM Antonin Houska <ah@cybertec.at> wrote:
> Worker makes more sense to me - the initial implementation is in 0005.
Comments for 0005, so far:
---
> export_initial_snapshot
Hm, should we use ExportSnapshot instead? And ImportSnapshort to import it.
---
> get_initial_snapshot
Should we check if a worker is still alive while waiting? Also is
"process_concurrent_changes".
And AFAIU RegisterDynamicBackgroundWorker does not guarantee new
workers to be started (in case of some fork-related issues).
---
> Assert(res = SHM_MQ_DETACHED);
==
---
> /* Wait a bit before we retry reading WAL. */
> (void) WaitLatch(MyLatch,
> WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
> 1000L,
> WAIT_EVENT_REPACK_WORKER_MAIN);
Looks like we need ResetLatch(MyLatch); here.
---
> * - decoding_ctx - logical decoding context, to capture concurrent data
Need to be removed together with parameters.
---
> hpm_context = AllocSetContextCreate(TopMemoryContext,
> "ProcessParallelMessages",
> ALLOCSET_DEFAULT_SIZES);
"ProcessRepacklMessages"
---
> if (XLogRecPtrIsInvalid(lsn_upto))
> {
> SpinLockAcquire(&shared->mutex);
> lsn_upto = shared->lsn_upto;
> /* 'done' should be set at the same time as 'lsn_upto' */
> done = shared->done;
> SpinLockRelease(&shared->mutex);
>
> /* Check if the work happens to be complete. */
> continue;
> }
May be moved to the start of the loop to avoid duplication.
---
> SpinLockAcquire(&shared->mutex);
> valid = shared->sfs_valid;
> SpinLockRelease(&shared->mutex);
Better to remember last_exported here to avoid any races/misses.
---
> shared->lsn_upto = InvalidXLogRecPtr;
I think it is better to clear it once it is read (after removing duplication).
---
> bool done;
bool exit_after_lsn_upto?
---
> bool sfs_valid;
Do we really need it? I think it is better to leave only last_exported
and in process_concurrent_changes wait add argument
(last_processed_file) and wait for last_exported to become higher.
---
What if we reverse roles of leader-worker?
Leader gets a snapshot, transfers it to workers (multiple probably for
parallel scan) using already ready mechanics - workers are processing
the scan of the table in parallel. Leader decodes the WAL.
Also, workers may be assigned with a list of indexes they need to build.
Feels like it reuses more from current infrastructure and also needs
less different synchronization logic. But I'm not sure about the
indexes phase - maybe it is not so easy to do.
---
Also, should we add some kind of back pressure between building
indexes/new heap and num of WAL we have?
But probably it is out of scope of the patch.
---
To build N indexes we need to scan table N times. What is about
building multiple indexes during a single heap scan?
--
Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm
after the switch.
Best regards,
Mikhail.
pgsql-hackers by date: