RE: Wake up autovacuum launcher from postmaster when a worker exits - Mailing list pgsql-hackers
| From | li carol |
|---|---|
| Subject | RE: Wake up autovacuum launcher from postmaster when a worker exits |
| Date | |
| Msg-id | KL1PR01MB6662407F7A58A3FCEA4B29C88182A@KL1PR01MB6662.apcprd01.prod.exchangelabs.com |
| In response to | Re: Wake up autovacuum launcher from postmaster when a worker exits (Masahiko Sawada <sawada.mshk@gmail.com>) |
| List | pgsql-hackers |
> On Thu, Jan 8, 2026 at 11:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > When an autovacuum worker exits, ProcKill() sends SIGUSR2 to the
> > launcher. I propose moving that responsibility to the postmaster, because:
> >
> > * It's simpler IMHO
> >
> > * The postmaster is already responsible for sending the signal if
> > fork() fails
> >
> > * It makes it consistent with background workers. When a background
> > worker exits, the postmaster sends the signal to the launching process
> > (if requested).
> >
> > * Postmaster doesn't need to worry about sending the signal to the
> > wrong process if the launcher's PID is reused, because it always has
> > up-to-date PID information, because the launcher is postmaster's child
> > process. That risk was negligible to begin with, but this eliminates
> > it completely, so we don't need the comment excusing it anymore.
>
> It sounds reasonable to me too. +1.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

Hi all,

I have completed the testing for this patch, with a focus on the signaling logic from postmaster to launcher.
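For context, the flow I focused on is: a worker exits, the postmaster reaps it, and the postmaster (rather than the dying worker's ProcKill()) sends SIGUSR2 to the launcher. The standalone toy program below models just that parent-signals-sibling handoff. It is only an illustration of the mechanism, not PostgreSQL code, and every name in it is made up:

```c
/*
 * Toy model of the signaling flow under test (illustrative only, not
 * PostgreSQL code): a "postmaster" parent forks a "launcher" and a
 * "worker".  When the worker exits, the parent reaps it and sends
 * SIGUSR2 to the launcher.
 *
 * Build: cc sketch.c -o sketch
 */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigusr2 = 0;

static void
handle_sigusr2(int signo)
{
    got_sigusr2 = 1;
}

int
main(void)
{
    pid_t       launcher_pid;
    pid_t       worker_pid;

    launcher_pid = fork();
    if (launcher_pid == 0)
    {
        /* "launcher": block SIGUSR2, install a handler, wait for wakeup */
        sigset_t    mask, oldmask;
        struct sigaction sa;

        sigemptyset(&mask);
        sigaddset(&mask, SIGUSR2);
        sigprocmask(SIG_BLOCK, &mask, &oldmask);

        sa.sa_handler = handle_sigusr2;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGUSR2, &sa, NULL);

        while (!got_sigusr2)
            sigsuspend(&oldmask);   /* race-free wait for the signal */
        printf("launcher: woken up, a worker slot is free\n");
        return 0;
    }

    worker_pid = fork();
    if (worker_pid == 0)
    {
        /* "worker": pretend to vacuum something, then exit */
        sleep(1);
        return 0;
    }

    /* "postmaster": reap the exited worker, then wake the launcher */
    waitpid(worker_pid, NULL, 0);
    printf("postmaster: worker %d exited, signaling launcher %d\n",
           (int) worker_pid, (int) launcher_pid);
    kill(launcher_pid, SIGUSR2);

    waitpid(launcher_pid, NULL, 0);
    return 0;
}
```

The point it demonstrates is the last bullet above: the parent always has current PIDs for its own children, so the wakeup can never be delivered to a recycled PID.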
Test Environment:
- autovacuum_max_workers = 1
- autovacuum_naptime = 5s
- Multiple databases (4 total)

Observations:

I performed cross-database vacuum tests. With the 1-worker limit, the launcher starts the next worker for a pending database immediately after the previous worker exits, within the ~1.25s scheduled stagger (with autovacuum_naptime = 5s and 4 databases, the launcher visits one database every 5s / 4 = 1.25s). The logs show a seamless handover between worker processes (e.g., Worker A exits at 16:06:20.209, and the next scheduled Worker B starts at 16:06:21.447).

2026-01-09 16:06:20.209 CST [1918017] LOG: automatic vacuum of table "db_test2.public.t2": index scans: 0
    pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned
    tuples: 50000 removed, 0 remain, 0 are dead but not yet removable
    removable cutoff: 813, which was 1 XIDs old when operation ended
    new relfrozenxid: 813, which is 3 XIDs ahead of previous value
    frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
    visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible)
    index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
    avg read rate: 0.000 MB/s, avg write rate: 2.794 MB/s
    buffer usage: 697 hits, 0 reads, 4 dirtied
    WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full
    memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s
2026-01-09 16:06:21.447 CST [1918022] DEBUG: autovacuum: processing database "template1"
2026-01-09 16:06:22.699 CST [1918027] DEBUG: autovacuum: processing database "postgres"
2026-01-09 16:06:23.946 CST [1918044] DEBUG: autovacuum: processing database "db_test1"
2026-01-09 16:06:23.958 CST [1918044] LOG: automatic vacuum of table "db_test1.public.t1": index scans: 0
    pages: 222 removed, 0 remain, 222 scanned (100.00% of total), 0 eagerly scanned
    tuples: 50000 removed, 0 remain, 0 are dead but not yet removable
    removable cutoff: 814, which was 1 XIDs old when operation ended
    new relfrozenxid: 814, which is 7 XIDs ahead of previous value
    frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
    visibility map: 222 pages set all-visible, 222 pages set all-frozen (0 were all-visible)
    index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
    avg read rate: 0.000 MB/s, avg write rate: 2.793 MB/s
    buffer usage: 697 hits, 0 reads, 4 dirtied
    WAL usage: 450 records, 4 full page images, 158352 bytes, 32768 full page image bytes, 0 buffers full
    memory usage: dead item storage 0.02 MB accumulated across 0 resets (limit 64.00 MB each)
    system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.01 s

This confirms that the postmaster successfully notifies the launcher, and that the worker slot is freed appropriately before the notification. The patch looks correct and robust. +1 from my side.

Best Regards,
Yuan Li (carol)
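P.S. In case anyone wants to reproduce this, my setup was roughly the following (a sketch from memory; names match the logs above, and autovacuum_max_workers may need a server restart rather than a reload on older branches):

```sql
-- GUCs used for the test (run as superuser).
ALTER SYSTEM SET autovacuum_max_workers = 1;
ALTER SYSTEM SET autovacuum_naptime = '5s';
ALTER SYSTEM SET log_min_messages = 'debug1';  -- makes the DEBUG lines visible
SELECT pg_reload_conf();

CREATE DATABASE db_test1;
CREATE DATABASE db_test2;

-- In db_test1 (\c db_test1): create a table and generate dead tuples
-- so autovacuum has work queued there.
CREATE TABLE t1 AS SELECT g AS id FROM generate_series(1, 50000) g;
DELETE FROM t1;

-- Likewise in db_test2 (\c db_test2):
CREATE TABLE t2 AS SELECT g AS id FROM generate_series(1, 50000) g;
DELETE FROM t2;
```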