Re: Parallel heap vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada
---|---
Subject | Re: Parallel heap vacuum
Msg-id | CAD21AoC8T13bvR46hPXj2hjAnUam2OXR2qweXaHUnrDw=VCEuw@mail.gmail.com
In response to | Re: Parallel heap vacuum (vignesh C <vignesh21@gmail.com>)
List | pgsql-hackers
On Tue, Nov 12, 2024 at 3:21 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 30 Oct 2024 at 22:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached new version patches that fix the failures reported by
> > cfbot. I hope these changes make cfbot happy.
>
> I just started reviewing the patch and found the following comments
> while going through it:
>
> 1) I felt we should add some documentation for this at [1].
>
> 2) Can we add some tests in vacuum_parallel with tables having no
> indexes and having dead tuples?
>
> 3) This should be included in typedefs.list:
> 3.a)
> +/*
> + * Relation statistics collected during heap scanning and need to be shared among
> + * parallel vacuum workers.
> + */
> +typedef struct LVRelScanStats
> +{
> +    BlockNumber scanned_pages; /* # pages examined (not skipped via VM) */
> +    BlockNumber removed_pages; /* # pages removed by relation truncation */
> +    BlockNumber frozen_pages;  /* # pages with newly frozen tuples */
>
> 3.b) Similarly this too:
> +/*
> + * Struct for information that need to be shared among parallel vacuum workers
> + */
> +typedef struct PHVShared
> +{
> +    bool aggressive;
> +    bool skipwithvm;
>
> 3.c) Similarly this too:
> +/* Per-worker scan state for parallel heap vacuum scan */
> +typedef struct PHVScanWorkerState
> +{
> +    bool initialized;
>
> 3.d) Similarly this too:
> +/* Struct for parallel heap vacuum */
> +typedef struct PHVState
> +{
> +    /* Parallel scan description shared among parallel workers */
>
> 4) Since we are initializing almost all the members of the structure,
> should we use palloc0 in this case:
> +    scan_stats = palloc(sizeof(LVRelScanStats));
> +    scan_stats->scanned_pages = 0;
> +    scan_stats->removed_pages = 0;
> +    scan_stats->frozen_pages = 0;
> +    scan_stats->lpdead_item_pages = 0;
> +    scan_stats->missed_dead_pages = 0;
> +    scan_stats->nonempty_pages = 0;
> +
> +    /* Initialize remaining counters (be tidy) */
> +    scan_stats->tuples_deleted = 0;
> +    scan_stats->tuples_frozen = 0;
> +    scan_stats->lpdead_items = 0;
> +    scan_stats->live_tuples = 0;
> +    scan_stats->recently_dead_tuples = 0;
> +    scan_stats->missed_dead_tuples = 0;
>
> 5) Typo: paralle should be parallel
> +/*
> + * Return the number of parallel workers for a parallel vacuum scan of this
> + * relation.
> + */
> +static inline int
> +table_paralle_vacuum_compute_workers(Relation rel, int requested)
> +{
> +    return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
> +}

Thank you for the comments! I'll address them in the next version of the patch.

BTW, while updating the patch, I found that we might want to launch different numbers of workers for scanning the heap and for vacuuming the heap. The number of parallel workers is determined based on the number of blocks in the table. However, even if this number is high, we may want to launch fewer workers to vacuum heap pages when not many pages contain garbage. The number of workers for vacuuming the heap could also vary across vacuum passes. I'm considering implementing this.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com