Re: Parallel heap vacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Parallel heap vacuum
Date
Msg-id CAD21AoC8T13bvR46hPXj2hjAnUam2OXR2qweXaHUnrDw=VCEuw@mail.gmail.com
In response to Re: Parallel heap vacuum  (vignesh C <vignesh21@gmail.com>)
List pgsql-hackers
On Tue, Nov 12, 2024 at 3:21 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 30 Oct 2024 at 22:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > I've attached new version patches that fix the failures reported by
> > cfbot. I hope these changes make cfbot happy.
> >
>
> I just started reviewing the patch and found the following comments
> while going through the patch:
> 1) I felt we should add some documentation for this at [1].
>
> 2) Can we add some tests to vacuum_parallel with tables that have no
> indexes but do have dead tuples?
>
> 3) This should be included in typedefs.list:
> 3.a)
> +/*
> + * Relation statistics collected during heap scanning and need to be
> shared among
> + * parallel vacuum workers.
> + */
> +typedef struct LVRelScanStats
> +{
> +       BlockNumber scanned_pages;      /* # pages examined (not
> skipped via VM) */
> +       BlockNumber removed_pages;      /* # pages removed by relation
> truncation */
> +       BlockNumber frozen_pages;       /* # pages with newly frozen tuples */
>
> 3.b) Similarly this too:
> +/*
> + * Struct for information that need to be shared among parallel vacuum workers
> + */
> +typedef struct PHVShared
> +{
> +       bool            aggressive;
> +       bool            skipwithvm;
> +
>
> 3.c) Similarly this too:
> +/* Per-worker scan state for parallel heap vacuum scan */
> +typedef struct PHVScanWorkerState
> +{
> +       bool            initialized;
>
> 3.d)  Similarly this too:
> +/* Struct for parallel heap vacuum */
> +typedef struct PHVState
> +{
> +       /* Parallel scan description shared among parallel workers */
>
> 4) Since we are initializing almost all the members of the structure,
> should we use palloc0 in this case:
> +       scan_stats = palloc(sizeof(LVRelScanStats));
> +       scan_stats->scanned_pages = 0;
> +       scan_stats->removed_pages = 0;
> +       scan_stats->frozen_pages = 0;
> +       scan_stats->lpdead_item_pages = 0;
> +       scan_stats->missed_dead_pages = 0;
> +       scan_stats->nonempty_pages = 0;
> +
> +       /* Initialize remaining counters (be tidy) */
> +       scan_stats->tuples_deleted = 0;
> +       scan_stats->tuples_frozen = 0;
> +       scan_stats->lpdead_items = 0;
> +       scan_stats->live_tuples = 0;
> +       scan_stats->recently_dead_tuples = 0;
> +       scan_stats->missed_dead_tuples = 0;
>
> 5) typo paralle should be parallel
> +/*
> + * Return the number of parallel workers for a parallel vacuum scan of this
> + * relation.
> + */
> +static inline int
> +table_paralle_vacuum_compute_workers(Relation rel, int requested)
> +{
> +       return rel->rd_tableam->parallel_vacuum_compute_workers(rel, requested);
> +}
>

Thank you for the comments! I'll address these comments in the next
version patch.
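As an aside on comment 4: palloc0 returns zero-filled memory, so the
field-by-field zero assignments in the quoted hunk become unnecessary. The
sketch below is standalone (not the patch itself): it uses a simplified
stand-in for LVRelScanStats with only a few of the quoted fields, and uses
calloc as an out-of-tree analogue of palloc0, since both zero-fill the
allocation.

```c
#include <stdlib.h>

/*
 * Simplified stand-in for the patch's LVRelScanStats; the field names follow
 * the quoted hunk, but this is an illustrative subset, not the real struct.
 */
typedef struct LVRelScanStats
{
    long        scanned_pages;
    long        removed_pages;
    long        frozen_pages;
    long        tuples_deleted;
    long        live_tuples;
} LVRelScanStats;

/*
 * calloc zero-fills the allocation, so every counter starts at 0 with no
 * per-field assignments -- the same effect palloc0 gives inside PostgreSQL.
 */
static LVRelScanStats *
make_scan_stats(void)
{
    return (LVRelScanStats *) calloc(1, sizeof(LVRelScanStats));
}
```

The trade-off is the usual one: palloc0 is slightly slower than palloc plus
explicit stores the compiler can coalesce, but for a once-per-vacuum
allocation of a small struct the tidiness usually wins.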

BTW, while updating the patch I found that we might want to launch
different numbers of workers for scanning the heap and for vacuuming the
heap. The number of parallel workers is currently determined from the
number of blocks in the table. However, even if that number is high, we
may want to launch fewer workers for vacuuming heap pages when only a few
pages contain garbage. The number of workers for the vacuum phase could
also vary between vacuum passes. I'm considering implementing this.
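To make the idea concrete, one possible shape (purely hypothetical, not
from the patch) is a helper that sizes the vacuum-phase worker count from
the number of pages that actually contain dead items, rather than from the
table's total block count used for the scan phase. PAGES_PER_WORKER and
the clamping policy below are illustrative values only.

```c
/*
 * Hypothetical heuristic: one worker per PAGES_PER_WORKER pages holding
 * LP_DEAD items, capped at the requested degree of parallelism. Returning 0
 * means the vacuum pass runs serially.
 */
#define PAGES_PER_WORKER 4096

static int
compute_vacuum_workers(long lpdead_item_pages, int requested)
{
    int         nworkers = (int) (lpdead_item_pages / PAGES_PER_WORKER);

    if (nworkers > requested)
        nworkers = requested;   /* never exceed what was requested */
    if (nworkers < 0)
        nworkers = 0;
    return nworkers;
}
```

Because lpdead_item_pages changes between index-vacuum passes, this could
be recomputed per pass, which matches the observation above that the
vacuum-phase worker count may differ from the scan-phase one.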

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


