
From Peter Smith
Subject Re: Parallel heap vacuum
Msg-id CAHut+PuxF2LbHT-aYPd3SA7wi1S0tk6D6c40AJW9ONv9J1GF2Q@mail.gmail.com
In response to Re: Parallel heap vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
Hi Sawada-San. Here are some review comments for patch v10-0002.

======
src/backend/access/heap/heapam_handler.c

1.
  .scan_bitmap_next_block = heapam_scan_bitmap_next_block,
  .scan_bitmap_next_tuple = heapam_scan_bitmap_next_tuple,
  .scan_sample_next_block = heapam_scan_sample_next_block,
- .scan_sample_next_tuple = heapam_scan_sample_next_tuple
+ .scan_sample_next_tuple = heapam_scan_sample_next_tuple,
+
+ .parallel_vacuum_compute_workers = heap_parallel_vacuum_compute_workers,

Pretty much every other heap AM method here has a 'heapam_' prefix. Would
it be better, for consistency, to call the new function
'heapam_parallel_vacuum_compute_workers' instead of
'heap_parallel_vacuum_compute_workers'?
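
For example, the table entry might then look like:

 .parallel_vacuum_compute_workers = heapam_parallel_vacuum_compute_workers,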

======
src/backend/access/heap/vacuumlazy.c

2.
+/*
+ * Compute the number of workers for parallel heap vacuum.
+ *
+ * Disabled so far.
+ */
+int
+heap_parallel_vacuum_compute_workers(Relation rel, int max_workers)
+{
+ return 0;
+}
+

Instead of saying "Disabled so far", perhaps the function comment should say:
"Returning 0 means parallel heap vacuum is disabled."

That way the comment doesn't need to churn later when the function gets
implemented by subsequent patches.
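
SUGGESTION (just a sketch of the rewording; the body is unchanged):

/*
 * Compute the number of workers for parallel heap vacuum.
 *
 * Returning 0 means parallel heap vacuum is disabled.
 */
int
heap_parallel_vacuum_compute_workers(Relation rel, int max_workers)
{
	return 0;
}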

======
src/backend/commands/vacuumparallel.c

3.
+/* The kind of parallel vacuum work */
+typedef enum
+{
+ PV_WORK_ITEM_PROCESS_INDEXES, /* index vacuuming or cleanup */
+ PV_WORK_ITEM_COLLECT_DEAD_ITEMS, /* collect dead tuples */
+} PVWorkItem;
+

Isn't this more like a PVWorkPhase than a PVWorkItem? Ditto for
the field name: 'work_phase' seems more appropriate.
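
For example (a sketch only; whether to also rename the enum values to
match is just my assumption):

typedef enum
{
	PV_WORK_PHASE_PROCESS_INDEXES,		/* index vacuuming or cleanup */
	PV_WORK_PHASE_COLLECT_DEAD_ITEMS,	/* collect dead tuples */
} PVWorkPhase;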

~~~

4.
+ /*
+ * Processing indexes or removing dead tuples from the table.
+ */
+ PVWorkItem work_item;

Is this comment missing a question mark, since it describes an
either/or choice?

~~~

5.
+ /*
+ * The number of workers for parallel table vacuuming. If > 0, the
+ * parallel table vacuum is enabled.
+ */
+ int nworkers_for_table;
+

I guess this field will never be negative, so perhaps it is simpler for
the comment to say:
"If 0, parallel table vacuum is disabled."

~~~

parallel_vacuum_init:

6.
+ /* A parallel vacuum must be requested */
  Assert(nrequested_workers >= 0);

It's not very intuitive to say the user requested a parallel vacuum
when 'nrequested_workers' is 0. I felt some more commentary is needed
here or in the function header; it seems nrequested_workers == 0 has
the special meaning of letting the system decide the number of workers.
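
For example, a comment along these lines (just a sketch) would make that
explicit:

/*
 * A parallel vacuum must have been requested. nrequested_workers == 0
 * means the system decides the number of workers.
 */
Assert(nrequested_workers >= 0);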

~~~

parallel_vacuum_compute_workers:

7.
 static int
-parallel_vacuum_compute_workers(Relation *indrels, int nindexes, int
nrequested,
+parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
+ int nrequested, int *nworkers_table_p,
  bool *will_parallel_vacuum)
 {
  int nindexes_parallel = 0;
  int nindexes_parallel_bulkdel = 0;
  int nindexes_parallel_cleanup = 0;
- int parallel_workers;
+ int nworkers_table = 0;
+ int nworkers_index = 0;
+
+ *nworkers_table_p = 0;

7a.
AFAICT you can just remove the variable 'nworkers_table' and instead
rename the 'nworkers_table_p' parameter to 'nworkers_table'.
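
That would leave the signature looking something like:

static int
parallel_vacuum_compute_workers(Relation rel, Relation *indrels, int nindexes,
								int nrequested, int *nworkers_table,
								bool *will_parallel_vacuum)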

~

7b.
IIUC this 'will_parallel_vacuum' only has meaning for indexes, but now
that the patch introduces parallel table vacuuming, the existing
'will_parallel_vacuum' name becomes generic/misleading. Maybe it now
needs to be called 'will_parallel_index_vacuum' or similar (in all
places).

~~~

8.
+ if (nrequested > 0)
+ {
+ /*
+ * If the parallel degree is specified, accept that as the number of
+ * parallel workers for table vacuum (though still cap at
+ * max_parallel_maintenance_workers).
+ */
+ nworkers_table = Min(nrequested, max_parallel_maintenance_workers);
+ }
+ else
+ {
+ /* Compute the number of workers for parallel table scan */
+ nworkers_table = table_parallel_vacuum_compute_workers(rel,
+    max_parallel_maintenance_workers);
+
+ Assert(nworkers_table <= max_parallel_maintenance_workers);
+ }
+

The Assert can be moved outside the if/else because it holds in both branches.
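
Something like (a sketch only):

if (nrequested > 0)
{
	/*
	 * If the parallel degree is specified, accept that as the number of
	 * parallel workers for table vacuum (though still cap at
	 * max_parallel_maintenance_workers).
	 */
	nworkers_table = Min(nrequested, max_parallel_maintenance_workers);
}
else
{
	/* Compute the number of workers for parallel table scan */
	nworkers_table = table_parallel_vacuum_compute_workers(rel,
								max_parallel_maintenance_workers);
}

Assert(nworkers_table <= max_parallel_maintenance_workers);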

~~~

9.
  /* No index supports parallel vacuum */
- if (nindexes_parallel <= 0)
- return 0;
-
- /* Compute the parallel degree */
- parallel_workers = (nrequested > 0) ?
- Min(nrequested, nindexes_parallel) : nindexes_parallel;
+ if (nindexes_parallel > 0)
+ {
+ /* Take into account the requested number of workers */
+ nworkers_index = (nrequested > 0) ?
+ Min(nrequested, nindexes_parallel) : nindexes_parallel;

- /* Cap by max_parallel_maintenance_workers */
- parallel_workers = Min(parallel_workers, max_parallel_maintenance_workers);
+ /* Cap by max_parallel_maintenance_workers */
+ nworkers_index = Min(nworkers_index, max_parallel_maintenance_workers);
+ }

"No index supports..." seems to be an old comment that is not correct
for this new code block.
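
Maybe something like this instead (just a suggestion):

/* Compute the number of workers for index vacuuming, if any index supports parallel vacuum */
if (nindexes_parallel > 0)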

~~~

10.
+ pg_atomic_sub_fetch_u32(VacuumActiveNWorkers, 1);
+}
+
+

There is a double blank line after the function 'parallel_vacuum_process_table'.

~~~

main:

11.
+ /* Initialize AM-specific vacuum state for parallel table vacuuming */
+ if (shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)
+ {
+ ParallelWorkerContext pwcxt;
+
+ pwcxt.toc = toc;
+ pwcxt.seg = seg;
+ table_parallel_vacuum_initialize_worker(rel, &pvs, &pwcxt,
+ &state);
+ }
+

I wonder if this code can be moved into the same
"if (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)" block
that appears later.
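
In other words, something like this (assuming there is no reason the
worker state must be initialized earlier):

if (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)
{
	ParallelWorkerContext pwcxt;

	pwcxt.toc = toc;
	pwcxt.seg = seg;

	/* Initialize AM-specific vacuum state for parallel table vacuuming */
	table_parallel_vacuum_initialize_worker(rel, &pvs, &pwcxt, &state);

	/* Scan the table to collect dead items */
	parallel_vacuum_process_table(&pvs, state);
}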

~~~

12.
+ if (pvs.shared->work_item == PV_WORK_ITEM_COLLECT_DEAD_ITEMS)
+ {
+ /* Scan the table to collect dead items */
+ parallel_vacuum_process_table(&pvs, state);
+ }
+ else
+ {
+ Assert(pvs.shared->work_item == PV_WORK_ITEM_PROCESS_INDEXES);
+
+ /* Process indexes to perform vacuum/cleanup */
+ parallel_vacuum_process_safe_indexes(&pvs);
+ }

Would this if/else be better implemented as a 'switch' over the possible phases?
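
For example (a sketch only):

switch (pvs.shared->work_item)
{
	case PV_WORK_ITEM_COLLECT_DEAD_ITEMS:
		/* Scan the table to collect dead items */
		parallel_vacuum_process_table(&pvs, state);
		break;

	case PV_WORK_ITEM_PROCESS_INDEXES:
		/* Process indexes to perform vacuum/cleanup */
		parallel_vacuum_process_safe_indexes(&pvs);
		break;
}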

======
Kind Regards,
Peter Smith.
Fujitsu Australia


