Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAHut+PuKYdA4047wATm8WpkkqQX2CPNCa++dSQKr6dD1O5nz_Q@mail.gmail.com
Whole thread Raw
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Here are some review comments for patch v79-0002.

======

General

1.

I saw that earlier in this thread Hou-san [1] and Amit [2] also seemed
to say there is not much point for this patch.

So I wanted to +1 that same opinion.

I feel this patch just adds more complexity for almost no gain:
- reducing the 'max_apply_workers_per_suibscription' seems not very
common in the first place.
- even when the GUC is reduced, at that point in time all the workers
might be in use so there may be nothing that can be immediately done.
- IIUC the excess workers (for a reduced GUC) are going to get freed
naturally anyway over time as more transactions are completed so the
pool size will reduce accordingly.


~

OTOH some refactoring parts of this patch (e.g. the new pa_stop_worker
function) look better to me. I would keep those ones but remove all
the pa_stop_idle_workers function/call.


*** NOTE: The remainder of these review comments are maybe only
relevant if you are going to keep this pa_stop_idle_workers
behaviour...

======

Commit message

2.

If the max_parallel_apply_workers_per_subscription is changed to a
lower value, try to stop free workers in the pool to keep the number of
workers lower than half of the max_parallel_apply_workers_per_subscription

SUGGESTION

If the GUC max_parallel_apply_workers_per_subscription is changed to a
lower value, try to stop unused workers to keep the pool size lower
than half of max_parallel_apply_workers_per_subscription.


======

.../replication/logical/applyparallelworker.c

3. pa_free_worker

if (winfo->serialize_changes ||
list_length(ParallelApplyWorkerPool) >
(max_parallel_apply_workers_per_subscription / 2))
{
pa_stop_worker(winfo);
return;
}

winfo->in_use = false;
winfo->serialize_changes = false;

~

IMO the above code can be more neatly written using if/else because
then there is only one return point, and there is a place to write the
explanatory comment about the else.

SUGGESTION

if (winfo->serialize_changes ||
list_length(ParallelApplyWorkerPool) >
(max_parallel_apply_workers_per_subscription / 2))
{
pa_stop_worker(winfo);
}
else
{
/* Don't stop the worker. Only mark it available for re-use. */
winfo->in_use = false;
winfo->serialize_changes = false;
}

======

src/backend/replication/logical/worker.c

4. pa_stop_idle_workers

/*
 * Try to stop parallel apply workers that are not in use to keep the number of
 * workers lower than half of the max_parallel_apply_workers_per_subscription.
 */
void
pa_stop_idle_workers(void)
{
List    *active_workers;
ListCell   *lc;
int max_applyworkers = max_parallel_apply_workers_per_subscription / 2;

if (list_length(ParallelApplyWorkerPool) <= max_applyworkers)
return;

active_workers = list_copy(ParallelApplyWorkerPool);

foreach(lc, active_workers)
{
ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc);

pa_stop_worker(winfo);

/* Recheck the number of workers. */
if (list_length(ParallelApplyWorkerPool) <= max_applyworkers)
break;
}

list_free(active_workers);
}

~

4a. function comment

SUGGESTION

Try to keep the worker pool size lower than half of the
max_parallel_apply_workers_per_subscription.

~

4b. function name

This is not stopping all idle workers, so maybe a more meaningful name
for this function is something more like "pa_reduce_workerpool"

~

4c.

IMO the "max_applyworkers" var is a misleading name. Maybe something
like "goal_poolsize" is better?

~

4d.

Maybe I misunderstand the logic for the pool, but shouldn't this be
checking the winfo->in_use flag before blindly stopping each worker?


======

src/backend/replication/logical/worker.c

5.

@@ -3630,6 +3630,13 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
  {
  ConfigReloadPending = false;
  ProcessConfigFile(PGC_SIGHUP);
+
+ /*
+ * Try to stop free workers in the pool in case the
+ * max_parallel_apply_workers_per_subscription is changed to a
+ * lower value.
+ */
+ pa_stop_idle_workers();
  }
5a.

SUGGESTED COMMENT
If max_parallel_apply_workers_per_subscription is changed to a lower
value, try to reduce the worker pool to match.

~

5b.

Instead of unconditionally calling pa_stop_idle_workers, shouldn't
this code compare the value of
max_parallel_apply_workers_per_subscription before/after the
ProcessConfigFile so it only calls if the GUC was lowered?


------
[1] Hou-san -
https://www.postgresql.org/message-id/OS0PR01MB5716E527412A3481F90B4397941A9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] Amit -
https://www.postgresql.org/message-id/CAA4eK1J%3D9m-VNRMHCqeG8jpX0CTn3Ciad2o4H-ogrZMDJ3tn4w%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Cygwin cleanup
Next
From: vignesh C
Date:
Subject: Re: Support logical replication of DDLs