Re: suppressing useless wakeups in logical/worker.c - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: suppressing useless wakeups in logical/worker.c
Msg-id 20230125235004.GA1327755@nathanxps13
In response to Re: suppressing useless wakeups in logical/worker.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: suppressing useless wakeups in logical/worker.c
List pgsql-hackers
On Tue, Jan 24, 2023 at 06:45:08PM -0500, Tom Lane wrote:
> I took a look through this, and have a number of mostly-cosmetic
> issues:

Thanks for the detailed review.

> * It seems wrong that next_sync_start isn't handled as one of the
> wakeup[NUM_LRW_WAKEUPS] entries.  I see that it needs to be accessed from
> another module; but you could handle that without exposing either enum
> LogRepWorkerWakeupReason or the array, by providing getter/setter
> functions for process_syncing_tables_for_apply() to call.
> 
> * This code is far too intimately familiar with the fact that TimestampTz
> is an int64 count of microseconds.  (I'm picky about that because I
> remember that they were once something else, so I wonder if someday
> they will be different again.)  You could get rid of the PG_INT64_MAX
> usages by replacing those with the timestamp infinity macro DT_NOEND;
> and I'd even be on board with adding a less-opaque alternate name for
> that to datatype/timestamp.h.  The various magic-constant multipliers
> could perhaps be made less magic by using TimestampTzPlusMilliseconds(). 
> 
> * I think it might be better to construct the enum like this:
> 
> +typedef enum LogRepWorkerWakeupReason
> +{
> +    LRW_WAKEUP_TERMINATE,
> +    LRW_WAKEUP_PING,
> +    LRW_WAKEUP_STATUS
> +#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1)
> +} LogRepWorkerWakeupReason;
> 
> so that you don't have to have a default: case in switches on the
> enum value.  I'm more worried about somebody adding an enum value
> and missing updating a switch statement elsewhere than I am about 
> somebody adding an enum value and neglecting to update the
> immediately-adjacent macro.

I did all of this in v3.
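
For anyone following along without the patch open, here is a minimal standalone sketch of how those three suggestions could fit together.  It is not the v3 patch.  The typedef and the two macros are local stand-ins for PostgreSQL's definitions so that it compiles on its own, and LRW_WAKEUP_SYNC_START, the getter/setter names, ComputeNextWakeupSketch(), and its timeout_ms parameter are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef int64_t TimestampTz;
#define DT_NOEND INT64_MAX
#define TimestampTzPlusMilliseconds(tz, ms) ((tz) + ((ms) * (int64_t) 1000))

typedef enum LogRepWorkerWakeupReason
{
	LRW_WAKEUP_TERMINATE,
	LRW_WAKEUP_PING,
	LRW_WAKEUP_STATUS,
	LRW_WAKEUP_SYNC_START		/* hypothetical extra entry */
#define NUM_LRW_WAKEUPS (LRW_WAKEUP_SYNC_START + 1)
} LogRepWorkerWakeupReason;

static TimestampTz wakeup[NUM_LRW_WAKEUPS];

/* Hypothetical accessors, so other modules need neither the enum nor the array. */
TimestampTz
LogRepWorkerGetSyncStartWakeup(void)
{
	return wakeup[LRW_WAKEUP_SYNC_START];
}

void
LogRepWorkerSetSyncStartWakeup(TimestampTz when)
{
	wakeup[LRW_WAKEUP_SYNC_START] = when;
}

/*
 * Recompute one wakeup time.  There is deliberately no default: case, so a
 * compiler that warns about unhandled enum values can flag a missed update.
 */
void
ComputeNextWakeupSketch(LogRepWorkerWakeupReason reason, TimestampTz now,
						int timeout_ms)
{
	assert(now != DT_NOEND);	/* "now" must be a finite timestamp */

	switch (reason)
	{
		case LRW_WAKEUP_TERMINATE:
		case LRW_WAKEUP_PING:
		case LRW_WAKEUP_STATUS:
			/* A non-positive timeout means "never wake up for this reason". */
			wakeup[reason] = (timeout_ms > 0) ?
				TimestampTzPlusMilliseconds(now, timeout_ms) : DT_NOEND;
			break;
		case LRW_WAKEUP_SYNC_START:
			/* Only changed through the setter above. */
			break;
	}
}
```

The point of the sketch is only that no caller adds raw microsecond counts or compares against PG_INT64_MAX directly.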

> * The updates of "now" in LogicalRepApplyLoop seem rather
> randomly placed, and I'm not entirely convinced that we'll
> always be using a reasonably up-to-date value.  Can't we
> just update it right before each usage?

This came up for walreceiver.c, too.  The concern is that
GetCurrentTimestamp() might be rather expensive on systems without
something like the vDSO.  I don't know how common that is.  If we can rule
that out, then I agree that we should just update it right before each use.
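
To make the trade-off concrete, here is a toy sketch that compares the two strategies.  It uses a fake clock rather than the real GetCurrentTimestamp(); the stub, its 10-microsecond step, and the function names are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t TimestampTz;	/* stand-in for PostgreSQL's typedef */

/* Hypothetical fake clock: advances 10 microseconds per call and counts calls. */
static TimestampTz fake_now = 0;
static int	clock_calls = 0;

void
reset_fake_clock(void)
{
	fake_now = 0;
	clock_calls = 0;
}

int
clock_call_count(void)
{
	return clock_calls;
}

static TimestampTz
GetCurrentTimestampStub(void)
{
	clock_calls++;
	fake_now += 10;
	return fake_now;
}

/* Fetch "now" once and reuse it: one clock call, but the value can go stale. */
int
count_due_cached(const TimestampTz *deadlines, int n)
{
	TimestampTz now = GetCurrentTimestampStub();
	int			due = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		if (deadlines[i] <= now)
			due++;
	}
	return due;
}

/* Refresh "now" right before each use: always current, one clock call per use. */
int
count_due_fresh(const TimestampTz *deadlines, int n)
{
	int			due = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		TimestampTz now = GetCurrentTimestampStub();

		if (deadlines[i] <= now)
			due++;
	}
	return due;
}
```

With deadlines of 5, 15, and 25 and the clock reset before each run, the cached version makes one clock call and sees one deadline as due, while the fresh version makes three calls and sees all three.  Whether the extra calls matter depends on how expensive the real clock read is on the platform.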

> * This special handling of next_sync_start seems mighty ugly:
> 
> +            /* Also consider special wakeup time for starting sync workers. */
> +            if (next_sync_start < now)
> +            {
> +                /*
> +                 * Instead of spinning while we wait for the sync worker to
> +                 * start, wait a bit before retrying (unless there's an earlier
> +                 * wakeup time).
> +                 */
> +                nextWakeup = Min(now + INT64CONST(100000), nextWakeup);
> +            }
> +            else
> +                nextWakeup = Min(next_sync_start, nextWakeup);
> 
> Do we really need the slop?  If so, is there a reason why it
> shouldn't apply to all possible sources of nextWakeup?  (It's
> going to be hard to fold next_sync_start into the wakeup[]
> array unless you can make this not a special case.)

I'm not positive it is absolutely necessary.  AFAICT the function that
updates this particular wakeup time is conditionally called, so it at least
seems theoretically possible that we could end up spinning in a tight loop
until we attempt to start a new tablesync worker.  But perhaps this is
unlikely enough that we needn't worry about it.
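
For reference, the quoted special case can be written without the magic constant roughly as follows.  This is a standalone sketch: the typedef and macros are local stand-ins for PostgreSQL's definitions, and SYNC_START_RETRY_MS and the function name are hypothetical.  The 100 ms matches the INT64CONST(100000) microseconds in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef int64_t TimestampTz;
#define DT_NOEND INT64_MAX
#define TimestampTzPlusMilliseconds(tz, ms) ((tz) + ((ms) * (int64_t) 1000))
#define Min(x, y) ((x) < (y) ? (x) : (y))

/* Hypothetical name for the retry delay. */
#define SYNC_START_RETRY_MS 100

/*
 * Fold the sync-start wakeup time into the next wakeup time.  If the
 * sync-start time has already passed, retry after a short delay instead of
 * spinning, unless some other wakeup is due even sooner.
 */
TimestampTz
next_wakeup_with_sync_start(TimestampTz now, TimestampTz next_sync_start,
							TimestampTz nextWakeup)
{
	assert(now != DT_NOEND);	/* "now" must be a finite timestamp */

	if (next_sync_start < now)
		return Min(TimestampTzPlusMilliseconds(now, SYNC_START_RETRY_MS),
				   nextWakeup);

	return Min(next_sync_start, nextWakeup);
}
```

Without the first branch, a next_sync_start in the past would make the computed wakeup time earlier than "now" on every iteration, which is the tight loop I'm worried about.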

I noticed that this wakeup time wasn't being updated when the number of
active tablesync workers was >= max_sync_workers_per_subscription.  In v3, I
tried to handle this by setting the wakeup time to one second later in that
case.  Ordinarily, I think you could depend on the tablesync worker's
notify_pid to wake up the apply worker, but that wouldn't work if the apply
worker has restarted.

Ultimately, this particular wakeup time seems to be a special case, and I
probably need to think about it some more.  If you have ideas, I'm all
ears.

> * It'd probably be worth enlarging the comment for
> LogRepWorkerComputeNextWakeup to explain why its API is like that,
> perhaps "We ask the caller to pass in the value of "now" because
> this frequently avoids multiple calls of GetCurrentTimestamp().
> It had better be a reasonably-up-to-date value, though."

I did this in v3.  I noticed that many of your comments also applied to the
similar patch that was recently applied to walreceiver.c, so I created
another patch to fix that up.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
