Re: suppressing useless wakeups in logical/worker.c - Mailing list pgsql-hackers
From | Nathan Bossart |
---|---|
Subject | Re: suppressing useless wakeups in logical/worker.c |
Date | |
Msg-id | 20230125235004.GA1327755@nathanxps13 Whole thread Raw |
In response to | Re: suppressing useless wakeups in logical/worker.c (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: suppressing useless wakeups in logical/worker.c
|
List | pgsql-hackers |
On Tue, Jan 24, 2023 at 06:45:08PM -0500, Tom Lane wrote: > I took a look through this, and have a number of mostly-cosmetic > issues: Thanks for the detailed review. > * It seems wrong that next_sync_start isn't handled as one of the > wakeup[NUM_LRW_WAKEUPS] entries. I see that it needs to be accessed from > another module; but you could handle that without exposing either enum > LogRepWorkerWakeupReason or the array, by providing getter/setter > functions for process_syncing_tables_for_apply() to call. > > * This code is far too intimately familiar with the fact that TimestampTz > is an int64 count of microseconds. (I'm picky about that because I > remember that they were once something else, so I wonder if someday > they will be different again.) You could get rid of the PG_INT64_MAX > usages by replacing those with the timestamp infinity macro DT_NOEND; > and I'd even be on board with adding a less-opaque alternate name for > that to datatype/timestamp.h. The various magic-constant multipliers > could perhaps be made less magic by using TimestampTzPlusMilliseconds(). > > * I think it might be better to construct the enum like this: > > +typedef enum LogRepWorkerWakeupReason > +{ > + LRW_WAKEUP_TERMINATE, > + LRW_WAKEUP_PING, > + LRW_WAKEUP_STATUS > +#define NUM_LRW_WAKEUPS (LRW_WAKEUP_STATUS + 1) > +} LogRepWorkerWakeupReason; > > so that you don't have to have a default: case in switches on the > enum value. I'm more worried about somebody adding an enum value > and missing updating a switch statement elsewhere than I am about > somebody adding an enum value and neglecting to update the > immediately-adjacent macro. I did all of this in v3. > * The updates of "now" in LogicalRepApplyLoop seem rather > randomly placed, and I'm not entirely convinced that we'll > always be using a reasonably up-to-date value. Can't we > just update it right before each usage? This came up for walreceiver.c, too. The concern is that GetCurrentTimestamp() might be rather expensive on systems without something like the vDSO. I don't know how common that is. If we can rule that out, then I agree that we should just update it right before each use. > * This special handling of next_sync_start seems mighty ugly: > > + /* Also consider special wakeup time for starting sync workers. */ > + if (next_sync_start < now) > + { > + /* > + * Instead of spinning while we wait for the sync worker to > + * start, wait a bit before retrying (unless there's an earlier > + * wakeup time). > + */ > + nextWakeup = Min(now + INT64CONST(100000), nextWakeup); > + } > + else > + nextWakeup = Min(next_sync_start, nextWakeup); > > Do we really need the slop? If so, is there a reason why it > shouldn't apply to all possible sources of nextWakeup? (It's > going to be hard to fold next_sync_start into the wakeup[] > array unless you can make this not a special case.) I'm not positive it is absolutely necessary. AFAICT the function that updates this particular wakeup time is conditionally called, so it at least seems theoretically possible that we could end up spinning in a tight loop until we attempt to start a new tablesync worker. But perhaps this is unlikely enough that we needn't worry about it. I noticed that this wakeup time wasn't being updated when the number of active tablesync workers is >= max_sync_workers_per_subscription. In v3, I tried to handle this by setting the wakeup time to a second later for this case. I think you could ordinarily depend on the tablesync worker's notify_pid to wake up the apply worker, but that wouldn't work if the apply worker has restarted. Ultimately, this particular wakeup time seems to be a special case, and I probably need to think about it some more. If you have ideas, I'm all ears. > * It'd probably be worth enlarging the comment for > LogRepWorkerComputeNextWakeup to explain why its API is like that, > perhaps "We ask the caller to pass in the value of "now" because > this frequently avoids multiple calls of GetCurrentTimestamp(). > It had better be a reasonably-up-to-date value, though." I did this in v3. I noticed that many of your comments also applied to the similar patch that was recently applied to walreceiver.c, so I created another patch to fix that up. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: