On Tue, Dec 13, 2022 at 04:41:05PM -0800, Nathan Bossart wrote:
> On Tue, Dec 13, 2022 at 07:20:14PM -0500, Tom Lane wrote:
>> I certainly don't think that "wake the apply launcher every 1ms"
>> is a sane configuration. Unless I'm missing something basic about
>> its responsibilities, it should seldom need to wake at all in
>> normal operation.
>
> This parameter appears to control how often the apply launcher starts new
> workers. If it starts new workers in a loop iteration, it updates its
> last_start_time variable, and it won't start any more workers until another
> wal_retrieve_retry_interval has elapsed. If no new workers need to be
> started, it only wakes up every 3 minutes.

Looking closer, I see that wal_retrieve_retry_interval is used for three
purposes. Its main purpose seems to be preventing busy-waiting in
WaitForWALToBecomeAvailable(), as that's what's documented. But it's also
used for logical replication. The apply launcher uses it as I've described
above, and the apply workers use it when launching sync workers. Unlike
the apply launcher, the apply workers store the last start time for each
table's sync worker and use that to determine whether to start a new one.

My first thought is that the latter two uses should be moved to a new
parameter, and the apply launcher should store the last start time for each
apply worker, as the apply workers do for the table-sync workers. In any
case, it probably makes sense to lower this parameter's value for testing
so that tests that restart these workers frequently don't wait so long.
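
To make the per-worker idea a bit more concrete, here is a rough standalone
sketch. It is not a patch and not existing PostgreSQL code: the struct, the
function names, and the apply_worker_restart_interval_ms parameter are all
made up for illustration, and the 5000 ms value is only a placeholder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical launcher-side bookkeeping, one entry per subscription. */
typedef struct SubLaunchState
{
    uint32_t    subid;          /* subscription identifier */
    int64_t     last_start_ms;  /* last apply worker start; -1 means never */
} SubLaunchState;

/* Hypothetical new parameter (milliseconds); the value is a placeholder. */
static int64_t apply_worker_restart_interval_ms = 5000;

/*
 * Decide whether this subscription's apply worker may be started now.
 * Only this subscription's own last start time is consulted, so a recent
 * start for one subscription does not delay the others.
 */
static bool
can_start_worker(const SubLaunchState *st, int64_t now_ms)
{
    if (st->last_start_ms < 0)
        return true;
    return (now_ms - st->last_start_ms) >= apply_worker_restart_interval_ms;
}

/* Record the start time if the worker is allowed to start. */
static bool
maybe_start_worker(SubLaunchState *st, int64_t now_ms)
{
    if (!can_start_worker(st, now_ms))
        return false;
    st->last_start_ms = now_ms;
    return true;
}
```

With something along these lines, a subscription whose worker was just
started would be throttled for one interval, while a different subscription
that has never had a worker could still be started in the same loop
iteration.
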
I can put a patch together if this seems like a reasonable direction to go.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com