On Wed, Jan 04, 2023 at 10:12:19AM -0800, Nathan Bossart wrote:
> From the discussion thus far, it sounds like the alternatives are to 1) add
> a global flag that causes wal_retrieve_retry_interval to be bypassed for
> all workers or to 2) add a hash map in the launcher and a
> restart_immediately flag in each worker slot. I'll go ahead and create a
> patch for 2 since it seems like the most complete solution, and we can
> evaluate whether the complexity seems appropriate.
Here is a first attempt at adding a hash table to the launcher and a
restart_immediately flag in each worker slot. This provides a similar
speedup to lowering wal_retrieve_retry_interval to 1ms. I've noted a
couple of possible race conditions in comments, but none of them seemed
particularly egregious. Ideally, we'd put the hash table in shared memory
so that other backends could adjust it directly, but IIUC that requires it
to be a fixed size, and the number of subscriptions is virtually unbounded.
There might still be problems with the patch, but I'm hoping it at least
helps further the discussion about which approach to take.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com