Well, maybe I'm missing something, but shm_mq_create() will just overwrite the contents of the struct, so it doesn't care about the sender/receiver; only shm_mq_set_sender() and shm_mq_set_receiver() do.
If you create the shm_mq from scratch, then you can reuse the structure.
Please find attached a v3.
It uses a shared memory queue and also has the ability to capture plans nested deeply in the call stack. Not sure about using the executor hook, since this is not an extension...
The LWLock is taken around initializing and cleaning up the shared struct and the message queue; the I/O synchronization is handled by the message queue itself. After some testing with concurrent pgbench and intentionally deep recursive plpgsql functions (up to 700 plpgsql stack frames), I think this approach can work, unless there's some theoretical problem I'm just not aware of. :-)
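
To make the locking pattern above concrete, here is a minimal standalone sketch. It is not the code from the attached patch and does not use the PostgreSQL API: a pthread mutex stands in for the LWLock, a plain byte buffer stands in for the message queue memory, and the names PlanSlot, slot_init() and slot_cleanup() are hypothetical. It only shows the idea that the lock covers init/cleanup of the shared struct, and that each init wipes the queue from scratch so the structure can be reused.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the shared struct; not PostgreSQL's layout. */
typedef struct PlanSlot
{
    pthread_mutex_t lock;      /* plays the role of the LWLock */
    bool            in_use;    /* true while one request owns the slot */
    int             requester; /* id of the backend that asked for a plan */
    char            queue[64]; /* plays the role of the queue memory */
    size_t          used;      /* bytes currently stored in the queue */
} PlanSlot;

PlanSlot slot = { .lock = PTHREAD_MUTEX_INITIALIZER };

/*
 * Claim the slot and re-create the queue from scratch.
 * Returns false if another request currently owns the slot.
 */
bool
slot_init(int requester)
{
    bool ok = false;

    pthread_mutex_lock(&slot.lock);
    if (!slot.in_use)
    {
        /* Overwrite old contents, so earlier state cannot leak through. */
        memset(slot.queue, 0, sizeof(slot.queue));
        slot.used = 0;
        slot.requester = requester;
        slot.in_use = true;
        ok = true;
    }
    pthread_mutex_unlock(&slot.lock);

    return ok;
}

/* Release the slot so the next request can reuse the same structure. */
void
slot_cleanup(void)
{
    pthread_mutex_lock(&slot.lock);
    slot.in_use = false;
    pthread_mutex_unlock(&slot.lock);
}
```

In this model the lock is held only for the short init and cleanup steps. The data transfer itself would happen between those two calls without the lock, which mirrors leaving the I/O synchronization to the message queue.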