LockWaiterCount() bravely accesses a shared memory data structure that is mutable with no locking at all. That might actually be safe for our purposes, but I think it would be better to take the partition lock in shared mode if it doesn't cost too much performance. If that's too expensive, then it should at least have a comment explaining (1) that it is doing this without the lock and (2) why that's safe (sketch: the LOCK can't go away because we hold it, and nRequested could change but we don't mind a stale value, and a 32-bit read won't be torn).
With LWLock also performance are equally good so added the lock.
A few of the other functions in here also lack comments, and perhaps should have them.
The changes to RelationGetBufferForTuple need a visit from the refactoring police. Look at the way you are calling RelationAddOneBlock. The logic looks about like this:
if (needLock) { if (trylock relation for extension) RelationAddOneBlock(); else { lock relation for extension; if (use fsm) { complicated; } else RelationAddOneBlock(); } else RelationAddOneBlock();
So basically you want to do the RelationAddOneBlock() thing if !needLock || !use_fsm || can't trylock. See if you can rearrange the code so that there's only one fallthrough call to RelationAddOneBlock() instead of three separate ones.
Actually in every case we need one blocks, So I have re factored it and RelationAddOneBlock is now out of any condition.
Also, consider moving the code that adds multiple blocks at a time to its own function instead of including it all in line.