Most likely HASHHDR.mutex is not the only bottleneck in your case, so my patch doesn't help much. Unfortunately I don't have access to any POWER8 server, so I can't investigate this issue. I suggest using the gettimeofday trick I described in the first message of this thread. It's time-consuming, but it gives a clear understanding of which code is holding the lock.
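The first message of the thread is not quoted here, so the following is only a guess at the general shape of such instrumentation, not the exact trick referred to above. It assumes the idea is to take a gettimeofday() timestamp right after acquiring a lock and another right before releasing it, and to record which call site held the lock longest. Everything in it (`timed_lock`, `timed_lock_acquire`, `timed_lock_release`, the call-site labels) is hypothetical. A pthread mutex stands in for the spinlock that actually protects HASHHDR.mutex in PostgreSQL.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/time.h>

/* Hypothetical wrapper: a mutex plus bookkeeping about who held it and for how long. */
typedef struct {
    pthread_mutex_t mutex;
    struct timeval acquired_at; /* timestamp taken right after locking */
    const char *holder;         /* call-site label of the current holder */
    long max_hold_us;           /* longest hold observed so far, in microseconds */
    const char *max_holder;     /* call site responsible for max_hold_us */
} timed_lock;

/* Difference between two timevals in microseconds. */
static long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long)(end->tv_sec - start->tv_sec) * 1000000L
         + (long)(end->tv_usec - start->tv_usec);
}

static void timed_lock_init(timed_lock *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    l->holder = NULL;
    l->max_hold_us = -1;   /* so the first release always records a value */
    l->max_holder = NULL;
}

/* Lock, then remember when and where the lock was taken. */
static void timed_lock_acquire(timed_lock *l, const char *site)
{
    pthread_mutex_lock(&l->mutex);
    gettimeofday(&l->acquired_at, NULL);
    l->holder = site;
}

/* Measure how long the lock was held, update the maximum, then unlock.
 * The statistics are updated while the mutex is still held, so they are
 * not raced by other threads using the same wrapper. */
static long timed_lock_release(timed_lock *l)
{
    struct timeval now;
    long held;

    assert(l->holder != NULL); /* release must follow an acquire */
    gettimeofday(&now, NULL);
    held = elapsed_us(&l->acquired_at, &now);
    if (held > l->max_hold_us) {
        l->max_hold_us = held;
        l->max_holder = l->holder;
    }
    l->holder = NULL;
    pthread_mutex_unlock(&l->mutex);
    return held;
}

/* Print which call site kept the lock longest. */
static void timed_lock_report(const timed_lock *l)
{
    fprintf(stderr, "longest hold: %ld us at %s\n",
            l->max_hold_us, l->max_holder ? l->max_holder : "(none)");
}
```

In use, every place that takes the lock would call `timed_lock_acquire(&lock, "some_call_site")` and `timed_lock_release(&lock)`, and `timed_lock_report` would be called at the end of a benchmark run. Note that gettimeofday() is not monotonic: if the system clock is adjusted during a run, a measurement can be wrong. `clock_gettime(CLOCK_MONOTONIC, ...)` avoids that.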
I have also run the pgbench read-only test with data that doesn't fit into shared buffers, because in this case HASHHDR.mutex is accessed quite frequently. In this case I do see a very good improvement on the POWER8 server.