I want to explain one bad situation we have encountered with one of our
customers.
There are ~5000 tables in their database. And what is worse - most of
them are actively used.
Then several flaws of Postgres make their system almost stuck.
Autovacuum is periodically processing all this 5k relations (because
them are actively updated).
And as far as most of this tables are small enough autovacuum complete
processing of them almost in the same time.
As a result autovacuum workers produce ~5k invalidation messages in
short period of time.
There are several thousand clients, most of which are executing complex
queries.
So them are not able to process all this invalidation messages and their
invalidation message buffer is overflown.
Size of this buffer is hardcoded (MAXNUMMESSAGES = 4096) and can not be
changed without recompilation of Postgres.
This is problem N1.
As a result resetState is set to true, forcing backends to invalidate
their caches.
So most of backends loose there cached metadata and have to access
system catalog trying to reload it.
But then we come to the next show stopper: NUM_LOCK_PARTITIONS.
It is also hardcoded and can't be changed without recompilation:
#define LOG2_NUM_LOCK_PARTITIONS 4
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)
Having just 16 LW-Locks greatly increase conflict probability (taken in
account that there are 5k tables and totally about 25k relations).
It cause huge lw-lock acquisition time for heap_open and planning stage
of some queries is increased from milliseconds to several minutes!
Koda!
This is problem number 2. But there is one more flaw we have faced with.
We have increased LOG2_NUM_LOCK_PARTITIONS to 8
and ... clients start to report "too many LWLocks taken" error.
There is yet another hardcoded constant MAX_SIMUL_LWLOCKS = 200
which relation with NUM_LOCK_PARTITIONS was not mentioned anywhere.
But there are several places in Postgres where it tries to hold all
partition locks (for example in deadlock detector).
Definitely if NUM_LOCK_PARTITIONS > MAX_SIMUL_LWLOCKS we get this error.
So looks like NUM_LOCK_PARTITIONS and MAXNUMMESSAGES constants have to
be replaced with GUCs.
To avoid division, we can specify log2 of this values, so shift can be
used instead.
And MAX_SIMUL_LWLOCKS should be defined as NUM_LOCK_PARTITIONS +
NUM_INDIVIDUAL_LWLOCKS + NAMED_LWLOCK_RESERVE.