Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
> There are several thousand clients, most of which are executing complex
> queries.
So, that's really the core of your problem. We don't promise that
you can run several thousand backends at once. Usually it's recommended
that you stick a connection pooler in front of a server with (at most)
a few hundred backends.
> So them are not able to process all this invalidation messages and their
> invalidation message buffer is overflown.
> Size of this buffer is hardcoded (MAXNUMMESSAGES = 4096) and can not be
> changed without recompilation of Postgres.
> This is problem N1.
No, this isn't a problem. Or at least you haven't shown a reason to
think it is. Sinval overruns are somewhat routine, and we certainly
test that code path (see CLOBBER_CACHE_ALWAYS buildfarm animals).
> But then we come to the next show stopper: NUM_LOCK_PARTITIONS.
> It is also hardcoded and can't be changed without recompilation:
> #define LOG2_NUM_LOCK_PARTITIONS 4
> #define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)
> Having just 16 LW-Locks greatly increase conflict probability (taken in
> account that there are 5k tables and totally about 25k relations).
> It cause huge lw-lock acquisition time for heap_open and planning stage
> of some queries is increased from milliseconds to several minutes!
Really?
> This is problem number 2. But there is one more flaw we have faced with.
> We have increased LOG2_NUM_LOCK_PARTITIONS to 8
> and ... clients start to report "too many LWLocks taken" error.
> There is yet another hardcoded constant MAX_SIMUL_LWLOCKS = 200
> which relation with NUM_LOCK_PARTITIONS was not mentioned anywhere.
Seems like self-inflicted damage. I certainly don't recall anyplace
in the docs where we suggest that you can alter that constant without
worrying about consequences.
> So looks like NUM_LOCK_PARTITIONS and MAXNUMMESSAGES constants have to
> be replaced with GUCs.
I seriously doubt we'd do that.
regards, tom lane