Re: Postgres is not able to handle more than 4k tables!? - Mailing list pgsql-hackers

From Nikolay Samokhvalov
Subject Re: Postgres is not able to handle more than 4k tables!?
Date
Msg-id CANNMO++NMpDHD0Z2B0oVtaUeYmd+D=Hvd0u=WvGYhFJaOZoXtw@mail.gmail.com
In response to Re: Postgres is not able to handle more than 4k tables!?  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Postgres is not able to handle more than 4k tables!?  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
Hi Konstantin, a silly question: do you consider the workload you have well-optimized? Can it be optimized further? Reading this thread, I have a strong feeling that a very basic set of regular optimization steps is missing here (or not explained): query analysis and optimization based on pg_stat_statements (and, maybe, pg_stat_kcache), some method to analyze the state of the server in general, resource consumption, and so on.

Do you have some monitoring that covers pg_stat_statements?

Before looking under the hood, I would use multiple pg_stat_statements snapshots (which can be analyzed using, say, postgres-checkup or pgCenter) to understand the workload and identify the heaviest queries -- first of all, in terms of total_time, calls, shared buffer reads/hits, and temporary file generation. Which query groups are in the Top-N in each category -- have you looked at that?
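For example, a first-pass query along these lines (a sketch; it assumes the pg_stat_statements extension is installed, and uses the pre-PG13 column names -- in PG13+ total_time was split into total_plan_time and total_exec_time):

```sql
-- Top-10 query groups by cumulative execution time (pre-PG13 column names)
SELECT queryid,
       calls,
       total_time,
       total_time / calls AS mean_time,
       shared_blks_hit,
       shared_blks_read,
       temp_blks_written,
       left(query, 60) AS query_sample
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

The same query re-sorted by calls, shared_blks_read, or temp_blks_written gives the Top-N for the other categories.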

You mentioned some crazy numbers for planning time, but why not analyze the picture holistically and look at the overall numbers? For the queries with increased planning time, what is their share of total_time in the overall picture, in %? (Unfortunately, we cannot see a Top-N by planning time in pg_stat_statements until PG13, but that doesn't mean we cannot get a good understanding of the overall picture today; it just requires more work.)

If workload analysis & optimization has already been done holistically, or is not possible for some reason -- pardon me. But if not, and if your primary goal is to improve this particular setup ASAP, then the topic could be started in the -performance mailing list first, discussing the workload and its aspects, and only after that is done, raised in -hackers. No?

On Thu, Jul 9, 2020 at 8:57 AM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
Hi Stephen,

Thank you for supporting the opinion that these are problems not only of
the client system design (I agree it is not a good idea to have thousands
of tables and thousands of active backends) but also of Postgres.

We have done more investigation and found one more problem in
Postgres causing the "invalidation storm".
There are some long-living transactions which prevent autovacuum from
doing its work and removing dead tuples.
So autovacuum started again and again, and each time it made no
progress but still updated statistics and so sent invalidation messages.
autovacuum_naptime was set to 30 seconds, so every 30 seconds autovacuum
processed a huge number of tables and generated a large number of
invalidation messages, which quite soon caused an overflow of the
invalidation message buffers of backends performing long OLAP queries.
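For reference, such long-living transactions holding back the xmin horizon can be spotted in pg_stat_activity (a sketch; exact column availability varies slightly by version):

```sql
-- Oldest transactions still holding back the xmin horizon
SELECT pid, datname, state, xact_start, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;
```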

It makes me think about two possible optimizations:

1. Provide separate invalidation messages for relation metadata and its
statistics, so that an update of statistics does not invalidate the
relation cache.
The main problem with this proposal is that pg_class contains the relpages
and reltuples columns, which are conceptually part of relation statistics
but are stored in the relation cache. If relation statistics are updated,
then most likely these fields are also changed, so we have to remove this
relation from the relation cache in any case.
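To illustrate the coupling: these planner-visible estimates live in pg_class itself, right next to the relation metadata, and are refreshed by VACUUM/ANALYZE:

```sql
-- relpages/reltuples are stored in pg_class alongside relation metadata
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relkind = 'r'
ORDER BY relpages DESC
LIMIT 10;
```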

2. Remember in the relation info the XID of the oldest active transaction
at the moment of the last autovacuum.
At the next autovacuum iteration, first compare this stored XID with the
XID of the current oldest active transaction, and bypass vacuuming this
relation if the XID has not changed.
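Whether autovacuum is indeed revisiting the same tables without reclaiming anything can be checked roughly like this (a sketch against pg_stat_user_tables):

```sql
-- Tables that autovacuum keeps revisiting while dead tuples stay high
SELECT relname, n_dead_tup, n_live_tup, autovacuum_count, last_autovacuum
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC
LIMIT 10;
```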

Thoughts?

>> So, that's really the core of your problem.  We don't promise that
>> you can run several thousand backends at once.  Usually it's recommended
>> that you stick a connection pooler in front of a server with (at most)
>> a few hundred backends.
> Sure, but that doesn't mean things should completely fall over when we
> do get up to larger numbers of backends, which is definitely pretty
> common in larger systems.  I'm pretty sure we all agree that using a
> connection pooler is recommended, but if there's things we can do to
> make the system work at least a bit better when folks do use lots of
> connections, provided we don't materially damage other cases, that's
> probably worthwhile.

I also think that Postgres performance should degrade gradually with an
increasing number of active backends. Actually, further investigation of
this particular case shows that such a large number of database
connections was caused by ... Postgres slowdown.
During normal workflow the number of active backends is a few hundred.
But the "invalidation storm" caused queries to hang, so the user
application had to initiate more and more new connections to perform the
required actions.
Yes, this may not be the best behavior of the application in this case. At
least it should first terminate the current connection using
pg_terminate_backend. I just want to note that the large number of
backends was not the core of the problem.

>
> Making them GUCs does seem like it's a few steps too far... but it'd be
> nice if we could arrange to have values that don't result in the system
> falling over with large numbers of backends and large numbers of tables.
> To get a lot of backends, you'd have to set max_connections up pretty
> high to begin with- perhaps we should contemplate allowing these values
> to vary based on what max_connections is set to?

I think that the optimal number of lock partitions should depend not on
the number of connections
but on the number of available CPU cores, and so on the expected level of
concurrency.
It is hard to propose a portable way to obtain this number.
This is why I think that a GUC is the better solution.
Certainly, I realize that it is a very dangerous parameter which should be
changed with special care,
and not only because of MAX_SIMUL_LWLOCKS.

There are a few places in Postgres where it tries to lock all partitions
(deadlock detector, logical replication, ...).
If there are thousands of partitions, then such a lock becomes too
expensive, and we get yet another
popular Postgres problem: a "deadlock detection storm", when due to high
contention between backends a lock cannot be obtained
within deadlock_timeout and so deadlock detection is initiated.
Simultaneous deadlock detection performed by all backends
(each of which tries to take ALL partition locks) paralyzes the system
(TPS falls to 0).
The proposed patch for this problem was also rejected (once again, the
problem can be reproduced only on a powerful server with a large number
of cores).

