"It seems your system is under so much stress that there were no resources for Patroni to execute the HA loop for 35 seconds.
This interval exceeds ttl=30s, therefore the leader key expired, Patroni noticed it and demoted Postgres.
You need to figure out what is going on with your system and what is causing the CPU/memory pressure. Ideally, fix these issues."
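The timing argument in the quote can be made concrete with a deliberately simplified model. It assumes the leader key is only renewed by the HA loop and expires `ttl` seconds after the last renewal. Real DCS expiry semantics vary, and the class name `LeaderKey` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeaderKey:
    """Simplified (hypothetical) model of a DCS leader key with a TTL in seconds."""
    ttl: float
    last_renewed: float  # time of the last successful HA-loop renewal

    def is_expired(self, now: float) -> bool:
        # The key only survives while less than ttl seconds have passed since renewal.
        return now - self.last_renewed > self.ttl


# Numbers from the quote: ttl=30s, HA loop starved for 35s right after a renewal at t=0.
key = LeaderKey(ttl=30, last_renewed=0)
print(key.is_expired(35))  # True: key lapsed at t=30, so the leader gets demoted
print(key.is_expired(25))  # False: a 25s stall would still fit inside the ttl
```

In other words, any stall of the Patroni process that is longer than `ttl` is enough to lose leadership, whatever its cause.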
Our company has hundreds of clusters deployed with Ansible, all using the same parameters, so changing parameters for a single cluster is difficult.
My idea is to get the top SQL statements from pg_stat_statements with the queries below, then analyse and tune them.
Is this the right direction? Any suggestions would be appreciated, thanks.
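One caveat with this approach is that pg_stat_statements counters are cumulative since the last `pg_stat_statements_reset()`. Ranking by totals therefore reflects the whole history, not the 35-second stall. A common workaround is to take two snapshots around the period of interest and rank statements by how much each counter grew.

The sketch below assumes each snapshot has already been fetched with your own driver, for example `select userid, dbid, queryid, calls, total_time from pg_stat_statements`. It also assumes each snapshot is stored as a dict keyed by `(userid, dbid, queryid)`. The function name `diff_snapshots` is hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (userid, dbid, queryid) identifies a pg_stat_statements entry


def diff_snapshots(before: Dict[Key, dict], after: Dict[Key, dict],
                   metric: str, top_n: int = 5) -> List[Tuple[Key, float]]:
    """Rank statements by how much `metric` grew between two snapshots.

    Entries missing from `before` are treated as starting at 0. Non-positive
    deltas (e.g. an entry evicted and re-created in between) are skipped.
    """
    deltas = []
    for key, row in after.items():
        start = before.get(key, {}).get(metric, 0)
        delta = row[metric] - start
        if delta > 0:
            deltas.append((key, delta))
    deltas.sort(key=lambda kv: kv[1], reverse=True)  # largest growth first
    return deltas[:top_n]


# Usage with made-up counter values:
before = {(10, 1, 111): {"calls": 100, "total_time": 5000.0},
          (10, 1, 222): {"calls": 50, "total_time": 9000.0}}
after = {(10, 1, 111): {"calls": 400, "total_time": 65000.0},
         (10, 1, 222): {"calls": 51, "total_time": 9100.0},
         (10, 1, 333): {"calls": 3, "total_time": 1200.0}}
print(diff_snapshots(before, after, "total_time"))
# [((10, 1, 111), 60000.0), ((10, 1, 333), 1200.0), ((10, 1, 222), 100.0)]
```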
Top 5 SQL by I/O time per call
select userid::regrole, dbid, query from pg_stat_statements order by (blk_read_time+blk_write_time)/nullif(calls, 0) desc nulls last limit 5;
Top 5 SQL by total I/O time
select userid::regrole, dbid, query from pg_stat_statements order by (blk_read_time+blk_write_time) desc limit 5;
Top 5 SQL by mean time per call
select userid::regrole, dbid, query from pg_stat_statements order by mean_time desc limit 5;
Top 5 SQL by total time
select userid::regrole, dbid, query from pg_stat_statements order by total_time desc limit 5;
Top 5 SQL by average time per call (total_time/calls, the same ordering as mean_time)
select calls, total_time/nullif(calls, 0) AS avg_time, left(query, 80) from pg_stat_statements order by 2 desc nulls last limit 5;
Top 5 SQL by standard deviation of execution time
select userid::regrole, dbid, query from pg_stat_statements order by stddev_time desc limit 5;
Top 5 SQL by shared blocks accessed (hit + read)
select userid::regrole, dbid, query from pg_stat_statements order by (shared_blks_hit+shared_blks_read) desc limit 5;
Top 5 SQL by temporary blocks written
select userid::regrole, dbid, query from pg_stat_statements order by temp_blks_written desc limit 5;
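The queries above use the pg_stat_statements column names from PostgreSQL 12 and earlier. In PostgreSQL 13, `total_time`, `mean_time` and `stddev_time` became `total_exec_time`, `mean_exec_time` and `stddev_exec_time`. In PostgreSQL 17, `blk_read_time` and `blk_write_time` became `shared_blk_read_time` and `shared_blk_write_time`. The I/O timing columns also stay at zero unless `track_io_timing` is on.

With hundreds of clusters that may not all run the same version, generating the query per server could help. The sketch below assumes the extension has been updated to the version shipped with the server, since the view's columns follow the installed extension version. It also assumes you read `server_version_num` from each host. The helper names are hypothetical.

```python
def stat_columns(server_version_num: int) -> dict:
    """Map logical metrics to pg_stat_statements column names (hypothetical helper).

    server_version_num is the integer from `SHOW server_version_num`,
    e.g. 120011 for 12.11 or 160002 for 16.2.
    """
    cols = {
        "total_time": "total_time",
        "mean_time": "mean_time",
        "stddev_time": "stddev_time",
        "read_time": "blk_read_time",
        "write_time": "blk_write_time",
    }
    if server_version_num >= 130000:
        # PostgreSQL 13 split planning and execution time into separate columns.
        cols.update(total_time="total_exec_time",
                    mean_time="mean_exec_time",
                    stddev_time="stddev_exec_time")
    if server_version_num >= 170000:
        # PostgreSQL 17 renamed the shared-block I/O timing columns.
        cols.update(read_time="shared_blk_read_time",
                    write_time="shared_blk_write_time")
    return cols


def top_n_query(order_expr: str, limit: int = 5) -> str:
    # "nulls last" keeps rows with a NULL sort key (e.g. calls = 0) out of the top.
    return ("select userid::regrole, dbid, query from pg_stat_statements "
            f"order by {order_expr} desc nulls last limit {limit};")


def io_per_call_query(server_version_num: int, limit: int = 5) -> str:
    c = stat_columns(server_version_num)
    expr = f"({c['read_time']} + {c['write_time']}) / nullif(calls, 0)"
    return top_n_query(expr, limit)


print(io_per_call_query(160002))
# select userid::regrole, dbid, query from pg_stat_statements order by (blk_read_time + blk_write_time) / nullif(calls, 0) desc nulls last limit 5;
```

The same pattern extends to the other metrics by passing, for example, `stat_columns(v)["mean_time"]` to `top_n_query`.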