Re: [BUGS] BUG #14714: long running sessions from remote instanceseems to hang some times - Mailing list pgsql-bugs

From Pavel Stehule
Subject Re: [BUGS] BUG #14714: long running sessions from remote instanceseems to hang some times
Date
Msg-id CAFj8pRA_TOScxvsB8XwrtXRVGtL_0mgG-kWO055NkaRZhYBJRg@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] BUG #14714: long running sessions from remote instanceseems to hang some times  (Josef Machytka <josef.machytka@gmail.com>)
Responses Re: [BUGS] BUG #14714: long running sessions from remote instanceseems to hang some times  (Josef Machytka <josef.machytka@gmail.com>)
List pgsql-bugs


2017-06-23 11:26 GMT+02:00 Josef Machytka <josef.machytka@gmail.com>:
Today I got several sessions in this "drowsy" state and it almost caused collapse of the whole reporting system because aggregations were not fully calculated.
Looks like now on PG 9.6.3 is this problem even worse then on previous 9.6.x subversions...

I checked "top", "iotop -u postgres", "perf top" and "perf top -u postgres" and as far as I could see those sessions are running although they are marked as "S" sleeping in "top" command almost all the time.
From time to time they cause very low CPU load, something like from 0,5 to 0,8% - which is visible only in "perf" output. In "top" I see almost all the time 0.0% on CPU.
The same with disk IO - iotop shows only a few MB/s from time to time.
They are marked as "active" in "pg_stat_activity" and no wait events are shown.

GCE Instance has Debian 8, 10 CPUs, 46GB RAM
Our database has 2390 GB is distributed over 3 different tablespaces on 3 different GCE standard persistent disks, wal logs and temp tablespace have separate disk.

Out pg config is:
shared_buffers = 8GB
temp_buffers = 512MB
work_mem = 192MB
maintenance_work_mem = 2GB
max_connections = 150
max_wal_size = 8GB
min_wal_size = 4GB
max_worker_processes = 50
max_parallel_workers_per_gather = 4

There are several procedures which set work_mem to 256MB because tests showed it speeds up processing really significantly.
Reason is when we switched from PG 9.5 to PG 9.6 we found that many old queries had suddenly very bad explain plan and very long run time.
Only setting work_mem from previous 64MB to the current values helped to get the same or better results for those queries on 9.6.
Of course we upgraded instance with more memory. But we do not use swap.

Shall I look for some special perf symbols in output? Or is there a possibility to debug it deeper?


what is perf top output?

you can install debug symbols for postgres and for kernel  https://wiki.debian.org/DebugPackage

Regards

Pavel

Thanks

Josef


On 19 June 2017 at 17:56, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2017-06-19 16:49 GMT+02:00 <josef.machytka@gmail.com>:
The following bug has been logged on the website:

Bug reference:      14714
Logged by:          Josef Machytka
Email address:      josef.machytka@gmail.com
PostgreSQL version: 9.6.3
Operating system:   Debian jessie
Description:

On pg 9.6.* we have sometimes strange problem with some sessions. We
experienced it from bash scripts, golang program and node.js applications.
Environment are Google Compute Engine instances with Debian 8.

Sometimes some session goes into some kind of "drowsy" state and runs
incredibly slowly. It causes some very low disk IO and runs like 20x slower
then usual. This happens mainly when database is under heavy load and it
happens randomly to different tasks. It happens randomly to both connections
from local instance (cronjob running on instance with PG) and remote
connections from other instances.

When we restart task it runs normally. I cannot see any changes in process
priority in OS, process switches between sleeping and running state and I do
not see any locks or waits in pg_stat_activity or pg_locks. Usually all
other processes finish OK and this "drowsy" one runs "for ever..."

We implemented timeouts for PG tasks into our programs to restart tasks but
it very annoying and we have to use long enough timeouts anyway so it delays
tasks.... It happens from time to time also to dblink connections inside pg
function so we had to rewrite our programs which were running without any
problems on PG 9.5. Problems started when we switched to PG 9.6 and we
experienced them on all minor versions - we currently run 9.6.3 and problem
is still present...

Do you please have any advice what to check or what setting might help?

it can be spin lock issue. Try to use "perf top". Install postgresql-server debug symbols before.

what is your share buffers configuration?

Regards

Pavel 




--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs



pgsql-bugs by date:

Previous
From: Josef Machytka
Date:
Subject: Re: [BUGS] BUG #14714: long running sessions from remote instanceseems to hang some times
Next
From: Josef Machytka
Date:
Subject: Re: [BUGS] BUG #14714: long running sessions from remote instanceseems to hang some times