Thread: Postgresql 9.4.1 stuck all queries when making multi updates

Postgresql 9.4.1 stuck all queries when making multi updates

From

Ilya Bazylchuk

Date:

03 April 2015, 14:14:52

Before i used 9.3.5 and servers with ubuntu 12 with 32GB memory.

After upgrade to 9.4.1, with more power server 60GB memory on each in =
wall replication and ubuntu 14, started get db stucks when run multi =
update from resque/sidekiq background workers. resque/sidekiq work via =
pgpool 3.3

Queries can be as simple, update column where primary id =3D id. and =
complex, doesn't metter. usually average connections about less 300, but =
when queries stucks, count connections grow to maximum and i get

LOG:  process 16121 still waiting for ShareLock on transaction =
2448707428 after 1000.121 ms
DETAIL:  Process holding the lock: 16139. Wait queue: 16121.
LOG:  process 16146 still waiting for ExclusiveLock on tuple (665346,11) =
of relation 997395 of database 16455 after 1000.102 ms
DETAIL:  Process holding the lock: 16121. Wait queue: 16146.
ERROR:  canceling statement due to lock timeout
then

FATAL:  sorry, too many clients already
then only restart help, when run restart got this

WARNING:  terminating connection because of crash of another server =
process
DETAIL:  The postmaster has commanded this server process to roll back =
the current transaction and exit, because another server process exited =
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and =
repeat your command.
Current settings,

Database size 190GB
max_connections =3D 400
shared_buffers =3D 13GB
work_mem =3D 19660kB
maintenance_work_mem =3D 2GB
effective_cache_size =3D 40GB
But i played with different, didn't help. On 9.3 all was fine. Nothing =
more in logs.

Also in one day i did pg_repack, on next day db was down with usual =
queries.=

Re: Postgresql 9.4.1 stuck all queries when making multi updates

From

Venkata Balaji N

Date:

04 April 2015, 03:10:51

On Fri, Apr 3, 2015 at 10:14 PM, Ilya Bazylchuk <ilya.bazylchuk@gmail.com>
wrote:

> Before i used 9.3.5 and servers with ubuntu 12 with 32GB memory.
>
> After upgrade to 9.4.1, with more power server 60GB memory on each in wall
> replication and ubuntu 14, started get db stucks when run multi update from
> resque/sidekiq background workers. resque/sidekiq work via pgpool 3.3
>
Do you see any messages in pgpool logs ?


> Queries can be as simple, update column where primary id = id. and
> complex, doesn't metter. usually average connections about less 300, but
> when queries stucks, count connections grow to maximum and i get
>
> LOG:  process 16121 still waiting for ShareLock on transaction 2448707428 after 1000.121 ms
> DETAIL:  Process holding the lock: 16139. Wait queue: 16121.
> LOG:  process 16146 still waiting for ExclusiveLock on tuple (665346,11) of relation 997395 of database 16455 after
1000.102ms 
> DETAIL:  Process holding the lock: 16121. Wait queue: 16146.
> ERROR:  canceling statement due to lock timeout
>
> Any idea, how long the locks were hanging in there for ?
Do you have any SQL timeouts configured in postgresql.conf ?


> then
>
> FATAL:  sorry, too many clients already
>
> then only restart help, when run restart got this
>
> WARNING:  terminating connection because of crash of another server process
> DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory. 
> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
>
> This precisely means, one of the connections got crashed/killed.

Regards,
Venkata Balaji N

Fujitsu Australia

Re: Postgresql 9.4.1 stuck all queries when making multi updates

From

Ilya Bazylchuk

Date:

04 April 2015, 12:34:38

nothing in pgpool. Pgpool use only resque/sidekiq. Rest use unicorn =
directly to master.
1. 45 connections
2. 45 connections
3. 30 connections
4. 10 connections

When connections stucks, growth count connections from unicorn only. =
Seems it not granted locks, but weird that i didn=E2=80=99t have it on =
9.3.

In general didn=E2=80=99t change except new pg servers.

I added log for not granted locks, and in 12 hours got only one, and it =
was short and on another table, which never presented in pg logs. will =
see more.

lock_timeout: 30000
deadlock_timeout: 1s
statement_timeout: 0

I know that set statement_timeout isn=E2=80=99t recommended, but =
what=E2=80=99s better solution, don=E2=80=99t get stuck db?


> 4 =D0=B0=D0=BF=D1=80. 2015 =D0=B3., =D0=B2 3:10, Venkata Balaji N =
<nag1010@gmail.com> =D0=BD=D0=B0=D0=BF=D0=B8=D1=81=D0=B0=D0=BB(=D0=B0):
>=20
>=20
> On Fri, Apr 3, 2015 at 10:14 PM, Ilya Bazylchuk =
<ilya.bazylchuk@gmail.com <mailto:ilya.bazylchuk@gmail.com>> wrote:
> Before i used 9.3.5 and servers with ubuntu 12 with 32GB memory.
>=20
> After upgrade to 9.4.1, with more power server 60GB memory on each in =
wall replication and ubuntu 14, started get db stucks when run multi =
update from resque/sidekiq background workers. resque/sidekiq work via =
pgpool 3.3
>=20
> Do you see any messages in pgpool logs ?
> =20
> Queries can be as simple, update column where primary id =3D id. and =
complex, doesn't metter. usually average connections about less 300, but =
when queries stucks, count connections grow to maximum and i get
>=20
> LOG:  process 16121 still waiting for ShareLock on transaction =
2448707428 after 1000.121 ms
> DETAIL:  Process holding the lock: 16139. Wait queue: 16121.
> LOG:  process 16146 still waiting for ExclusiveLock on tuple =
(665346,11) of relation 997395 of database 16455 after 1000.102 ms
> DETAIL:  Process holding the lock: 16121. Wait queue: 16146.
> ERROR:  canceling statement due to lock timeout
>=20
> Any idea, how long the locks were hanging in there for ?
> Do you have any SQL timeouts configured in postgresql.conf ?
> =20
> then
>=20
> FATAL:  sorry, too many clients already
> then only restart help, when run restart got this
>=20
> WARNING:  terminating connection because of crash of another server =
process
> DETAIL:  The postmaster has commanded this server process to roll back =
the current transaction and exit, because another server process exited =
abnormally and possibly corrupted shared memory.
> HINT:  In a moment you should be able to reconnect to the database and =
repeat your command.
>=20
> This precisely means, one of the connections got crashed/killed.
>=20
> Regards,
> Venkata Balaji N
>=20
> Fujitsu Australia