Thread: SELECT outage in semop

SELECT outage in semop

From
Сурен Арустамян
Date:
Hi all,

I'm using PostgreSQL 9.3.4 on Red Hat Enterprise Linux Server release 6.5 (Santiago).

Linux 193-45-142-74 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Server specs:
4x Intel(R) Xeon(R) CPU E7-4870 @ 2.40GHz (40 physical cores in total)

441 GB of RAM

I have a setup where a multi-process daemon runs on the system and each process holds one PostgreSQL session.

Each process of this daemon runs read-only queries against the database.
Normally a query takes at most 35 ms, but from time to time (at a random moment) every database session hangs in a very strange semop call. Here is part of the strace output:

41733 20:15:09.682186 lseek(41, 0, SEEK_END) = 16384 <0.000007>
41733 20:15:09.682218 lseek(42, 0, SEEK_END) = 16384 <0.000008>
41733 20:15:09.682258 lseek(43, 0, SEEK_END) = 8192 <0.000007>
41733 20:15:09.682290 lseek(44, 0, SEEK_END) = 16384 <0.000007>
41733 20:15:09.682365 brk(0x1a79000)    = 0x1a79000 <0.000010>
41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
41733 20:15:11.769030 brk(0x1b79000)    = 0x1b79000 <0.000028>
41733 20:15:11.774384 lseek(20, 0, SEEK_END) = 81920 <0.000032>
41733 20:15:11.774591 lseek(35, 0, SEEK_END) = 98263040 <0.000084>
41733 20:15:11.775000 brk(0x1b9b000)    = 0x1b9b000 <0.000021>
41733 20:15:11.775741 lseek(35, 0, SEEK_END) = 98263040 <0.000056>
41733 20:15:11.776763 brk(0x19b9000)    = 0x19b9000 <0.000329>
41733 20:15:11.777195 sendto(9,

41733 20:45:58.300097 lseek(39, 0, SEEK_END) = 32768 <0.000015>
41733 20:45:58.300167 lseek(40, 0, SEEK_END) = 32768 <0.000015>
41733 20:45:58.300244 lseek(41, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300314 lseek(42, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300384 lseek(43, 0, SEEK_END) = 8192 <0.000014>
41733 20:45:58.300452 lseek(44, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300599 brk(0x1a79000)    = 0x1a79000 <0.000020>
41733 20:45:58.306472 brk(0x1b79000)    = 0x1b79000 <0.000024>
41733 20:45:58.311412 lseek(20, 0, SEEK_END) = 81920 <0.000026>
41733 20:45:58.311649 lseek(35, 0, SEEK_END) = 98263040 <0.000022>
41733 20:45:58.312049 brk(0x1b9f000)    = 0x1b9f000 <0.000021>
41733 20:45:58.312502 lseek(35, 0, SEEK_END) = 98263040 <0.000024>
41733 20:45:58.313207 brk(0x19b9000)    = 0x19b9000 <0.000243>
41733 20:45:58.313544 sendto(10,
You can see that the semop call alone took two seconds of the whole sequence.
The same semop calls can be found in other database sessions.

Could you point me how can i find

Best Regards,
Suren Arustamyan
suren-a@inbox.ru

Re: SELECT outage in semop

From
Сурен Арустамян
Date:
Excuse me, my last e-mail was incomplete.
Here is the rest:

Once more, the trace where the problem is seen:

41733 20:15:09.682258 lseek(43, 0, SEEK_END) = 8192 <0.000007>
41733 20:15:09.682290 lseek(44, 0, SEEK_END) = 16384 <0.000007>
41733 20:15:09.682365 brk(0x1a79000) = 0x1a79000 <0.000010>
41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
41733 20:15:11.769030 brk(0x1b79000) = 0x1b79000 <0.000028>
A trace without the problem:

41733 20:45:58.300384 lseek(43, 0, SEEK_END) = 8192 <0.000014>
41733 20:45:58.300452 lseek(44, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300599 brk(0x1a79000) = 0x1a79000 <0.000020>
41733 20:45:58.306472 brk(0x1b79000) = 0x1b79000 <0.000024>
41733 20:45:58.311412 lseek(20, 0, SEEK_END) = 81920 <0.000026>
41733 20:45:58.311649 lseek(35, 0, SEEK_END) = 98263040 <0.000022>
41733 20:45:58.312049 brk(0x1b9f000) = 0x1b9f000 <0.000021>
41733 20:45:58.312502 lseek(35, 0, SEEK_END) = 98263040 <0.000024>
41733 20:45:58.313207 brk(0x19b9000) = 0x19b9000 <0.000243>
41733 20:45:58.313544 sendto(10,
Please let me know if you have any ideas about what these outages are and how I can remove them.
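In case it helps others dig through similar traces: the timestamps and the trailing <...> durations above are what strace's -tt and -T options produce (with -f, each line is prefixed with its PID, matching the format here), and the slow calls can be filtered out mechanically. A sketch; the 0.5-second threshold and the file name are arbitrary choices, and the sample below is just a few lines from the trace above:

```shell
# A sample of the trace above (normally produced by something like:
#   strace -f -tt -T -p <backend PID> -o backend.trace )
cat > backend.trace <<'EOF'
41733 20:15:09.682365 brk(0x1a79000) = 0x1a79000 <0.000010>
41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
41733 20:15:11.769030 brk(0x1b79000) = 0x1b79000 <0.000028>
EOF

# The per-call elapsed time is the trailing <...> field on each line;
# print every call that took longer than 0.5 seconds.
awk -F'[<>]' '$(NF-1) + 0 > 0.5' backend.trace
```

On the sample this prints only the two-second semop line, which makes it easy to see how often the stalls happen across a long capture.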



Fri, 30 May 2014 18:55:13 +0400, from Сурен Арустамян <suren-a@inbox.ru>:




Best Regards,
Сурен Арустамян
suren-a@inbox.ru

Re: SELECT outage in semop

From
Albe Laurenz
Date:
Сурен Арустамян wrote:

> I'm using postgresql 9.3.4 on Red Hat Enterprise Linux Server release 6.5 (Santiago)
> 
> Linux 193-45-142-74 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64
> x86_64 GNU/Linux
> 
> Server specs:
> 4x Intel(R) Xeon(R) CPU E7- 4870  @ 2.40GHz (40 physical cores in total)
> 
> 
> 441 GB of RAM
> 
> I have a setup where a multi-process daemon runs on the system and each process holds one
> PostgreSQL session.
> 
> Each process of this daemon runs read-only queries against the database.
> Normally a query takes at most 35 ms, but from time to time (at a random moment) every
> database session hangs in a very strange semop call. Here is part of the strace output:

[...]

> 41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>

[...]

> You can see that the semop call alone took two seconds of the whole sequence.
> The same semop calls can be found in other database sessions.
> 
> Could you point me how can i find

What is your PostgreSQL configuration?

Is your database workload read-only?
If not, could these be locks?
You could set log_lock_waits and see if anything is logged.
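For reference, a minimal postgresql.conf fragment for that suggestion (both parameters are standard; with log_lock_waits enabled, a message is logged whenever a session waits on a heavyweight lock longer than deadlock_timeout, which defaults to 1s):

```
log_lock_waits = on     # log when a lock wait exceeds deadlock_timeout
deadlock_timeout = 1s   # threshold for deadlock checks and the logging above
```

Both can also be changed with a reload; no restart is needed.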

Anything noteworthy in the database server log?
How busy is the I/O system and the CPU when this happens?

Yours,
Laurenz Albe

Re[2]: [PERFORM] SELECT outage in semop

From
Suren Arustamyan
Date:
Hello Albe,

Here are the changes that were made to postgresql.conf relative to the default configuration:

max_connections = 200
shared_buffers = 129215MB
work_mem = 256MB
maintenance_work_mem = 512MB
vacuum_cost_delay = 70
vacuum_cost_limit = 30
wal_level = hot_standby  (the system streams to one slave server using hot-standby streaming replication; the problem is still observed when no replication is running)
wal_buffers = 2MB
commit_delay = 500
checkpoint_segments = 256
wal_keep_segments = 512
enable_seqscan = off
effective_cache_size = 258430MB
max_locks_per_transaction = 128


In general the system also receives write queries, but this daemon runs only read-only queries.

Each query acquires around 20-30 ACCESS SHARE locks, so the only lock mode that could block them would be an exclusive lock.
There are no explicit exclusive locks in the application.

During the problem the load average is 0.1 - 2, and there is no iowait.

Another interesting point: I have set up a monitoring daemon that selects from pg_stat_activity and pg_locks every half second, and while I was observing the problem that daemon was also unable to run its queries; they only completed after the semop wait ended.
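For what it's worth, when the sessions are responsive the waiter/holder pairs can be read out of pg_locks with a self-join. A simplified sketch: 9.3 has no pg_blocking_pids() (that arrived in 9.6), and this join only matches on a few of the lock-identity columns, so treat the result as approximate:

```sql
-- Pair each ungranted lock request with a granted lock on the same object.
SELECT w.pid      AS waiting_pid,
       h.pid      AS holding_pid,
       w.locktype,
       w.mode     AS wanted_mode,
       h.mode     AS held_mode
FROM pg_locks w
JOIN pg_locks h
  ON h.granted
 AND NOT w.granted
 AND w.pid <> h.pid
 AND w.database      IS NOT DISTINCT FROM h.database
 AND w.relation      IS NOT DISTINCT FROM h.relation
 AND w.transactionid IS NOT DISTINCT FROM h.transactionid;
```

If the stalls are heavyweight-lock waits this should show rows during an outage; if it stays empty while the sessions hang, that points at something below the lock manager (e.g. LWLock contention), which pg_locks does not expose.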






Fri, 30 May 2014 15:19:05 +0000, from Albe Laurenz <laurenz.albe@wien.gv.at>:


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Best Regards,
Suren Arustamyan
suren-a@inbox.ru