Thread: SELECT outage in semop
I'm using PostgreSQL 9.3.4 on Red Hat Enterprise Linux Server release 6.5 (Santiago):
Linux 193-45-142-74 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Server specs:
4x Intel(R) Xeon(R) CPU E7-4870 @ 2.40GHz (40 physical cores in total)
441 GB of RAM
I have a setup where a multi-process daemon runs on the system and each process holds one PostgreSQL session.
Each process of this daemon runs read-only queries against the database.
Normally queries take at most 35 ms, but from time to time (at a random point in time) every database session hangs in a very strange semop call. Here is part of the strace:
41733 20:15:09.682186 lseek(41, 0, SEEK_END) = 16384 <0.000007>
41733 20:15:09.682218 lseek(42, 0, SEEK_END) = 16384 <0.000008>
41733 20:15:09.682258 lseek(43, 0, SEEK_END) = 8192 <0.000007>
41733 20:15:09.682290 lseek(44, 0, SEEK_END) = 16384 <0.000007>
41733 20:15:09.682365 brk(0x1a79000) = 0x1a79000 <0.000010>
41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
41733 20:15:11.769030 brk(0x1b79000) = 0x1b79000 <0.000028>
41733 20:15:11.774384 lseek(20, 0, SEEK_END) = 81920 <0.000032>
41733 20:15:11.774591 lseek(35, 0, SEEK_END) = 98263040 <0.000084>
41733 20:15:11.775000 brk(0x1b9b000) = 0x1b9b000 <0.000021>
41733 20:15:11.775741 lseek(35, 0, SEEK_END) = 98263040 <0.000056>
41733 20:15:11.776763 brk(0x19b9000) = 0x19b9000 <0.000329>
41733 20:15:11.777195 sendto(9,
41733 20:45:58.300097 lseek(39, 0, SEEK_END) = 32768 <0.000015>
41733 20:45:58.300167 lseek(40, 0, SEEK_END) = 32768 <0.000015>
41733 20:45:58.300244 lseek(41, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300314 lseek(42, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300384 lseek(43, 0, SEEK_END) = 8192 <0.000014>
41733 20:45:58.300452 lseek(44, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300599 brk(0x1a79000) = 0x1a79000 <0.000020>
41733 20:45:58.306472 brk(0x1b79000) = 0x1b79000 <0.000024>
41733 20:45:58.311412 lseek(20, 0, SEEK_END) = 81920 <0.000026>
41733 20:45:58.311649 lseek(35, 0, SEEK_END) = 98263040 <0.000022>
41733 20:45:58.312049 brk(0x1b9f000) = 0x1b9f000 <0.000021>
41733 20:45:58.312502 lseek(35, 0, SEEK_END) = 98263040 <0.000024>
41733 20:45:58.313207 brk(0x19b9000) = 0x19b9000 <0.000243>
41733 20:45:58.313544 sendto(10,
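The per-call durations in angle brackets come from strace's -T flag (the capture above looks like the output of something along the lines of `strace -tt -T -p <backend_pid>`). To scan a long capture like this for slow calls automatically, here is a small helper of my own making, not from the thread; the regex is an assumption about this exact -tt -T output format:

```python
import re

# Matches strace -tt -T output lines such as:
# 41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
# Groups: pid, wall-clock timestamp, syscall name, duration in seconds.
LINE_RE = re.compile(r'^\s*(\d+)\s+(\S+)\s+(\w+)\(.*<([\d.]+)>\s*$')

def slow_syscalls(lines, threshold=0.1):
    """Return (pid, timestamp, syscall, seconds) for every call slower than threshold."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and float(m.group(4)) > threshold:
            hits.append((int(m.group(1)), m.group(2), m.group(3), float(m.group(4))))
    return hits
```

Feeding the trace above through this would surface only the 2.08-second semop, since every other call completes in microseconds.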
You can see that semop alone took 2 seconds of the whole system call sequence.
The same semop calls can be found in the other database sessions.
Could you point me to how I can find the cause?
Best Regards,
Suren Arustamyan
suren-a@inbox.ru
Here is the rest.
One more trace where the problem is seen:
41733 20:15:09.682258 lseek(43, 0, SEEK_END) = 8192 <0.000007>
41733 20:15:09.682290 lseek(44, 0, SEEK_END) = 16384 <0.000007>
41733 20:15:09.682365 brk(0x1a79000) = 0x1a79000 <0.000010>
41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
41733 20:15:11.769030 brk(0x1b79000) = 0x1b79000 <0.000028>
The same trace without the problem:
41733 20:45:58.300384 lseek(43, 0, SEEK_END) = 8192 <0.000014>
41733 20:45:58.300452 lseek(44, 0, SEEK_END) = 16384 <0.000015>
41733 20:45:58.300599 brk(0x1a79000) = 0x1a79000 <0.000020>
41733 20:45:58.306472 brk(0x1b79000) = 0x1b79000 <0.000024>
41733 20:45:58.311412 lseek(20, 0, SEEK_END) = 81920 <0.000026>
41733 20:45:58.311649 lseek(35, 0, SEEK_END) = 98263040 <0.000022>
41733 20:45:58.312049 brk(0x1b9f000) = 0x1b9f000 <0.000021>
41733 20:45:58.312502 lseek(35, 0, SEEK_END) = 98263040 <0.000024>
41733 20:45:58.313207 brk(0x19b9000) = 0x19b9000 <0.000243>
41733 20:45:58.313544 sendto(10,
Please let me know if you have any idea what these outages are and how I can remove them.
Fri, 30 May 2014 18:55:13 +0400, Сурен Арустамян <suren-a@inbox.ru> wrote:
[...]
Сурен Арустамян wrote:
> I'm using postgresql 9.3.4 on Red Hat Enterprise Linux Server release 6.5 (Santiago)
>
> Linux 193-45-142-74 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64
> x86_64 GNU/Linux
>
> Server specs:
> 4x Intel(R) Xeon(R) CPU E7- 4870 @ 2.40GHz (40 physical cores in total)
> 441 GB of RAM
>
> I have a schema when multi process daemon is setted up on the system and each process holds 1
> postgresql session.
>
> Each process of this daemon run readonly queries over the database.
> In normal situation it at most 35 ms for queries but from time to time (at a random point of time)
> each database session hanges in some very strange semop call. Here is a part of the strace:
[...]
> 41733 20:15:09.682507 semop(393228, {{0, -1, 0}}, 1) = 0 <2.080439>
[...]
> You may see that semop took 2 seconds from the whole system call.
> Same semops could be find in other database sessions.
>
> Could you point me how can i find

What is your PostgreSQL configuration?

Is your database workload read-only?
If not, could these be locks?
You could set log_lock_waits and see if anything is logged.

Anything noteworthy in the database server log?
How busy is the I/O system and the CPU when this happens?

Yours,
Laurenz Albe
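For reference, log_lock_waits is set in postgresql.conf; a session that waits on a lock longer than deadlock_timeout is then reported in the server log. A minimal fragment (the values below are illustrative, not a recommendation from the thread):

```
log_lock_waits = on                 # log sessions waiting longer than deadlock_timeout
deadlock_timeout = 1s               # default; also serves as the lock-wait logging threshold
log_min_duration_statement = 100    # additionally log any statement slower than 100 ms
```

A configuration reload (pg_ctl reload, or SELECT pg_reload_conf()) is enough to apply these; no restart is needed.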
Here are the changes made to postgresql.conf relative to the default configuration:
max_connections = 200
shared_buffers = 129215MB
work_mem = 256MB
maintenance_work_mem = 512MB
vacuum_cost_delay = 70
vacuum_cost_limit = 30
wal_level = hot_standby (the system streams to one slave server via hot-standby streaming replication; the problem is still observed when no replication is running)
wal_buffers = 2MB
commit_delay = 500
checkpoint_segments = 256
wal_keep_segments = 512
enable_seqscan = off
effective_cache_size = 258430MB
max_locks_per_transaction = 128
In general the system also has write queries, but this daemon runs only read-only queries.
Each query acquires around 20-30 ACCESS SHARE locks, so the only thing that could block them would be an exclusive lock.
There are no explicit exclusive locks in the application.
During the problem the load average is 0.1 to 2.
No iowait.
Another interesting point: I set up a monitoring daemon that selects from pg_stat_activity and pg_locks every half second, and while I was observing the problem this daemon could not run those queries either; they only completed after the semop wait ended.
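A monitor like that is more useful if it also records its own probe latency, since a stalled probe is itself evidence that the wait is shared across all backends rather than specific to one query plan. A sketch of such a monitor, my own illustration rather than the daemon described above (psycopg2 and the connection string are assumptions):

```python
import time

def time_call(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def monitor(conn, interval=0.5, threshold=0.1):
    """Every `interval` seconds, run a trivial probe query and report stalls.

    If even SELECT 1 stalls at the same moment the application queries do,
    the bottleneck is shared (a lock or kernel-level wait), not a plan issue.
    """
    cur = conn.cursor()
    while True:
        _, elapsed = time_call(lambda: cur.execute("SELECT 1") or cur.fetchone())
        if elapsed > threshold:
            print(f"probe stalled for {elapsed:.3f}s")
        time.sleep(interval)

if __name__ == "__main__":
    import psycopg2  # assumed client library; any PostgreSQL driver works
    monitor(psycopg2.connect("dbname=postgres"))  # hypothetical DSN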
Fri, 30 May 2014 15:19:05 +0000, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
[...]
> What is your PostgreSQL configuration?
> Is your database workload read-only?
> If not, could these be locks?
> You could set log_lock_waits and see if anything is logged.
> Anything noteworthy in the database server log?
> How busy is the I/O system and the CPU when this happens?
> Yours,
> Laurenz Albe
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Best Regards,
Suren Arustamyan
suren-a@inbox.ru