On a database (PostgreSQL 8.2.4 on 64-bit Linux 2.6.18 on 8 AMD Opterons)
that is under high load, I observe the following:
- About 200 database sessions concurrently issue queries, most of them small,
but I have found one that touches 38000 table and index blocks.
- "vmstat" shows that CPU time is divided between "idle" and "iowait",
with user and sys time practically zero.
- the run queue is short, but the blocked queue (uninterruptible sleep) is around 10.
- Many context switches are reported (over hundred per second).
- "sar" says that the disk with the database is on 100% of its capacity.
Storage is on a SAN box.
Queries that normally take seconds at most require up to an hour to finish.
I ran "lsof -p" on a backend running the big query mentioned above, and
it does not use any temp files (work_mem = 20MB).
The query accesses only one table and its index.
What puzzles me is the "strace -tt" output from that backend:
13:44:58.263598 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.313448 semop(229382, 0x7fff482f6070, 1) = 0
13:44:58.313567 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.442917 semop(229382, 0x7fff482f6070, 1) = 0
13:44:58.443074 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.565313 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.682178 semop(229382, 0x7fff482f6070, 1) = 0
13:44:58.682333 semop(393227, 0x7fff482f6480, 1) = 0
13:44:58.807452 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.924425 semop(393227, 0x7fff482f6480, 1) = 0
13:44:58.924727 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.045456 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.169011 semop(393227, 0x7fff482f6480, 1) = 0
13:44:59.169226 semop(327689, 0x7fff482f64a0, 1) = 0
[many more semops]
13:44:59.602532 semop(327689, 0x7fff482f6070, 1) = 0
13:44:59.602648 lseek(32, 120176640, SEEK_SET) = 120176640
13:44:59.602742 read(32, "{\0\0\0xwv\227\1\0\0\0\320\0\350\0\0 \3 \237\300\1@\236\300\1`\235\300\1"..., 8192) = 8192
13:44:59.602825 semop(327689, 0x7fff482f64d0, 1) = 0
13:44:59.602872 semop(393227, 0x7fff482f6080, 1) = 0
13:44:59.602929 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.614559 semop(360458, 0x7fff482f6070, 1) = 0
[many more semops]
13:44:59.742103 semop(229382, 0x7fff482f64a0, 1) = 0
13:44:59.742172 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.756526 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
13:44:59.758096 semop(393227, 0x7fff482f6480, 1) = 0
13:44:59.771655 semop(393227, 0x7fff482f6050, 1) = 0
[hundreds of semops]
13:45:14.339905 semop(393227, 0x7fff482f6050, 1) = 0
13:45:14.466992 semop(360458, 0x7fff482f6070, 1) = 0
13:45:14.467102 lseek(33, 332693504, SEEK_SET) = 332693504
13:45:14.467138 read(33, "{\0\0\0\210\235\351\331\1\0\0\0\204\0010\32\360\37\3 \340\237 \0\320\237 \0\300\237 \0"...,
8192)= 8192
13:45:14.599815 semop(163844, 0x7fff482f60a0, 1) = 0
13:45:14.600006 lseek(32, 125034496, SEEK_SET) = 125034496
13:45:14.600305 read(32, "{\0\0\0\230\257\270\227\1\0\0\0\330\0\340\0\0 \3 \237\300\1@\236\300\1`\235\300\1"..., 8192)
=8192
13:45:14.600391 semop(163844, 0x7fff482f64d0, 1) = 0
13:45:14.600519 semop(393227, 0x7fff482f6480, 1) = 0
and so on. File 32 is the table, file 33 is the index.
Many of the table and index blocks are probably already in shared memory
(shared_buffers = 6GB) and don't have to be read from disk.
My questions:
Is the long duration of the query caused by something else than I/O overload?
What are the semops? Lightweight locks waiting for shared buffer?
Are the lseek and read operations really that fast although the disk is on 100%?
Is this normal behavior under overload or is something ill tuned?
Yours,
Laurenz Albe