Hallo,
Im running pg-8,pgpoolII on sol10-zone.
After update sol10u7, queries on coltype timestamp are very slow.
System: sparc, 2GB RAM
This DB is a greylist-DB to fight spam.
500 connections should be easy.
But 16 connection consum 10sec/query.
On another system (sparc) only 1 sec.i
s. details
howto diag?
regards heiko
- details
$ /opt/csw/postgresql/bin/postmaster -V
postgres (PostgreSQL) 8.3.1
- numDS: 200k-800k
- conn: 50-500
- max age: 2d
- testbed
tab=blacklist
filter="^ .*[0-9]|elaps"
cmd="/usr/bin/time psql -t -p 5432 -h 192.168.5.126 -U smtpuser smtp"
sql="select count(*) from $tab";
sql1="select count(*) from $tab where create_time > abstime(int4(timenow()) -30
00)";
sql2="select count(*) from $tab where create_time < abstime(int4(timenow()) -30
00)";
echo "$sql" | $cmd 2>&1 | egrep "$filter";
echo "$sql1" | $cmd 2>&1 | egrep "$filter";
echo "$sql2" | $cmd 2>&1 | egrep "$filter";
- result:
152603
0.01user 0.01system 0:00.29elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0
0.02user 0.01system 0:00.05elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
152603
0.00user 0.02system 0:01.95elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
- select without where: 150000 / 0.3 s
- select with where: 150000 / 2 s
- time is depended on recv DS (linear)
I simulate any parallel queries:
$ ./sqltestcon.sh -v 1 -i 20 -min 1 -max 64 -sql "$sql1; $sql2" -cmd "$cmd"
3081844 sum: 1 (5) * 13 = 13 -> 0 R/s, e=0 0:01.36elapsed 3%CPU
3081901 sum: 2 (7) * 10 = 20 -> 1 R/s, e=0 0:01.53elapsed 3%CPU
3081922 sum: 4 (11) * 8 = 32 -> 1 R/s, e=0 0:02.40elapsed 0%CPU
3081947 sum: 8 (19) * 5 = 40 -> 2 R/s, e=0 0:04.69elapsed 0%CPU
3082005 sum: 16 (36) * 2 = 32 -> 1 R/s, e=0 0:09.06elapsed 0%CPU
3082043 sum: 32 (67) * 2 = 64 -> 3 R/s, e=0 0:17.83elapsed 0%CPU
3082119 sum: 64 (130) * 1 = 64 -> 3 R/s, e=0 0:34.19elapsed 0%CPU
-> 16 connections: only 2 passes -> aprox. 10s /query
$ connstat -cols "_sy_load 5432 5433 _ps_pool _ps_postmaster" 60
## time load 5432 5433 pgpool postmaster
08:18:34 026 0 1 2 553 6
08:19:36 471 1 10 553 14
08:20:40 1804 2 35 553 42
08:21:51 5877 63 128 553 132
-> 128 connections: Load=58!
- following test are used:
- reindex
- drop, create index
- drop, create database
- zfs recordsize: 128,8 -> no differ
- zfs load: 10% ops, 10% read
- shmmax: (0x80000000)
- shared_buffers
- work_mem 64,256,1024
- wal_buffers
- effective_cache_size
- #log_disconnections = off
- autovacuum = on , off
What can i do?
Howto interpret explain output?
$ echo "EXPLAIN ANALYSE $sql2" | $cmd 2>&1
Aggregate (cost=5236.53..5236.54 rows=1 width=0) (actual time=2018.985..2018.9
86 rows=1 loops=1)
-> Seq Scan on blacklist (cost=0.00..4855.06 rows=152588 width=0) (actual t
ime=0.329..1883.275 rows=152603 loops=1)
Filter: (create_time < ((((timenow())::integer - 3000))::abstime)::time
stamp with time zone)
Total runtime: 2019.371 ms
0.00user 0.02system 0:02.07elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2353minor)pagefaults 0swaps
------------------------------------------------------------------------
- test 5 frische DB
tab=testbl
f=/tmp/test.sql
cat > $f <<EOF
create table $tab
(
relay_ip inet,
create_time timestamp default now() NOT NULL
);
create index ${tab}_relay_ip_idx on ${tab}(relay_ip);
create index ${tab}_create_time_idx on ${tab}(create_time);
EOF
cat $f | $cmd
------------------------------------------------------------------------
- memstat [10]
$ echo "::memstat"|mdb -k
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 189712 1482 74%
Anon 25308 197 10%
Exec and libs 1991 15 1%
Page cache 1849 14 1%
Free (cachelist) 2949 23 1%
Free (freelist) 35060 273 14%
Total 256869 2006
Physical 255327 1994
- shm (org)
$ echo "shminfo_shmmax/E" | mdb -k
shminfo_shmmax:
shminfo_shmmax: 8388608