Thread: performance sol10 zone (fup)

performance sol10 zone (fup)

From
"Heiko L."
Date:
Hallo,

Im running pg-8,pgpoolII on sol10-zone.


After update sol10u7, queries on coltype timestamp are very slow.
System: sparc, 2GB RAM

This DB is a greylist-DB to fight spam.
500 connections should be easy.
But 16 connection  consum 10sec/query.
On another system (sparc) only 1 sec.i

s. details

howto diag?

regards heiko


- details

  $ /opt/csw/postgresql/bin/postmaster -V
  postgres (PostgreSQL) 8.3.1

  - numDS: 200k-800k
  - conn: 50-500
  - max age: 2d

- testbed
 tab=blacklist
 filter="^ .*[0-9]|elaps"
 cmd="/usr/bin/time psql -t -p 5432 -h 192.168.5.126 -U smtpuser smtp"
 sql="select count(*) from $tab";
 sql1="select count(*) from $tab where create_time > abstime(int4(timenow()) -30
00)";
 sql2="select count(*) from $tab where create_time < abstime(int4(timenow()) -30
00)";
 echo "$sql" | $cmd 2>&1 | egrep "$filter";
 echo "$sql1" | $cmd 2>&1 | egrep "$filter";
 echo "$sql2" | $cmd 2>&1 | egrep "$filter";

- result:
  152603
  0.01user 0.01system 0:00.29elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
     0
  0.02user 0.01system 0:00.05elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
  152603
  0.00user 0.02system 0:01.95elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k


  - select without where: 150000 / 0.3 s
  - select with where:          150000 / 2 s
  - time is depended on recv DS (linear)

I simulate any parallel queries:
$ ./sqltestcon.sh -v 1 -i 20 -min 1 -max 64 -sql "$sql1; $sql2" -cmd "$cmd"
3081844 sum: 1 (5) * 13 = 13 ->  0 R/s, e=0 0:01.36elapsed 3%CPU
3081901 sum: 2 (7) * 10 = 20 ->  1 R/s, e=0 0:01.53elapsed 3%CPU
3081922 sum: 4 (11) * 8 = 32 ->  1 R/s, e=0 0:02.40elapsed 0%CPU
3081947 sum: 8 (19) * 5 = 40 ->  2 R/s, e=0 0:04.69elapsed 0%CPU
3082005 sum: 16 (36) * 2 = 32 ->  1 R/s, e=0 0:09.06elapsed 0%CPU
3082043 sum: 32 (67) * 2 = 64 ->  3 R/s, e=0 0:17.83elapsed 0%CPU
3082119 sum: 64 (130) * 1 = 64 ->  3 R/s, e=0 0:34.19elapsed 0%CPU

  -> 16 connections:  only 2 passes -> aprox. 10s /query

$ connstat -cols "_sy_load 5432 5433 _ps_pool _ps_postmaster" 60
## time     load    5432    5433    pgpool    postmaster
08:18:34    026    0    1    2    553    6
08:19:36    471    1    10    553    14
08:20:40    1804   2     35    553    42
08:21:51    5877    63    128    553    132

  -> 128 connections: Load=58!

- following test are used:
  - reindex
  - drop, create index
  - drop, create database
  - zfs recordsize: 128,8  -> no differ
  - zfs load:  10% ops, 10% read
  - shmmax: (0x80000000)
  - shared_buffers
  - work_mem 64,256,1024
  - wal_buffers
  - effective_cache_size
  - #log_disconnections = off
  - autovacuum = on , off

What can i do?
Howto interpret explain output?

$ echo "EXPLAIN ANALYSE $sql2" | $cmd 2>&1
 Aggregate  (cost=5236.53..5236.54 rows=1 width=0) (actual time=2018.985..2018.9
86 rows=1 loops=1)
   ->  Seq Scan on blacklist  (cost=0.00..4855.06 rows=152588 width=0) (actual t
ime=0.329..1883.275 rows=152603 loops=1)
         Filter: (create_time < ((((timenow())::integer - 3000))::abstime)::time
stamp with time zone)
 Total runtime: 2019.371 ms

0.00user 0.02system 0:02.07elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2353minor)pagefaults 0swaps


------------------------------------------------------------------------
- test 5 frische DB

tab=testbl
f=/tmp/test.sql
cat > $f <<EOF
create table $tab
(
        relay_ip        inet,
        create_time     timestamp        default now() NOT NULL
);

create index ${tab}_relay_ip_idx on ${tab}(relay_ip);
create index ${tab}_create_time_idx on ${tab}(create_time);
EOF
cat $f | $cmd

------------------------------------------------------------------------
- memstat [10]

 $  echo "::memstat"|mdb -k
 Page Summary                Pages                MB  %Tot
 ------------     ----------------  ----------------  ----
 Kernel                     189712              1482   74%
 Anon                        25308               197   10%
 Exec and libs                1991                15    1%
 Page cache                   1849                14    1%
 Free (cachelist)             2949                23    1%
 Free (freelist)             35060               273   14%

 Total                      256869              2006
 Physical                   255327              1994


- shm (org)
 $ echo "shminfo_shmmax/E" | mdb -k
 shminfo_shmmax:
 shminfo_shmmax: 8388608



Re: performance sol10 zone (fup)

From
Robert Haas
Date:
On Wed, Aug 4, 2010 at 3:16 AM, Heiko L. <heikol@fh-lausitz.de> wrote:
> $ echo "EXPLAIN ANALYSE $sql2" | $cmd 2>&1
>  Aggregate  (cost=5236.53..5236.54 rows=1 width=0) (actual time=2018.985..2018.9
> 86 rows=1 loops=1)
>   ->  Seq Scan on blacklist  (cost=0.00..4855.06 rows=152588 width=0) (actual t
> ime=0.329..1883.275 rows=152603 loops=1)
>         Filter: (create_time < ((((timenow())::integer - 3000))::abstime)::time
> stamp with time zone)
>  Total runtime: 2019.371 ms

Maybe you need an index on create_time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: performance sol10 zone (fup)

From
Scott Marlowe
Date:
On Wed, Aug 4, 2010 at 1:16 AM, Heiko L. <heikol@fh-lausitz.de> wrote:
> Hallo,
>
> Im running pg-8,pgpoolII on sol10-zone.

I noticed late you mention 8.3.1.  Two points, you're missing > 1 year
of updates, bug fixes, security patches etc.  Assuming this version
was fast before, we'll assume it's not the cause of this problem,
however, you're asking for trouble with a version that old.  There are
bugs that might not bite you today, but may well in the future.
Please upgrade to 8.3.11.

> After update sol10u7, queries on coltype timestamp are very slow.
> System: sparc, 2GB RAM

Is it possible you had an index that was working that now isn't?  Are
the queries you included the real ones or approximations?

It looks like you have a bunch of seq scans happening.  If they're all
happening on the same table or small set of them, then a lot of
queries should be able to access them in any order together in 8.3

Are sequential scans normal for this query when it runs fast?

What does vmstat 10 and / or iostat -xd 10 have to say while this is running?

--
To understand recursion, one must first understand recursion.