Re: Finding bottleneck

From: Kari Lavikka
Subject: Re: Finding bottleneck
Date:
Msg-id: Pine.HPX.4.62.0508191413480.3361@purple.bdb.fi
In response to: Re: Finding bottleneck  (Tom Lane)
Responses: Re: Finding bottleneck  (Tom Lane)
List: pgsql-performance

On Mon, 8 Aug 2005, Tom Lane wrote:
> What that sounds like to me is a machine with inadequate disk I/O bandwidth.
> Your earlier comment that checkpoint drives the machine into the ground
> fits right into that theory, too.  You said there is "almost no IO-wait"
> but are you sure you are measuring that correctly?

Reducing checkpoint_timeout to 600 seconds had a positive effect. Previous
value was 1800 seconds.

We have a spare disk array from the old server and I'm planning to use it
as a tablespace for the comment table (the 100M+ rows one) as Ron
suggested.
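
For reference, moving the table onto the spare array would look roughly like this (the mount point and tablespace name here are placeholders of mine, not our actual paths):

```sql
-- Create a tablespace on the spare disk array (directory is hypothetical)
CREATE TABLESPACE spare_array LOCATION '/mnt/spare_array/pgdata';

-- Move the big comment table there. Note that this rewrites the whole
-- table and holds an exclusive lock on it for the duration.
ALTER TABLE comment SET TABLESPACE spare_array;
```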

>> Queries accumulate and when checkpointing is over, there can be
>> something like 400 queries running but over 50% of cpu is just idling.
>
> 400 queries?  Are you launching 400 separate backends to do that?
> Some sort of connection pooling seems like a good idea, if you don't
> have it in place already.  If the system's effective behavior in the
> face of heavy load is to start even more concurrent backends, that
> could easily drive things into the ground.

Ok, I implemented connection pooling using pgpool and it increased
performance a lot! We are now delivering about 1500 dynamic pages a second
without problems. Each of the eight single-cpu webservers are running a
pgpool instance with 20 connections.
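
In case it helps anyone setting up something similar, the relevant part of each webserver's pgpool.conf looks roughly like this (parameter names are from memory for pgpool 2.x, so check the docs for your version; the backend hostname is a placeholder):

```
port = 9999                 # the webserver connects to pgpool here
backend_host_name = 'db'    # hostname of the PostgreSQL server (placeholder)
backend_port = 5432
num_init_children = 20      # preforked children = max concurrent connections per webserver
max_pool = 1                # cached backend connections per child
```

With eight webservers at 20 children each, that caps the backend at 160 connections instead of letting it climb toward max_connections under load.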

However, those configuration changes didn't have a significant effect on the
oprofile results. AtEOXact_CatCache consumes even more cycles than before.
This isn't a problem right now, but it may become one in the future...

CPU: AMD64 processors, speed 2190.23 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1147870  21.1602  AtEOXact_CatCache
187466    3.4558  hash_seq_search
174357    3.2142  AllocSetAlloc
170896    3.1504  nocachegetattr
131724    2.4282  ExecMakeFunctionResultNoSets
125292    2.3097  SearchCatCache
117264    2.1617  StrategyDirtyBufferList
105741    1.9493  hash_search
98245     1.8111  FunctionCall2
97878     1.8043  yyparse
90932     1.6763  LWLockAcquire
83555     1.5403  LWLockRelease
81045     1.4940  _bt_compare
... and so on ...
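
For completeness, a profile like the one above can be collected with the opcontrol tooling along these lines (session commands only, run as root; the binary path is an assumption):

```
# Profile CPU_CLK_UNHALTED system-wide, sampling every 100000 events
opcontrol --setup --no-vmlinux --event=CPU_CLK_UNHALTED:100000
opcontrol --start

# ... run the workload for a while, then flush samples and stop ...
opcontrol --dump
opcontrol --shutdown

# Per-symbol report for the postgres binary (path may differ on your system)
opreport --symbols /usr/local/pgsql/bin/postgres
```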

----->8 Significant rows from current postgresql.conf 8<-----

max_connections = 768           # unnecessarily large with connection pooling
shared_buffers = 15000
work_mem = 2048
maintenance_work_mem = 32768
max_fsm_pages = 1000000
max_fsm_relations = 5000
bgwriter_percent = 2
fsync = true
wal_buffers = 512
checkpoint_segments = 200       # less would probably be enough with 600 sec timeout
checkpoint_timeout = 600
effective_cache_size = 500000
random_page_cost = 1.5
default_statistics_target = 150
stats_start_collector = true
stats_command_string = true


     |\__/|
     ( oo )    Kari Lavikka -  - (050) 380 3808
__ooO(  )Ooo_______ _____ ___ _ _  _   _    _      _                  _
       ""


