Re: Finding bottleneck

From: Tom Lane
Subject: Re: Finding bottleneck
Date: ,
Msg-id: 29384.1123520200@sss.pgh.pa.us
(view: Whole thread, Raw)
In response to: Re: Finding bottleneck  (Kari Lavikka)
Responses: Re: Finding bottleneck  (Kari Lavikka)
Re: Finding bottleneck  (Kari Lavikka)
List: pgsql-performance

Tree view

Finding bottleneck  (Kari Lavikka, )
 Re: Finding bottleneck  (Gavin Sherry, )
 Re: Finding bottleneck  (Claus Guttesen, )
 Re: Finding bottleneck  ("Merlin Moncure", )
 Re: Finding bottleneck  ("Luke Lonergan", )
  Re: Finding bottleneck  (Kari Lavikka, )
   Re: Finding bottleneck  (Tom Lane, )
 Re: Finding bottleneck  ("Merlin Moncure", )
  Re: Finding bottleneck  (Tom Lane, )
  Re: Finding bottleneck  (Kari Lavikka, )
   Re: Finding bottleneck  (Tom Lane, )
    Re: Finding bottleneck  (Kari Lavikka, )
     Re: Finding bottleneck  (Tom Lane, )
    Re: Finding bottleneck  (Kari Lavikka, )
     Re: Finding bottleneck  (Tom Lane, )
 Re: Finding bottleneck  (Ron, )
 Re: Finding bottleneck  ("Merlin Moncure", )
  Re: Finding bottleneck  (Tom Lane, )
 Re: Finding bottleneck  ("Merlin Moncure", )
 Re: Finding bottleneck  ("Merlin Moncure", )
  Re: Finding bottleneck  (Tom Lane, )
 Re: Finding bottleneck  ("Merlin Moncure", )

Kari Lavikka <> writes:
> We are having performance problems with some smaller tables and very
> simple queries. For example:

> SELECT u.uid, u.nick, extract(epoch from uc.stamp) AS stamp FROM
> user_channel uc INNER JOIN users u USING (uid) WHERE channel_id = 281321
> AND u.status = 'a' ORDER BY uc.channel_id, upper(uc.nick)

> And explain analyze:
>   Nested Loop  (cost=0.00..200.85 rows=35 width=48) (actual time=0.414..38.128 rows=656 loops=1)
>     ->  Index Scan using user_channel_channel_id_nick on user_channel uc  (cost=0.00..40.18 rows=47 width=27) (actual
time=0.090..0.866rows=667 loops=1) 
>           Index Cond: (channel_id = 281321)
>     ->  Index Scan using users_pkey on users u  (cost=0.00..3.40 rows=1 width=25) (actual time=0.048..0.051 rows=1
loops=667)
>           Index Cond: ("outer".uid = u.uid)
>           Filter: (status = 'a'::bpchar)
>   Total runtime: 38.753 ms

> Under heavy load these queries tend to take several minutes to execute
> although there's plenty of free cpu available.

What that sounds like to me is a machine with inadequate disk I/O bandwidth.
Your earlier comment that checkpoint drives the machine into the ground
fits right into that theory, too.  You said there is "almost no IO-wait"
but are you sure you are measuring that correctly?

Something else just struck me from your first post:

> Queries accumulate and when checkpointing is over, there can be
> something like 400 queries running but over 50% of cpu is just idling.

400 queries?  Are you launching 400 separate backends to do that?
Some sort of connection pooling seems like a good idea, if you don't
have it in place already.  If the system's effective behavior in the
face of heavy load is to start even more concurrent backends, that
could easily drive things into the ground.

            regards, tom lane


pgsql-performance by date:

From: Tom Lane
Date:
Subject: Re: Finding bottleneck
From: Tom Lane
Date:
Subject: Re: Why hash join instead of nested loop?