Re: Queries sometimes take 1000 times the normal time - Mailing list pgsql-performance

From Shridhar Daithankar
Subject Re: Queries sometimes take 1000 times the normal time
Date
Msg-id 3F4F5C82.18176.44007A5@localhost
Whole thread Raw
In response to Re: Queries sometimes take 1000 times the normal time  ("Anders K. Pedersen" <akp@cohaesio.com>)
List pgsql-performance
On 28 Aug 2003 at 20:16, Anders K. Pedersen wrote:

> Shridhar Daithankar wrote:
> > On 28 Aug 2003 at 1:07, Anders K. Pedersen wrote:
> >>We're running a set of Half-Life based game servers that lookup user
> >>privileges from a central PostgreSQL 7.3.4 database server (I recently
> >>ported the MySQL code in Adminmod to PostgreSQL to be able to do this).
> >>
> >>The data needed by the game servers are combined from several different
> >>tables, so we have some views set up to provide the data in the format
> >>needed.
> >>
> >>Currently there's only a few users in the database for testing purposes,
> >>and most of the time the user lookup's take 2-3 ms (I have syslog'ing of
> >>queries and duration turned on), but several times per hour the duration
> >>for one of the queries is 2-3 seconds (1000 times larger), while the
> >>surrounding lookups take the usual 2-3 ms.
> >
> >
> > Check vmstat during the same period if it is syncing at that point as Tom
> > suggested.
>
> I've been running a vmstat 1 logging process for a while now, and the
> sample below shows what happende around one of these spikes - at
> 18:18:03 specifically (actually there were two 1 second long queries,
> that finished at 18:18:03).
>
> Thu Aug 28 18:17:53 2003 0  0  0  40904   4568  22288 404352   0   0
> 12     0  181   362  2   1  97
> Thu Aug 28 18:17:54 2003 0  0  0  40904   4580  22260 404380   0   0
> 128     0  205   330  2   3  95
> Thu Aug 28 18:17:55 2003 0  0  2  40904   4576  22264 404380   0   0
>   0   284  224   127  0   1  99
> Thu Aug 28 18:17:56 2003 0  0  2  40904   5008  22268 404512   0   0
> 128   728  571   492  2   3  95
> Thu Aug 28 18:17:57 2003 0  0  1  40904   5000  22276 404512   0   0
>   0   120  201   181  1   0  99
> Thu Aug 28 18:17:58 2003 0  0  1  40904   4936  22284 404528   0   0
>   8     0 1147  2204 12   3  85
> Thu Aug 28 18:17:59 2003 0  0  0  40904   4784  22304 404660   0   0
> 148     0 2112  3420  2   3  95
> Thu Aug 28 18:18:00 2003 1  1  3  40904   4760  22324 404664   0   0
> 20   456 2374  3277  2   1  97
> Thu Aug 28 18:18:01 2003 0  2 10  40904   4436  22000 401456   0   0
> 144   540  510   457 11   6  83
> Thu Aug 28 18:18:02 2003 1  1  2  40904   8336  22032 401512   0   0
> 68   676 1830  2540  4   3  93
> Thu Aug 28 18:18:04 2003 1  0  1  40904   8160  22052 401664   0   0
> 140   220 2308  3253  2   3  95
> Thu Aug 28 18:18:05 2003 0  0  1  40904   7748  22064 402064   0   0
> 288     0 1941  2856  1   3  96
> Thu Aug 28 18:18:06 2003 0  0  3  40904   6704  22064 403100   0   0
> 496   992 2326  3510  0   5  95
> Thu Aug 28 18:18:07 2003 1  0  0  40904   6324  22088 402716   0   0
> 260   188 1984  2927 11   4  85
> Thu Aug 28 18:18:08 2003 0  0  0  40904   6920  22088 402828   0   0
> 72     0  419  1473 17   5  78
> Thu Aug 28 18:18:09 2003 0  0  0  40904   6784  22088 402964   0   0
> 128     0  235   476  2   1  97
> Thu Aug 28 18:18:10 2003 0  0  1  40904   6404  22088 402980   0   0
>   0     0  343   855 14   2  84
>

Notice a pattern. In first few entries free memory is increasing coupled with
IO. Few entries down the line it's decreasing again with IO.  I would guess
that something terminated and started..

However given how idle CPU is, I wonder would it matter. Besides changes in
stats are pretty small to be any significant. I wouldn't worry about context
switches or interrupts. They don't seem to be any dramatic..

I wonder what kernel you are using. While running pgbench on 2.4 and 2.6 couple
of days back, I noticed several stalls with 2.4 where neither CPU or disk does
anything but nothing moves forward, for 30 sec. or so.

If possible try with 2.6 Check which scheduler you are using and which works
best for you.

    http://marc.theaimsgroup.com/?l=linux-kernel&m=105743728122143&w=2

If you want a step by step how to install 2.6, I can give that too.. It's
pretty simple..

> > Are you using pooled connections? If yes you could shorten life of a connection
> > and force making a new connection every 10-15 minutes say. That would avoid IO
> > avelanche at the end of the hour types.
>
> I'm not quite sure, what you mean by "pooled connections". Each game
> server has one connection to the PostgreSQL server, which is opened,
> when the server is first started, and then never closed (until the game
> server terminates, but there's days between this happens).

I would say let each server start a new connection every 15 minutes. As soon as
new connection is established, close old one.

See if that cures the problem. Apache uses similar methods albeit it measure by
nubmer of requests served by each child.

HTH

Bye
 Shridhar

--
It is necessary to have purpose.        -- Alice #1, "I, Mudd", stardate 4513.3


pgsql-performance by date:

Previous
From: Ken Geis
Date:
Subject: Re: bad estimates
Next
From: "Alexander Priem"
Date:
Subject: Re: Indexing question