Thread: Intermittent slowdowns, connection delays

Intermittent slowdowns, connection delays

From
"Jason Coene"
Date:
Hi All,

We have a Postgres 7.4.1 server running on FreeBSD 5.2.  Hardware is a Dual
Xeon 2.6 (HT enabled), 2 GB Memory, 3Ware SATA RAID-5 w/ 4 7200 RPM Seagate
disks and gigabit Intel Server Ethernet.  The server is dedicated to serving
data to our web-based CMS.

We have a few web servers load balanced, and we do around 1M page
impressions per day.  Our website is highly personalized, and we've
optimized it to limit the number of queries, but we still see between 2 and
3 SELECT's (with JOIN's) and 1 UPDATE per page load, selectively more - a
fair volume.

The single UPDATE per page load is updating a timestamp in a small table
(about 150,000 rows) with only 1 index (on the 1 field that needs to be
matched).

We're seeing some intermittent spikes in query time as actual connection
time.  I.e., during these seemingly random spikes, our debug output looks
like this (times from start of HTTP request):

SQL CONNECTION CREATING 'gf'
0.0015 - ESTABLISHING CONNECTION
1.7113 - CONNECTION OK
SQL QUERY ID 1 COST 0.8155 ROWS 1
SQL QUERY ID 2 COST 0.5607 ROWS 14
.. etc.. (all queries taking more time than normal, see below)

Refresh the page 2 seconds later, and we'll get:

SQL CONNECTION CREATING 'gf'
0.0017 - ESTABLISHING CONNECTION
0.0086 - CONNECTION OK
SQL QUERY ID 1 COST 0.0128 ROWS 1
SQL QUERY ID 2 COST 0.0033 ROWS 14
.. etc.. (with same queries)

Indeed, during these types, it takes a moment for "psql" to connect on the
command line (from the same machine using a local file socket), so it's not
a network issue or a web-server issue.  During these spurts, there's nothing
too out of the ordinary in vmstat, systat or top.

These programs show that we're not using much CPU (usually 60-80% idle), and
disks usage is virtually nil.  I've attached 60 seconds of "vmstat 5".
Memory usage looks like this (constantly):

Mem: 110M Active, 1470M Inact, 206M Wired, 61M Cache, 112M Buf, 26M Free

I've cleaned up and tested query after query, and nothing is a "hog".  On an
idle server, every query will execute in < 0.05 sec.  Perhaps some of you
veterans have ideas?

Thanks,

Jason Coene
Gotfrag eSports
585-598-6621 Phone
585-598-6633 Fax
jcoene@gotfrag.com
http://www.gotfrag.com



Attachment

Re: Intermittent slowdowns, connection delays

From
"Jason Coene"
Date:
Hi Paul,

Thanks for the valuable feedback.  I suspect you're correct about the
serialization in some capacity, but the actual cause is eluding me.

Basically, every time a registered user checks a page, the site has to
authenticate them (with a query against a table with > 200,000 records).  It
doesn't update this table, however - it updates another table with "user
stats" information (last click, last ip, etc).

From what I've seen, there doesn't seem to be any serious locking issues.
It does make sense when a number of users whose information isn't in cache,
it could take a bit longer - but AFAIK this shouldn't prevent other
simultaneous queries.  What else could cause such serialization?

If I look at open locks (this is a view, info from pg tables):

       relname        |       mode       | numlocks
----------------------+------------------+----------
 users                | AccessShareLock  |        4
 userstats            | AccessShareLock  |        4
 pg_statistic         | AccessShareLock  |        2
 users_ix_id          | AccessShareLock  |        2
 countries            | AccessShareLock  |        2
 comments             | AccessShareLock  |        2
 countries_ix_id      | AccessShareLock  |        2
 userstats_ix_id      | AccessShareLock  |        2
 comments_ix_parentid | AccessShareLock  |        2
 users                | RowExclusiveLock |        1
 filequeue_ix_id      | AccessShareLock  |        1
 pg_class             | AccessShareLock  |        1
 vopenlocks           | AccessShareLock  |        1
 pg_locks             | AccessShareLock  |        1
 userstats            | RowExclusiveLock |        1
 filequeue            | AccessShareLock  |        1
 pg_class_oid_index   | AccessShareLock  |        1

Also of note, executing a random "in the blue" query on our "users" table
returns results very fast.  While there's no doubt that caching may help,
returning a row that is definitely not cached is very fast: < 0.05 sec.

Top tells me that the system isn't using much memory - almost always under
100MB (of the 2GB we have).  Is there a way to increase the amount of
physical RAM that PG uses?  It seems there's a lot of room there.

Postgresql.conf has:

shared_buffers = 16384
sort_mem = 8192
vacuum_mem = 8192

Also, would queries becoming serialized effect connection delays?  I think
there's still something else at large here...

I've attached a vmstat output, while running dd.  The RAID array is tw0.  It
does show the tw0 device getting significantly more work, numbers not seen
during normal operation.

Thanks,

Jason Coene
Gotfrag eSports
585-598-6621 Phone
585-598-6633 Fax
jcoene@gotfrag.com
http://www.gotfrag.com


-----Original Message-----
From: Paul Tuckfield [mailto:paul@tuckfield.com]
Sent: Tuesday, May 11, 2004 7:50 PM
To: Jason Coene
Subject: Re: [PERFORM] Intermittent slowdowns, connection delays

The things you point out suggest a heavy dependence on good cache
performance
(typical of OLTP mind you)  Do not be fooled if a query runs in 2
seconds then the second
run takes < .01 secons:  the first run put it in cache the second got
all cache hits :)

But beyond that,  in an OLTP system, and typical website backing
database, "cache is king".
And serialization is the devil

So look for reasons why your cache performance might deteriorate during
peak, (like large historical tables
that users pull up dozens of scattered rows from, flooding cache)  or
why you may be
serializing somewhere inside postgres (ex. if every page hit re-logs
in, then theres probably serialization
trying to spawn what must be 40 processes/sec assuming your 11hit/sec
avg peaks at about 40/sec)

Also:
I am really surprised you see zero IO in the vmstat you sent, but I'm
unfamiliar with BSD version of vmstat.
AFAIR,  Solaris shows cached filesystem reads as "page faults" which is
rather confusing.  Since you have 1500 page
faults per second, yet no paging (bi bo) does thins mean the 1500 page
faults are filesystem IO that pg is doing?
do an objective test on an idle system by dd'ing a large file in and
watching what vmstat does.





On May 11, 2004, at 3:10 PM, Jason Coene wrote:

> Hi All,
>
> We have a Postgres 7.4.1 server running on FreeBSD 5.2.  Hardware is a
> Dual
> Xeon 2.6 (HT enabled), 2 GB Memory, 3Ware SATA RAID-5 w/ 4 7200 RPM
> Seagate
> disks and gigabit Intel Server Ethernet.  The server is dedicated to
> serving
> data to our web-based CMS.
>
> We have a few web servers load balanced, and we do around 1M page
> impressions per day.  Our website is highly personalized, and we've
> optimized it to limit the number of queries, but we still see between
> 2 and
> 3 SELECT's (with JOIN's) and 1 UPDATE per page load, selectively more
> - a
> fair volume.
>
> The single UPDATE per page load is updating a timestamp in a small
> table
> (about 150,000 rows) with only 1 index (on the 1 field that needs to be
> matched).
>
> We're seeing some intermittent spikes in query time as actual
> connection
> time.  I.e., during these seemingly random spikes, our debug output
> looks
> like this (times from start of HTTP request):
>
> SQL CONNECTION CREATING 'gf'
> 0.0015 - ESTABLISHING CONNECTION
> 1.7113 - CONNECTION OK
> SQL QUERY ID 1 COST 0.8155 ROWS 1
> SQL QUERY ID 2 COST 0.5607 ROWS 14
> .. etc.. (all queries taking more time than normal, see below)
>
> Refresh the page 2 seconds later, and we'll get:
>
> SQL CONNECTION CREATING 'gf'
> 0.0017 - ESTABLISHING CONNECTION
> 0.0086 - CONNECTION OK
> SQL QUERY ID 1 COST 0.0128 ROWS 1
> SQL QUERY ID 2 COST 0.0033 ROWS 14
> .. etc.. (with same queries)
>
> Indeed, during these types, it takes a moment for "psql" to connect on
> the
> command line (from the same machine using a local file socket), so
> it's not
> a network issue or a web-server issue.  During these spurts, there's
> nothing
> too out of the ordinary in vmstat, systat or top.
>
> These programs show that we're not using much CPU (usually 60-80%
> idle), and
> disks usage is virtually nil.  I've attached 60 seconds of "vmstat 5".
> Memory usage looks like this (constantly):
>
> Mem: 110M Active, 1470M Inact, 206M Wired, 61M Cache, 112M Buf, 26M
> Free
>
> I've cleaned up and tested query after query, and nothing is a "hog".
> On an
> idle server, every query will execute in < 0.05 sec.  Perhaps some of
> you
> veterans have ideas?
>
> Thanks,
>
> Jason Coene
> Gotfrag eSports
> 585-598-6621 Phone
> 585-598-6633 Fax
> jcoene@gotfrag.com
> http://www.gotfrag.com
>
>
> <vmstat51min.txt>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org


Attachment