Re: Hardware upgrade for a high-traffic database - Mailing list pgsql-performance
From | Jason Coene |
---|---|
Subject | Re: Hardware upgrade for a high-traffic database |
Date | |
Msg-id | 200408110106.i7B16b0E072661@mail.gotfrag.com Whole thread Raw |
In response to | Hardware upgrade for a high-traffic database ("Jason Coene" <jcoene@gotfrag.com>) |
List | pgsql-performance |
Hi Rod, Actually, we're already using a substantial caching system in code for nearly all pages delivered - we've exhausted that option. Our system uses a login/session table for about 1/8 of our page views (those visitors who are logged in), and has tracking features. While I'd love to scrap them and give the database server a vacation, it's a requirement for us. You're correct about the query caching (stored in memory) being used - most of our queries are run once and then come from memory (or, based on speed of consecutive executions, that seems to be the case). Once a user hits a page for the first time in an hour or so, it seems to cache their session query. The issue that I think we're seeing is that the performance on the 3Ware RAID is quite bad, watching FreeBSD systat will show it at "100% busy" at around "3.5 MB/s". When it needs to seek across a table (for, say, an aggregate function - typically a COUNT()), it slows the entire server down while working on the disk. Additionally, VACUUM's make the server practically useless. We have indexes on everything that's used in queries, and the planner is using them. The server has 2GB of physical memory, however it's only uses between 130MB and 200MB of it. Postgres is the only application running on the server. Our pertinent settings look like this: max_connections = 512 shared_buffers = 20000 sort_mem = 2000 vacuum_mem = 20000 effective_cache_size = 300000 fsync = false wal_sync_method = fsync wal_buffers = 32 checkpoint_segments = 2 checkpoint_timeout = 30 commit_delay = 10000 Typically, we don't use anywhere near the 512 connections - however there are peak hours where we come close, and other times that we eclipse it and run out (should some connections become serialized due to a slowdown). It's not something that we can comfortably lower. The non-standard checkpoint settings have helped making it less likely that a large (in disk time) query will conflict with a checkpoint write. I'm a programmer - definitely not a DBA by any stretch - though I am forced into the role. From reading this list, it seems to me that our settings are reasonable given our usage, and that a disk upgrade is likely in order. I'd love to hear any suggestions. Thanks, Jason -----Original Message----- From: Rod Taylor [mailto:pg@rbt.ca] Sent: Tuesday, August 10, 2004 7:07 PM To: Jason Coene Cc: Postgresql Performance Subject: Re: [PERFORM] Hardware upgrade for a high-traffic database > Our database is about 20GB on disk, we have some quite large tables - 2M > rows with TEXT fields in a sample table, accessed constantly. We average > about 4,000 - 5,000 queries per second - all from web traffic. As you can 99% is reads? and probably the same data over and over again? You might want to think about a small code change to cache sections of page output in memory for the most commonly generated pages (there are usually 3 or 4 that account for 25% to 50% of web traffic -- starting pages). The fact you're getting 5k queries/second off IDE drives tells me most of the active data is in memory -- so your actual working data set is probably quite small (less than 10% of the 20GB). If the above is all true (mostly reads, smallish dataset, etc.) and the database is not growing very quickly, you might want to look into RAM and RAM bandwidth over disk. An Opteron with 8GB ram using the same old IDE drives. Get a mobo with a SCSI raid controller in it, so the disk component can be upgraded in the future (when necessary).
pgsql-performance by date: