Re: Scaling concerns - Mailing list pgsql-performance
From | Bill Moran |
---|---|
Subject | Re: Scaling concerns |
Date | |
Msg-id | 20061217104429.abe6c821.wmoran@collaborativefusion.com Whole thread Raw |
In response to | Scaling concerns (tsuraan <tsuraan@gmail.com>) |
List | pgsql-performance |
tsuraan <tsuraan@gmail.com> wrote: > > I'm writing a webmail-type application that is meant to be used in a > corporate environment. The core of my system is a Postgres database > that is used as a message header cache. The two (relevant) tables > being used are pasted into the end of this message. My problem is > that, as the messages table increases to tens of millions of rows, > pgsql slows down considerably. Even an operation like "select > count(*) from messages" can take minutes, with a totally idle system. > Postgres seems to be the most scalable Free database out there, so I > must be doing something wrong. > > As for the most common strategy of having a slower (more rows) > "archival" database and a smaller, faster "live" database, all the > clients in the company are using their normal corporate email server > for day-to-day email handling. The webmail is used for access email > that's no longer on the corporate server, so it's not really simple to > say which emails should be considered live and which are really > out-of-date. > > My postgres settings are entirely default with the exception of > shared_buffers being set to 40,000 and max_connections set to 400. > I'm not sure what the meaning of most of the other settings are, so I > haven't touched them. The machines running the database servers are > my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB > SATA II drive), and a production server with two dual-core Intel > chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550 > controller. Both machines are running Gentoo Linux with a 2.6.1x > kernel, and both exhibit significant performance degradation when I > start getting tens of millions of records. > > Any advice would be most appreciated. Thanks in advance! > > Tables: > > CREATE TABLE EmailAddresses ( > emailid SERIAL PRIMARY KEY, -- The unique identifier of this address > name TEXT NOT NULL, -- The friendly name in the address > addrspec TEXT NOT NULL, -- The user@domain part of the address > UNIQUE(name, addrspec) > ); > > and > > CREATE TABLE Messages ( > -- Store info: > msgkey BIGSERIAL PRIMARY KEY, -- Unique identifier for a message > path TEXT NOT NULL, -- Where the message is on the file system > inserted TIMESTAMP DEFAULT now(),-- When the message was fetched > -- Message Info: > msgid TEXT UNIQUE NOT NULL, -- Message's Message-Id field > mfrom INTEGER -- Who sent the message > REFERENCES EmailAddresses > DEFAULT NULL, > mdate TIMESTAMP DEFAULT NULL, -- Message "date" header field > replyto TEXT DEFAULT NULL, -- Message-ID of replied-to message > subject TEXT DEFAULT NULL, -- Message "subject" header field > numatch INTEGER DEFAULT NULL, -- Number of attachments > UNIQUE(path) > ); You might benefit from adding some performance-specific changes to your schema. For example, if you created a separate table for each emailid (call them Messages_1, Messages_2, etc). I expect that no one user will have an unbearable number of messages, thus each user will see reasonable performance when working with their mailbox. You can handle the management of this entirely in your application logic, but it might be better of you wrote stored procedures to access message tables -- to make it easier on the application side. -Bill
pgsql-performance by date: