Thread: Scaling concerns
I'm writing a webmail-type application that is meant to be used in a corporate environment. The core of my system is a Postgres database that is used as a message header cache. The two (relevant) tables being used are pasted into the end of this message. My problem is that, as the messages table increases to tens of millions of rows, pgsql slows down considerably. Even an operation like "select count(*) from messages" can take minutes, with a totally idle system. Postgres seems to be the most scalable Free database out there, so I must be doing something wrong. As for the most common strategy of having a slower (more rows) "archival" database and a smaller, faster "live" database, all the clients in the company are using their normal corporate email server for day-to-day email handling. The webmail is used for access email that's no longer on the corporate server, so it's not really simple to say which emails should be considered live and which are really out-of-date. My postgres settings are entirely default with the exception of shared_buffers being set to 40,000 and max_connections set to 400. I'm not sure what the meaning of most of the other settings are, so I haven't touched them. The machines running the database servers are my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB SATA II drive), and a production server with two dual-core Intel chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550 controller. Both machines are running Gentoo Linux with a 2.6.1x kernel, and both exhibit significant performance degradation when I start getting tens of millions of records. Any advice would be most appreciated. Thanks in advance! Tables: CREATE TABLE EmailAddresses ( emailid SERIAL PRIMARY KEY, -- The unique identifier of this address name TEXT NOT NULL, -- The friendly name in the address addrspec TEXT NOT NULL, -- The user@domain part of the address UNIQUE(name, addrspec) ); and CREATE TABLE Messages ( -- Store info: msgkey BIGSERIAL PRIMARY KEY, -- Unique identifier for a message path TEXT NOT NULL, -- Where the message is on the file system inserted TIMESTAMP DEFAULT now(),-- When the message was fetched -- Message Info: msgid TEXT UNIQUE NOT NULL, -- Message's Message-Id field mfrom INTEGER -- Who sent the message REFERENCES EmailAddresses DEFAULT NULL, mdate TIMESTAMP DEFAULT NULL, -- Message "date" header field replyto TEXT DEFAULT NULL, -- Message-ID of replied-to message subject TEXT DEFAULT NULL, -- Message "subject" header field numatch INTEGER DEFAULT NULL, -- Number of attachments UNIQUE(path) );
On Sat, Dec 16, 2006 at 11:26:02AM -0600, tsuraan wrote:
> Even an operation like "select count(*) from messages" can take minutes,
> with a totally idle system. Postgres seems to be the most scalable Free
> database out there, so I must be doing something wrong.

Unqualified SELECT COUNT(*) FROM foo is one of the most expensive operations you can do on your system, since the visibility information has to be checked on disk for each row. Instead, try real queries on real data, and post here if some are too slow for you.

/* Steinar */
--
Homepage: http://www.sesse.net/
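As an aside: if a rough row count is all that's needed (for a status display and the like), the planner's statistics can stand in for the full scan. A minimal sketch, assuming the table is vacuumed or analyzed often enough for reltuples to be close to reality:

  -- Approximate row count from the statistics PostgreSQL already keeps;
  -- accuracy depends on how recently the table was vacuumed/analyzed.
  SELECT reltuples::bigint AS approx_rows
    FROM pg_class
   WHERE relname = 'messages';

For the "real queries" Steinar mentions, running them under EXPLAIN ANALYZE and posting the plan along with the query is the usual way to get useful feedback here.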
> Unqualified SELECT COUNT(*) FROM foo is one of the most expensive operations
> you can do on your system, since the visibility information has to be checked
> on disk for each row. Instead, try real queries on real data, and post here
> if some are too slow for you.

Ok, that's a bad example. I'm learning :-)

Is insert ... select also really expensive then? I have a table loaded with message-id and path information of currently-existing messages. It has ~20 million rows. Trying to do "INSERT INTO Messages(path, msgid) SELECT (path, msgid) FROM tmpMessages" took a really long time before psql died with an out-of-memory error. Is there a more sane way to do a table copy, or should I have just dropped all the indices from the Message table and loaded into that?

Thanks!
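For reference, the usual trick for a bulk copy like this is indeed to drop the secondary unique constraints, load, and rebuild them afterwards, so the btree indexes aren't maintained row by row. This is only a sketch: the constraint names below are the defaults PostgreSQL normally generates and should be checked with \d messages first. (Also note that SELECT (path, msgid) with the parentheses produces a single row value rather than two columns; the unparenthesized form is used here.)

  BEGIN;
  -- Drop the unique constraints so the load doesn't update their indexes
  -- one row at a time (constraint names are assumptions):
  ALTER TABLE Messages DROP CONSTRAINT messages_msgid_key;
  ALTER TABLE Messages DROP CONSTRAINT messages_path_key;

  INSERT INTO Messages (path, msgid)
      SELECT path, msgid FROM tmpMessages;

  -- Rebuild each unique constraint in a single pass over the table:
  ALTER TABLE Messages ADD CONSTRAINT messages_msgid_key UNIQUE (msgid);
  ALTER TABLE Messages ADD CONSTRAINT messages_path_key UNIQUE (path);
  COMMIT;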
On Saturday, December 16, 2006 at 18:32, Steinar H. Gunderson wrote:
> Instead, try real queries on real data,
> and post here if some are too slow for you.

To quickly find a subset of slow queries on your production system, you can use the pgfouine tool:
  http://pgfouine.projects.postgresql.org/

If you then want to measure PostgreSQL performance with different settings and compare them, consider using the tsung tool (and maybe the tsung-ploter companion tool, to graph several benchmarks onto the same charts for comparison):
  http://pgfouine.projects.postgresql.org/tsung.html
  http://tsung.erlang-projects.org/
  http://debian.dalibo.org/unstable/

This latter link also contains a .tar.gz archive of tsung-ploter in case you're not running a Debian system. Dependencies are python and matplotlib.

Regards,
--
Dimitri Fontaine
http://www.dalibo.com/
* tsuraan <tsuraan@gmail.com> [061216 18:26]:
> I'm writing a webmail-type application that is meant to be used in a
> corporate environment. The core of my system is a Postgres database
> that is used as a message header cache. The two (relevant) tables
> being used are pasted at the end of this message. My problem is
> that, as the messages table increases to tens of millions of rows,
> pgsql slows down considerably. Even an operation like "select
> count(*) from messages" can take minutes, with a totally idle system.
> Postgres seems to be the most scalable Free database out there, so I
> must be doing something wrong.

select count(*) from table is the worst case in PostgreSQL (and in MVCC systems in general, I guess). If you really need to run count(*), you need to think about the required isolation level of these operations and maintain an aggregate table yourself. (Btw, select aggregate(*) from bigtable is something that no database likes; it's just the degree of slowness that differs.)

For scaling you should consider Slony. Either hang out on #slony on Freenode.net or ask on the mailing list if you have questions.

> As for the most common strategy of having a slower (more rows)
> "archival" database and a smaller, faster "live" database, all the
> clients in the company are using their normal corporate email server
> for day-to-day email handling. The webmail is used for accessing email
> that's no longer on the corporate server, so it's not really simple to
> say which emails should be considered live and which are really
> out-of-date.
>
> My postgres settings are entirely default with the exception of
> shared_buffers being set to 40,000 and max_connections set to 400.
> I'm not sure what the meaning of most of the other settings are, so I
> haven't touched them. The machines running the database servers are
> my home desktop (a dual-core Athlon 3200+ with 2GB RAM and a 120GB
> SATA II drive), and a production server with two dual-core Intel

Intel chips => define more. There are Intel boxes known to have issues under specific load scenarios with PostgreSQL (again, specific versions). To make it funnier, these are really, really hard to track down ;)

> chips, 4GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
> controller. Both machines are running Gentoo Linux with a 2.6.1x
> kernel, and both exhibit significant performance degradation when I
> start getting tens of millions of records.
>
> Any advice would be most appreciated. Thanks in advance!

Cluster. One box that applies changes, and multiple boxes that read the data.

If you cannot afford multiple boxes from the start, still design your application to work with two connections: one connection to a user with read/write permissions, and one connecting to a user having only select permissions => this way you can later easily add a loadbalancer to the mix, and use multiple postgres boxes for reading stuff.

Andreas
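For reference, the "aggregate table" idea can look roughly like the sketch below: a single counter row kept current by a trigger, so an exact count never scans Messages. All names here are invented, and it assumes the plpgsql language is installed (e.g. via createlang plpgsql).

  -- One-row counter table, seeded with a single (slow) full count:
  CREATE TABLE message_count (n BIGINT NOT NULL);
  INSERT INTO message_count SELECT count(*) FROM Messages;

  CREATE OR REPLACE FUNCTION bump_message_count() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'INSERT' THEN
          UPDATE message_count SET n = n + 1;
          RETURN NEW;
      ELSE  -- DELETE
          UPDATE message_count SET n = n - 1;
          RETURN OLD;
      END IF;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER message_count_trig
      AFTER INSERT OR DELETE ON Messages
      FOR EACH ROW EXECUTE PROCEDURE bump_message_count();

  -- Cheap from now on:
  SELECT n FROM message_count;

The trade-off: the counter row becomes a hot spot (every insert updates it, serializing concurrent writers and leaving dead tuples that need vacuuming), so this only pays off if an exact global count is genuinely required.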
Tsuraan, "Select count(*) from bigtable" is testing your disk drive speed up till about 300MB/s, after which it is CPU limited in Postgres. My guess is that your system has a very slow I/O configuration, either due to faulty driver/hardware or the configuration. The first thing you should do is run a simple I/O test on your data directory - write a file twice the size of memory using dd like this: time bash -c "dd if=/dev/zero of=data_directory/bigfile bs=8k count=(2 * memory_size / 8192) && sync" time dd if=data_directory/bigfile of=/dev/null bs=8k Then report the times here. - Luke On 12/16/06 9:26 AM, "tsuraan" <tsuraan@gmail.com> wrote: > I'm writing a webmail-type application that is meant to be used in a > corporate environment. The core of my system is a Postgres database > that is used as a message header cache. The two (relevant) tables > being used are pasted into the end of this message. My problem is > that, as the messages table increases to tens of millions of rows, > pgsql slows down considerably. Even an operation like "select > count(*) from messages" can take minutes, with a totally idle system. > Postgres seems to be the most scalable Free database out there, so I > must be doing something wrong. > > As for the most common strategy of having a slower (more rows) > "archival" database and a smaller, faster "live" database, all the > clients in the company are using their normal corporate email server > for day-to-day email handling. The webmail is used for access email > that's no longer on the corporate server, so it's not really simple to > say which emails should be considered live and which are really > out-of-date. > > My postgres settings are entirely default with the exception of > shared_buffers being set to 40,000 and max_connections set to 400. > I'm not sure what the meaning of most of the other settings are, so I > haven't touched them. The machines running the database servers are > my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB > SATA II drive), and a production server with two dual-core Intel > chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550 > controller. Both machines are running Gentoo Linux with a 2.6.1x > kernel, and both exhibit significant performance degradation when I > start getting tens of millions of records. > > Any advice would be most appreciated. Thanks in advance! > > Tables: > > CREATE TABLE EmailAddresses ( > emailid SERIAL PRIMARY KEY, -- The unique identifier of this address > name TEXT NOT NULL, -- The friendly name in the address > addrspec TEXT NOT NULL, -- The user@domain part of the address > UNIQUE(name, addrspec) > ); > > and > > CREATE TABLE Messages ( > -- Store info: > msgkey BIGSERIAL PRIMARY KEY, -- Unique identifier for a message > path TEXT NOT NULL, -- Where the message is on the file > system > inserted TIMESTAMP DEFAULT now(),-- When the message was fetched > -- Message Info: > msgid TEXT UNIQUE NOT NULL, -- Message's Message-Id field > mfrom INTEGER -- Who sent the message > REFERENCES EmailAddresses > DEFAULT NULL, > mdate TIMESTAMP DEFAULT NULL, -- Message "date" header field > replyto TEXT DEFAULT NULL, -- Message-ID of replied-to message > subject TEXT DEFAULT NULL, -- Message "subject" header field > numatch INTEGER DEFAULT NULL, -- Number of attachments > UNIQUE(path) > ); > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
tsuraan <tsuraan@gmail.com> wrote:
> My problem is that, as the messages table increases to tens of millions
> of rows, pgsql slows down considerably. Even an operation like "select
> count(*) from messages" can take minutes, with a totally idle system.
> [...]

You might benefit from adding some performance-specific changes to your schema. For example, you could create a separate table for each emailid (call them Messages_1, Messages_2, etc). I expect that no one user will have an unbearable number of messages, so each user will see reasonable performance when working with their mailbox.

You can handle the management of this entirely in your application logic, but it might be better if you wrote stored procedures to access the message tables -- to make it easier on the application side.

-Bill
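A rough sketch of what that split could look like; every name here is invented, and it assumes plpgsql is installed:

  -- One header table per mailbox, with the same columns as Messages:
  CREATE TABLE Messages_1 (LIKE Messages INCLUDING DEFAULTS);
  ALTER TABLE Messages_1 ADD PRIMARY KEY (msgkey);
  ALTER TABLE Messages_1 ADD UNIQUE (msgid);
  ALTER TABLE Messages_1 ADD UNIQUE (path);

  -- A set-returning function hides which table a mailbox lives in, so the
  -- application just calls messages_for(mailbox_id, sender_id):
  CREATE OR REPLACE FUNCTION messages_for(mailbox INTEGER, sender INTEGER)
  RETURNS SETOF Messages AS $$
  DECLARE
      r Messages%ROWTYPE;
  BEGIN
      FOR r IN EXECUTE 'SELECT * FROM Messages_' || mailbox::text
                    || ' WHERE mfrom = ' || sender::text LOOP
          RETURN NEXT r;
      END LOOP;
      RETURN;
  END;
  $$ LANGUAGE plpgsql;

Usage would be simply SELECT * FROM messages_for(1, 42);. The cost of this design is many small tables plus dynamic SQL in the function (so no cached plan per table); table inheritance is another way to get a similar split while still being able to query everything through one parent table.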
> To quickly find a subset of slow queries on your production system, you
> can use the pgfouine tool:
>   http://pgfouine.projects.postgresql.org/
>
> If you then want to measure PostgreSQL performance with different settings
> and compare them, consider using the tsung tool (and maybe the tsung-ploter
> companion tool, to graph several benchmarks onto the same charts for
> comparison):
>   http://pgfouine.projects.postgresql.org/tsung.html
>   http://tsung.erlang-projects.org/
>   http://debian.dalibo.org/unstable/

Thanks for all the links! I'll check these out when I get back to work.
> For scaling you should consider Slony. Either hang out on #slony on
> Freenode.net or ask on the mailing list if you have questions.

For some reason I had thought Slony was really immature, but it actually looks really usable.

> Intel chips => define more. There are Intel boxes known to have issues
> under specific load scenarios with PostgreSQL (again, specific
> versions). To make it funnier, these are really, really hard to track
> down ;)

I can't access the Intel machine right now, but it's the current fastest Intel dual-core. I'll figure it out tomorrow.

> Cluster. One box that applies changes, and multiple boxes that read
> the data.
>
> If you cannot afford multiple boxes from the start, still design your
> application to work with two connections: one connection to a user
> with read/write permissions, and one connecting to a user having only
> select permissions => this way you can later easily add a loadbalancer
> to the mix, and use multiple postgres boxes for reading stuff.

I think I'll see what I can do for that. Is there an aggregation-type of clustering for Postgres? I'm thinking of something where database information is divided between machines, rather than shared among them as in Slony. Sort of like the difference between RAID0 and RAID1. Since my application is constantly adding to the database (far more is written than is ever read), it would be nice to have a multiple-write, single-reader solution, if such a thing exists.
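For reference, the two-connection idea above is mostly a matter of creating two roles and pointing the application's two connection strings at them; the read-only DSN can later be moved to a replica or load balancer without touching any SQL. The role names and passwords below are placeholders, and CREATE ROLE is 8.1+ syntax (older versions use CREATE USER):

  -- Placeholder role names/passwords; adjust before use.
  CREATE ROLE webmail_writer LOGIN PASSWORD 'changeme';
  CREATE ROLE webmail_reader LOGIN PASSWORD 'changeme';

  GRANT SELECT, INSERT, UPDATE, DELETE ON Messages, EmailAddresses TO webmail_writer;
  GRANT SELECT ON Messages, EmailAddresses TO webmail_reader;
  -- The sequences behind the SERIAL/BIGSERIAL columns also need write grants
  -- for webmail_writer; the exact GRANT syntax depends on the server version.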
On Sun, 17 Dec 2006, tsuraan wrote:

> Since my application is constantly adding to the database (far more is
> written than is ever read), it would be nice to have a multiple-write,
> single-reader solution, if such a thing exists.

You seem to be working from the assumption that you have a scaling issue, and that therefore you should be researching how to scale your app to more machines. I'm not so sure you do; I would suggest that you drop that entire idea for now, spend some time doing basic performance tuning for Postgres instead, and only after that consider adding more machines. It does little good to add more incorrectly set up servers to the mix, and solving the multiple-write problem is hard. Let's take a quick tour through your earlier messages:

> My postgres settings are entirely default with the exception of
> shared_buffers being set to 40,000 and max_connections set to 400. I'm
> not sure what the meaning of most of the other settings are, so I
> haven't touched them.

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html is a good intro to the various parameters you might set, with some valuable hints on the effective range you should be considering. I'd suggest you use that to identify the most likely things to increase, then read the manuals at http://www.postgresql.org/docs/8.2/interactive/runtime-config.html for more detail on what you're actually adjusting. To get you started, consider increasing effective_cache_size, checkpoint_segments, and work_mem; those are three whose defaults are very low for your application, relative to your hardware. The thought of how your poor database is suffering when trying to manage a heavy write load with the default checkpoint_segments in particular makes me sad, especially when we add:

> The machines running the database servers are my home desktop (a
> dual-core Athlon 3200+ with 2GB RAM and a 120GB SATA II drive), and a
> production server with two dual-core Intel chips, 4GB RAM, and a RAID 5
> array of SATA II drives on a 3Ware 9550 controller.

One big RAID 5 volume is probably the worst setup available for what you're doing. Luke already gave you a suggestion for testing write speed; you should run that test, but I wouldn't expect happy numbers there. You might be able to get by with the main database running like that, but think about what you'd need to do to add more disks (or reorganize the ones you have) so that you could dedicate a pair to a RAID-1 volume for holding the WAL. If you're limited by write performance, I think you'd find adding a separate WAL drive set a dramatically more productive upgrade than trying to split the app to another machine. Try it on your home machine first; that's a cheap upgrade, to add another SATA drive there, and you should see a marked improvement (especially once you get the server parameters set to more appropriate values).

I'd also suggest that you'd probably be able to get more help from people here if you posted a snippet of output from vmstat and iostat -x with a low interval (say 5 seconds) during a period where the machine was busy; that's helpful for figuring out where the bottleneck on your machine really is.

> Trying to do "INSERT INTO Messages(path, msgid) SELECT (path, msgid)
> FROM tmpMessages" took a really long time before psql died with an
> out-of-memory error.

Do you have the exact text of the error? I suspect you're falling victim to the default parameters being far too low here as well, but without the error it's hard to know exactly which.
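For illustration only, those three knobs might look something like this on the 4GB production box; the numbers are guesses rather than recommendations from this thread, and 8.1 wants raw page/kB values while 8.2 also accepts unit suffixes:

  # Illustrative postgresql.conf deltas (values are guesses):
  shared_buffers       = 40000     # already set: ~320MB in 8kB pages
  effective_cache_size = 350000    # ~2.7GB; roughly what the OS cache can hold
  work_mem             = 16384     # 16MB per sort/hash; keep modest with 400 connections
  checkpoint_segments  = 32        # spread checkpoints out under a heavy write load

With max_connections at 400, work_mem has to stay conservative, since every sort or hash in every backend can use that much memory. After changing settings, running vmstat 5 and iostat -x 5 during a busy period, as suggested above, will show whether the disks or the CPUs are the real bottleneck.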
--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html is a good
> intro to the various parameters you might set, with some valuable hints on
> the effective range you should be considering. I'd suggest you use that
> to identify the most likely things to increase, then read the manuals at
> http://www.postgresql.org/docs/8.2/interactive/runtime-config.html for
> more detail on what you're actually adjusting. To get you started,
> consider increasing effective_cache_size, checkpoint_segments, and
> work_mem; those are three whose defaults are very low for your

I'll play with these - I've definitely seen the (for me) confusing use of seqscan rather than index scan that the annotated page says is caused by too little effective_cache_size. That first link is really great; I can't believe I've never seen it before.

> One big RAID 5 volume is probably the worst setup available for what
> you're doing. Luke already gave you a suggestion for testing write speed;
> you should run that test, but I wouldn't expect happy numbers there. You

I've run dstat with a really busy postgres and seen 94 MB read and write simultaneously for a few seconds. I think our RAID cards have 16MB of RAM, so unless it was really freakish, I probably wasn't seeing all cache access. I'll try some tests with dd tomorrow when I get to work.

> might be able to get by with the main database running like that, but
> think about what you'd need to do to add more disks (or reorganize the
> ones you have) so that you could dedicate a pair to a RAID-1 volume for
> holding the WAL. If you're limited by write performance, I think you'd
> find adding a separate WAL drive set a dramatically more productive
> upgrade than trying to split the app to another machine.

Is the WAL at the same location as the xlog (transaction log)? The checkpoint_segments doc says increasing that value is really only useful if the xlog is separate from the data, so do I put both WAL and xlog on the separate drive, or is that automatic (or redundant; I don't know what I'm talking about...)?

> I'd also suggest that you'd probably be able to get more help from people
> here if you posted a snippet of output from vmstat and iostat -x with a
> low interval (say 5 seconds) during a period where the machine was busy;
> that's helpful for figuring out where the bottleneck on your machine
> really is.

I'll try to stress a machine and get some real stats soon.

> Do you have the exact text of the error? I suspect you're falling victim
> to the default parameters being far too low here as well, but without the
> error it's hard to know exactly which.

Well, I tried to repeat it on my home machine with 20 million rows, and it worked fine in about two minutes. I'll have to see what's going on on that other system...

Thanks for the help!
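(They are the same thing, as the next message confirms.) If you do try the separate-drive experiment, the usual approach in this era is to relocate pg_xlog and leave a symlink behind. A sketch only; the paths and init-script name below are examples for a Gentoo-like box, the server must be stopped first, and a good backup is assumed:

  # Move the WAL (pg_xlog) to its own spindle; example paths only.
  /etc/init.d/postgresql stop
  mv /var/lib/postgresql/data/pg_xlog /mnt/waldisk/pg_xlog
  ln -s /mnt/waldisk/pg_xlog /var/lib/postgresql/data/pg_xlog
  /etc/init.d/postgresql start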
tsuraan <tsuraan@gmail.com> writes:
> Is the WAL at the same location as the xlog (transaction log)?

Same thing.

> The checkpoint_segments doc says increasing that value is really only
> useful if the xlog is separate from the data,

Dunno where you read that, but it's utter bilge. If you've got a write-intensive workload, you want to crank checkpoint_segments as high as you can stand. With the default settings on a modern machine it's not hard at all to push it into checkpointing every dozen or so seconds, which will completely kill performance. (Disk space for pg_xlog/ and potential delay during crash restart are the only negatives here. If you are willing to push the average inter-checkpoint interval past five minutes then you need to increase checkpoint_timeout too.)

regards, tom lane
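A sketch of what that might look like in postgresql.conf, pushing checkpoint_segments higher than in the earlier sketch as suggested; checkpoint_warning is a cheap way to confirm how often checkpoints are actually happening. Values are illustrative only, for an 8.1/8.2-era server:

  # Illustrative checkpoint settings (guesses, not figures from this thread):
  checkpoint_warning  = 300       # log a warning if checkpoints arrive more
                                  # often than every 5 minutes
  checkpoint_segments = 64        # force a checkpoint only after ~1GB of WAL
  checkpoint_timeout  = 900       # seconds; raise along with the segment count
                                  # if you want >5-minute intervals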