Thread: performance optimizations
I'm moving a webmail service over to use a PostgreSQL database for storage and wanted to get any tips for optimizing performance. The machine will be a multiprocessor (either 2 or 4 CPU) system with a RAID array. What layout should be used? I was thinking about using a RAID 1+0 array to hold the database, but since I can use different array types, would it be better to use 1+0 for the WAL logs and RAID 5 for the database? The database gets fairly heavy activity (the system handles about 500MB of incoming and about 750MB of outgoing emails daily). I have a fairly free rein with regard to the system's layout as well as how the applications will interact with the database, since I'm writing the code.
-- Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>
On Wed, 2003-11-12 at 12:34, Suchandra Thapa wrote:
> I'm moving a webmail service over to use a PostgreSQL database for storage and wanted to get any tips for optimizing performance. The machine will be a multiprocessor (either 2 or 4 CPU) system with a RAID array. What layout should be used? I was thinking about using a RAID 1+0 array to hold the database, but since I can use different array types, would it be better to use 1+0 for the WAL logs and RAID 5 for the database?

How much in total storage? If you have (or will have) more than ~6 disks, go for RAID 5; otherwise RAID 10 is probably appropriate.

> The database gets fairly heavy activity (the system handles about 500MB of incoming and about 750MB of outgoing emails daily). I have a fairly free rein with regard to the system's layout as well as how the applications will interact with the database, since I'm writing the code.

Are these archived permanently -- ~450GB of annual data? Or is the data removed upon delivery?
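Rod's ~450GB figure follows directly from the daily volumes Suchandra quoted. A minimal sketch of that arithmetic, assuming every message is kept for a full year and ignoring compression, headers, indexes, and per-row overhead (the constant and function names are just for illustration):

```python
# Raw yearly mail volume from the daily figures quoted in the thread.
# Assumes nothing is ever deleted and ignores compression, indexes,
# and PostgreSQL per-row overhead.
INCOMING_MB_PER_DAY = 500
OUTGOING_MB_PER_DAY = 750


def annual_volume_gb(incoming_mb: float, outgoing_mb: float, days: int = 365) -> float:
    """Return the raw volume for `days` days in GB (1 GB = 1000 MB)."""
    return (incoming_mb + outgoing_mb) * days / 1000


if __name__ == "__main__":
    gb = annual_volume_gb(INCOMING_MB_PER_DAY, OUTGOING_MB_PER_DAY)
    print(f"~{gb:.0f} GB per year")  # about 456 GB, i.e. Rod's ~450GB
```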
On Wed, 2003-11-12 at 12:23, Rod Taylor wrote:
> On Wed, 2003-11-12 at 12:34, Suchandra Thapa wrote:
> > I'm moving a webmail service over to use a PostgreSQL database for storage and wanted to get any tips for optimizing performance. The machine will be a multiprocessor (either 2 or 4 CPU) system with a RAID array. What layout should be used? I was thinking about using a RAID 1+0 array to hold the database, but since I can use different array types, would it be better to use 1+0 for the WAL logs and RAID 5 for the database?
>
> How much in total storage? If you have (or will have) more than ~6 disks, go for RAID 5; otherwise RAID 10 is probably appropriate.

I'm not sure, but I believe there are about 6-8 10K SCSI drives on the system. There is quite a bit of storage to spare currently, so I think

> > The database gets fairly heavy activity (the system handles about 500MB of incoming and about 750MB of outgoing emails daily). I have a fairly free rein with regard to the system's layout as well as how the applications will interact with the database, since I'm writing the code.
>
> Are these archived permanently -- ~450GB of annual data? Or is the data removed upon delivery?

No, it's more like Hotmail. Some users may keep mail for a longer term, but a lot of the mail probably gets deleted fairly quickly. The database load will be a mix of insertions due to deliveries, queries by the webmail system, and deletions from POP and webmail.
-- Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>
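To make the RAID 5 vs. RAID 10 trade-off concrete for a 6-8 drive array, here is a rough sketch using the classic small-write model: RAID 10 costs two physical writes per logical write (one per mirror), while RAID 5 costs four (read old data, read old parity, write new data, write new parity). `DRIVE_GB` and `IOPS_PER_DRIVE` are hypothetical placeholders, not figures from the thread, and a battery-backed write-back cache can hide much of the RAID 5 write penalty in practice.

```python
from dataclasses import dataclass

# Hypothetical placeholders; substitute the real drive size and a measured IOPS figure.
DRIVE_GB = 36
IOPS_PER_DRIVE = 150  # rough random-I/O figure for a 10K RPM drive (assumption)


@dataclass
class RaidEstimate:
    level: str
    drives: int
    usable_gb: int
    random_write_iops: float  # ignores any controller write cache


def estimate(level: str, drives: int) -> RaidEstimate:
    if level == "RAID 10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        usable_drives = drives // 2  # half the drives hold mirror copies
        write_penalty = 2            # each logical write goes to both mirrors
    elif level == "RAID 5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        usable_drives = drives - 1   # one drive's worth of space holds parity
        write_penalty = 4            # read data, read parity, write data, write parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return RaidEstimate(level, drives, usable_drives * DRIVE_GB,
                        drives * IOPS_PER_DRIVE / write_penalty)


if __name__ == "__main__":
    for n in (6, 8):
        for level in ("RAID 10", "RAID 5"):
            e = estimate(level, n)
            print(f"{n} drives, {level}: {e.usable_gb} GB usable, "
                  f"~{e.random_write_iops:.0f} random writes/s")
```

Under these assumptions, RAID 5 trades small-write throughput for capacity; how much that matters depends on how effectively the controller's write cache absorbs the writes.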
Suchandra Thapa <s-thapa-11@alumni.uchicago.edu> writes:
> I was thinking about using a RAID 1+0 array to hold the database, but since I can use different array types, would it be better to use 1+0 for the WAL logs and RAID 5 for the database?

It has been recommended on this list that getting a RAID controller with a battery-backed cache is pretty essential to getting good performance. Search the list archives for lots more discussion about RAID configurations.

-Neil
On Wed, 2003-11-12 at 16:29, Neil Conway wrote:
> Suchandra Thapa <s-thapa-11@alumni.uchicago.edu> writes:
> > I was thinking about using a RAID 1+0 array to hold the database, but since I can use different array types, would it be better to use 1+0 for the WAL logs and RAID 5 for the database?
>
> It has been recommended on this list that getting a RAID controller with a battery-backed cache is pretty essential to getting good performance. Search the list archives for lots more discussion about RAID configurations.

The server is already using a RAID controller with battery-backed RAM and the cache set to write-back (the server is on a UPS, so power failures shouldn't cause problems). I'll look at the list archives for RAID information.
-- Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>
> > How much in total storage? If you have (or will have) more than ~6 disks, go for RAID 5; otherwise RAID 10 is probably appropriate.
>
> I'm not sure, but I believe there are about 6-8 10K SCSI drives on the system. There is quite a bit of storage to spare currently, so I think

I see. With 8 drives, you'll probably want to go with RAID 5. It scales beyond that point fairly well with a decent controller card. Be sure to have some battery-backed write cache on the RAID card (128MB goes a long way).

> > > The database gets fairly heavy activity (the system handles about 500MB of incoming and about 750MB of outgoing emails daily). I have a fairly
> No, it's more like Hotmail. Some users may keep mail for a longer term, but a lot of the mail probably gets deleted fairly quickly. The database load will be a mix of insertions due to deliveries, queries by the webmail system, and deletions from POP and webmail.

You might consider having the mail server gzip the emails prior to injection into the database (turn off compression in PostgreSQL) and decompress the data on the web server for display to the client. Now about 7 times as many emails fit in memory.

It's easier to toss a web server at the problem than to make the database bigger. Take the savings in CPU on the DB and put it toward RAM.

1200MB of mail compresses to about 200MB? Add the descriptive material (subject, from, etc.), account structures, and indexes... so about 400MB for one day's worth of information?

You may want to consider keeping the compressed email in a separate table from the information describing it. That way the descriptive information is more likely to be in RAM, whereas the body probably doesn't matter as much (you view messages one at a time, but subjects tend to be listed all at once).

Most clients will be interested in, say, the last 7 days' worth of data? Great.
Start out with 4GB of RAM on a good dual-CPU machine -- Opterons seem to work quite well -- and make sure the motherboard can hold double that in memory for an upgrade sometime next year when you've become popular.

I firmly believe lots of RAM is the answer to most I/O issues until you start getting into large sets of active data (>50GB). 64GB of RAM is fairly cheap compared to the ongoing maintenance of the 30+ drive system required to get decent throughput.
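Rod's RAM recommendation rests on a working-set estimate he sketches in this reply: roughly 1200MB of mail a day compressing to about 200MB, about double that once headers, account data, and indexes are included, and about 7 days of recent mail being the hot set. A minimal sketch of that arithmetic; the compression ratio and overhead factor are read off Rod's round numbers, not measured:

```python
# Working-set estimate following the reasoning in Rod's reply.
# compression_ratio and overhead_factor are inferred from his round numbers
# (1200MB -> ~200MB, then ~400MB/day with metadata); they are not measurements.
def working_set_mb(daily_raw_mb: float, compression_ratio: float,
                   overhead_factor: float, hot_days: int) -> float:
    """Approximate size of the recently accessed data that should stay cached."""
    compressed_per_day = daily_raw_mb / compression_ratio
    per_day_with_metadata = compressed_per_day * overhead_factor
    return per_day_with_metadata * hot_days


if __name__ == "__main__":
    ws = working_set_mb(daily_raw_mb=1200, compression_ratio=6,
                        overhead_factor=2, hot_days=7)
    ram_mb = 4 * 1024
    print(f"working set ~{ws:.0f} MB vs {ram_mb} MB RAM; fits: {ws < ram_mb}")
```

Doubling `hot_days` or halving `compression_ratio` pushes the estimate past 4GB, which is the kind of growth the suggested RAM upgrade path is meant to absorb.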
On Wed, 2003-11-12 at 22:35, Rod Taylor wrote:
> You may want to consider keeping the compressed email in a separate table from the information describing it. That way the descriptive information is more likely to be in RAM, whereas the body probably doesn't matter as much (you view messages one at a time, but subjects tend to be listed all at once).

Thanks for the suggestions. Splitting the load between several machines was the original intent of moving the storage from the file system to a database. I believe the schema I'm already using splits out the body due to the size of some attachments. Luckily the code already gzips the email body and abbreviates common email headers, so storing compressed emails isn't a problem.

> Most clients will be interested in, say, the last 7 days' worth of data? Great. Start out with 4GB of RAM on a good dual-CPU machine -- Opterons seem to work quite well -- and make sure the motherboard can hold double that in memory for an upgrade sometime next year when you've become popular.

Unfortunately, the hardware available is pretty much fixed with regard to the system. I can play around with the RAID configurations and have some limited choice with regard to the RAID controller and number of drives, but that's about all in terms of hardware.

> I firmly believe lots of RAM is the answer to most I/O issues until you start getting into large sets of active data (>50GB). 64GB of RAM is fairly cheap compared to the ongoing maintenance of the 30+ drive system required to get decent throughput.

The current file system holding the user and email information indicates the current data is about 64GB (70K accounts; I'm not sure how many are active, but 50% might be a good guess). This seems to be something of a steady state, however.
-- Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>
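Both ideas discussed here, compressing the body before it reaches the database and keeping it in its own table so the listing query only touches the small header rows, can be sketched as follows. This is a minimal illustration, not Suchandra's actual schema: Python's built-in `sqlite3` stands in for PostgreSQL so the example runs anywhere, and all table, column, and function names are hypothetical.

```python
import gzip
import sqlite3

# Hypothetical schema: small, frequently listed header rows in one table,
# compressed bodies (fetched one message at a time) in another.
SCHEMA = """
CREATE TABLE message_header (
    id       INTEGER PRIMARY KEY,
    account  TEXT NOT NULL,
    sender   TEXT NOT NULL,
    subject  TEXT NOT NULL,
    received TEXT NOT NULL            -- ISO 8601 timestamp
);
CREATE TABLE message_body (
    message_id INTEGER PRIMARY KEY
        REFERENCES message_header(id) ON DELETE CASCADE,
    body_gz    BLOB NOT NULL          -- gzipped by the application, not the DB
);
CREATE INDEX message_header_account_received
    ON message_header (account, received);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces ON DELETE CASCADE only with this
    conn.executescript(SCHEMA)
    return conn


def deliver(conn, account, sender, subject, received, body: str) -> int:
    """Store one message; the body is compressed before it reaches the database."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute(
            "INSERT INTO message_header (account, sender, subject, received) "
            "VALUES (?, ?, ?, ?)",
            (account, sender, subject, received),
        )
        conn.execute(
            "INSERT INTO message_body (message_id, body_gz) VALUES (?, ?)",
            (cur.lastrowid, gzip.compress(body.encode("utf-8"))),
        )
    return cur.lastrowid


def list_inbox(conn, account):
    """Folder listing: reads only the header table, never the bodies."""
    return conn.execute(
        "SELECT id, sender, subject FROM message_header "
        "WHERE account = ? ORDER BY received DESC",
        (account,),
    ).fetchall()


def read_message(conn, message_id: int) -> str:
    """Fetch and decompress one body (the web server's job in Rod's proposal)."""
    (blob,) = conn.execute(
        "SELECT body_gz FROM message_body WHERE message_id = ?", (message_id,)
    ).fetchone()
    return gzip.decompress(blob).decode("utf-8")


def delete_message(conn, message_id: int) -> None:
    """Deleting the header row cascades to the body row."""
    with conn:
        conn.execute("DELETE FROM message_header WHERE id = ?", (message_id,))
```

In PostgreSQL the body column would be `bytea`, and because the data arrives already compressed, `ALTER TABLE message_body ALTER COLUMN body_gz SET STORAGE EXTERNAL` stores it out of line without TOAST trying to compress it again, which corresponds to Rod's "turn off compression" advice. A PostgreSQL driver such as psycopg2 would also use `%s` placeholders instead of `?`.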
> > Most clients will be interested in, say, the last 7 days' worth of data? Great. Start out with 4GB of RAM on a good dual-CPU machine -- Opterons seem to work quite well -- and make sure the motherboard can hold double that in memory for an upgrade sometime next year when you've become popular.
>
> Unfortunately, the hardware available is pretty much fixed with regard to the system. I can play around with the RAID configurations and have some limited choice with regard to the RAID controller and number of drives, but that's about all in terms of hardware.

Good luck then. Unless the configuration allows for incremental additions of RAM and disk, sustained growth could get very expensive. I guess that depends on the business plan expectations. This just puts more emphasis on offloading everything you can onto machines that are easy to add more of.

> The current file system holding the user and email information indicates the current data is about 64GB (70K accounts; I'm not sure how many are active, but 50% might be a good guess). This seems to be something of a steady state, however.

35k clients checking their mail daily isn't so bad. Around 10 pages per second peak load?
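Rod doesn't show how he got ~10 pages per second, but here is one way a figure like that falls out of Suchandra's numbers: 70K accounts at an estimated 50% active gives 35K daily users; spread their page views over 86,400 seconds and multiply by a peak-to-average factor. The 10 pages per user per day and the 2.5x peak factor below are hypothetical values chosen for illustration, not figures from the thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def peak_pages_per_second(accounts: int, active_fraction: float,
                          pages_per_user_per_day: float, peak_factor: float) -> float:
    """Average daily page rate scaled by a peak-to-average ratio."""
    active_users = accounts * active_fraction
    average_rate = active_users * pages_per_user_per_day / SECONDS_PER_DAY
    return average_rate * peak_factor


if __name__ == "__main__":
    # 70K accounts and ~50% active come from the thread; the last two arguments are hypothetical.
    rate = peak_pages_per_second(70_000, 0.5, pages_per_user_per_day=10, peak_factor=2.5)
    print(f"~{rate:.1f} pages/s at peak")
```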