Thread: performance optimizations

performance optimizations

From
Suchandra Thapa
Date:
I'm moving a webmail service over to use a PostgreSQL database for
storage and wanted to get any tips for optimizing performance.  The
machine will be a multiprocessor (either 2 or 4 CPU) system with a RAID
array.  What layout should be used?  I was thinking about using a
RAID 1+0 array to hold the database, but since I can use different array
types, would it be better to use 1+0 for the WAL logs and RAID 5 for
the database?

The database gets fairly heavy activity (the system handles about 500MB
of incoming and about 750MB of outgoing emails daily).  I have fairly
free rein over the system's layout as well as how the
applications will interact with the database, since I'm writing the
code.


--
Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>

Re: performance optimizations

From
Rod Taylor
Date:
On Wed, 2003-11-12 at 12:34, Suchandra Thapa wrote:
> I'm moving a webmail service over to use a postgresql database for
> storage and wanted to get any tips for optimizing performance.  The
> machine will be a multiprocessor (either 2 or 4 cpu ) system with a raid
> array.  What layout should be used?  I was thinking using about using a
> raid 1+0 array to hold the database but since I can use different array
> types, would it be better to use 1+0 for the wal logs and a raid 5 for
> the database?

How much total storage?  If you have (or will have) > ~6 disks, go
for RAID 5; otherwise RAID 10 is probably appropriate.

> The database gets fairly heavy activity (the system handles about 500MB
> of incoming and about 750MB of outgoing emails daily).  I have a fairly
> free rein in regards to the system's layout as well as how the
> applications will interact with the database since I'm writing the
> code.

These are archived permanently -- ~450GB of annual data? Or is the data
removed upon delivery?
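
For reference, the ~450GB figure follows from the daily volumes quoted
above. A minimal sketch of that arithmetic, assuming decimal units
(1GB = 1000MB) and that incoming and outgoing mail are both stored; the
variable names are made up for illustration:

```python
# Daily mail volume figures quoted in the thread (MB per day).
incoming_mb_per_day = 500
outgoing_mb_per_day = 750

daily_mb = incoming_mb_per_day + outgoing_mb_per_day

# Assumption: decimal gigabytes and a 365-day year with nothing deleted.
annual_gb = daily_mb * 365 / 1000

print(f"{daily_mb} MB/day -> about {annual_gb:.0f} GB/year")
```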


Re: performance optimizations

From
Suchandra Thapa
Date:
On Wed, 2003-11-12 at 12:23, Rod Taylor wrote:
> On Wed, 2003-11-12 at 12:34, Suchandra Thapa wrote:
> > I'm moving a webmail service over to use a postgresql database for
> > storage and wanted to get any tips for optimizing performance.  The
> > machine will be a multiprocessor (either 2 or 4 cpu ) system with a raid
> > array.  What layout should be used?  I was thinking using about using a
> > raid 1+0 array to hold the database but since I can use different array
> > types, would it be better to use 1+0 for the wal logs and a raid 5 for
> > the database?
>
> How much in total storage?  If you have (or will have) > ~6 disks, go
> for RAID 5 otherwise 10 is probably appropriate.

I'm not sure but I believe there are about 6-8 10K scsi drives on the
system.   There is quite a bit of storage to spare currently so I think

> > The database gets fairly heavy activity (the system handles about 500MB
> > of incoming and about 750MB of outgoing emails daily).  I have a fairly
> > free rein in regards to the system's layout as well as how the
> > applications will interact with the database since I'm writing the
> > code.
>
> These are archived permanently -- ~450GB of annual data? Or is the data
> removed upon delivery?

No, it's more like Hotmail.  Some users may keep mail for a longer term,
but a lot of the mail probably gets deleted fairly quickly.  The
database load will be mixed, with insertions due to deliveries, queries
by the webmail system, and deletions from POP and webmail.

--
Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>

Re: performance optimizations

From
Neil Conway
Date:
Suchandra Thapa <s-thapa-11@alumni.uchicago.edu> writes:
> I was thinking using about using a raid 1+0 array to hold the
> database but since I can use different array types, would it be
> better to use 1+0 for the wal logs and a raid 5 for the database?

It has been recommended on this list that getting a RAID controller
with a battery-backed cache is pretty essential to getting good
performance. Search the list archives for lots more discussion about
RAID configurations.

-Neil


Re: performance optimizations

From
Suchandra Thapa
Date:
On Wed, 2003-11-12 at 16:29, Neil Conway wrote:
> Suchandra Thapa <s-thapa-11@alumni.uchicago.edu> writes:
> > I was thinking using about using a raid 1+0 array to hold the
> > database but since I can use different array types, would it be
> > better to use 1+0 for the wal logs and a raid 5 for the database?
>
> It has been recommended on this list that getting a RAID controller
> with a battery-backed cache is pretty essential to getting good
> performance. Search the list archives for lots more discussion about
> RAID configurations.

The server is already using a RAID controller with battery-backed RAM
and the cache set to write-back (the server is on a UPS, so power
failures shouldn't cause problems).  I'll look at the list archives
for RAID information.

--
Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>

Re: performance optimizations

From
Rod Taylor
Date:
> > How much in total storage?  If you have (or will have) > ~6 disks, go
> > for RAID 5 otherwise 10 is probably appropriate.
>
> I'm not sure but I believe there are about 6-8 10K scsi drives on the
> system.   There is quite a bit of storage to spare currently so I think

I see.. With 8 drives, you'll probably want to go with RAID 5. It grows
beyond that point fairly well with a decent controller card. Be sure to
have some battery backed write cache on the raid card (128MB goes a long
way).
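
To make the trade-off concrete, here is a rough sketch comparing the two
layouts for 6 and 8 drives. It assumes the textbook figures: RAID 5
gives up one drive's worth of capacity to parity and costs about 4 disk
I/Os per small random write, while RAID 10 gives up half the drives to
mirroring and costs 2. A battery-backed write cache can hide much of
that write cost, so treat the numbers as a starting point only. The
function and constant names are hypothetical.

```python
# Textbook I/O cost of one small random write (assumption, ignores caching).
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def usable_drives(level: str, n_drives: int) -> int:
    """Return how many drives' worth of capacity the array exposes."""
    if level == "raid5":
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n_drives - 1          # one drive's worth goes to parity
    if level == "raid10":
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return n_drives // 2         # every drive is mirrored
    raise ValueError(f"unknown RAID level: {level}")


for n in (6, 8):
    for level in ("raid5", "raid10"):
        print(f"{n} drives, {level}: {usable_drives(level, n)} usable, "
              f"{WRITE_PENALTY[level]} I/Os per small write")
```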

> > > The database gets fairly heavy activity (the system handles about 500MB
> > > of incoming and about 750MB of outgoing emails daily).  I have a fairly

> No, it's more like hotmail.  Some users may keep mail for a longer term
> but a lot of the mail probably gets deleted fairly quickly.  The
> database load will be mixed with a insertions due to deliveries, queries
> by the webmail system, and deletions from pop and webmail.

You might consider having the mailserver gzip the emails prior to
injection into the database (turn off compression in PostgreSQL) and
decompress the data on the webserver for display to the client. Now you
have about 7 times the number of emails in memory.

It's easier to toss a webserver at the problem than make the database
bigger in size. Take the savings in CPU on the DB and add it to ram.

1200MB of mail, once compressed, is about 200MB? Add the email
descriptive material (subject, from, etc.), account structure,
indexes... so about 400MB for one day's worth of information?

You may want to consider keeping the compressed email in a separate
table from the information describing it. It would mean descriptive
information is more likely to be in RAM, where the body probably doesn't
matter as much (you view them 1 at a time, subjects tend to be listed
all at once).
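
A minimal sketch of both ideas (compress before insertion, and keep the
body apart from the descriptive columns). It uses Python's built-in
sqlite3 module purely as a stand-in so the example runs anywhere; it is
not PostgreSQL, and the table, column, and function names are invented
for illustration. On PostgreSQL, one way to turn off the server's own
compression for an already-gzipped column would be
ALTER TABLE ... ALTER COLUMN ... SET STORAGE EXTERNAL.

```python
import gzip
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Small, frequently listed columns live here.
    CREATE TABLE message_header (
        id      INTEGER PRIMARY KEY,
        account TEXT NOT NULL,
        sender  TEXT NOT NULL,
        subject TEXT NOT NULL
    );
    -- Large, rarely read bodies live here, already gzipped by the client.
    CREATE TABLE message_body (
        message_id INTEGER PRIMARY KEY REFERENCES message_header(id),
        body_gz    BLOB NOT NULL
    );
""")


def store_message(account: str, sender: str, subject: str, body: str) -> int:
    """Compress the body on the mail server side, then insert both rows."""
    cur = conn.execute(
        "INSERT INTO message_header (account, sender, subject) VALUES (?, ?, ?)",
        (account, sender, subject),
    )
    conn.execute(
        "INSERT INTO message_body (message_id, body_gz) VALUES (?, ?)",
        (cur.lastrowid, gzip.compress(body.encode("utf-8"))),
    )
    return cur.lastrowid


def list_subjects(account: str) -> list[str]:
    """Inbox listing touches only the small header table."""
    rows = conn.execute(
        "SELECT subject FROM message_header WHERE account = ? ORDER BY id",
        (account,),
    )
    return [row[0] for row in rows]


def read_body(message_id: int) -> str:
    """Fetch one body and decompress it on the web server side."""
    row = conn.execute(
        "SELECT body_gz FROM message_body WHERE message_id = ?", (message_id,)
    ).fetchone()
    return gzip.decompress(row[0]).decode("utf-8")


msg_id = store_message("alice", "bob@example.com", "Lunch?",
                       "Noon at the usual place.\n" * 50)
print(list_subjects("alice"))
print(len(read_body(msg_id)), "characters after decompression")
```

The listing query never reads the body table, which is the point of the
split: the rows needed to draw an inbox stay small enough to cache.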

Most clients will be interested in, say, the last 7 days' worth of data?
Great.. Start out with 4GB ram on a good Dual CPU -- Opterons seem to
work quite well -- and make sure the motherboard can hold double that in
memory for an upgrade sometime next year when you've become popular.
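
The sizing above can be checked with a few lines of arithmetic. This is
only a sketch using the rough estimates from this message (1200MB raw,
about 200MB compressed, about 400MB per day with metadata and indexes);
it ignores the memory the OS and PostgreSQL need for themselves.

```python
# Rough estimates taken from the message above (all in MB).
raw_mail_mb_per_day = 1200
compressed_mail_mb_per_day = 200
total_mb_per_day = 400        # compressed mail plus headers, accounts, indexes

days_of_interest = 7
ram_mb = 4 * 1024             # the suggested 4GB starting point

compression_ratio = raw_mail_mb_per_day / compressed_mail_mb_per_day
working_set_mb = total_mb_per_day * days_of_interest
fits_in_ram = working_set_mb < ram_mb

print(f"compression ratio: about {compression_ratio:.0f}x")
print(f"7-day working set: {working_set_mb} MB, fits in RAM: {fits_in_ram}")
```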

I firmly believe lots of RAM is the answer to most IO issues until you
start getting into large sets of active data (>50GB). 64GB ram is fairly
cheap compared to ongoing maintenance of the 30+ drive system required
to get decent throughput.


Re: performance optimizations

From
Suchandra Thapa
Date:
On Wed, 2003-11-12 at 22:35, Rod Taylor wrote:
> You may want to consider keeping the compressed email in a separate
> table than the information describing it. It would mean descriptive
> information is more likely to be in RAM, where the body probably doesn't
> matter as much (you view them 1 at a time, subjects tend to be listed
> all at once).

Thanks for the suggestions.  Splitting the load between several machines
was the original intent of moving the storage from the file system to a
database.  I believe the schema I'm already using splits out the body
due to the size of some attachments.  Luckily the code already gzips the
email body and abbreviates common email headers so storing compressed
emails isn't a problem.

> Most clients will be interested in say the last 7 days worth of data?
> Great.. Start out with 4GB ram on a good Dual CPU -- Opterons seem to
> work quite well -- and make sure the motherboard can hold double that in
> memory for an upgrade sometime next year when you've become popular.

Unfortunately, the hardware available is pretty much fixed in regards to
the system.  I can play around with the RAID configurations and have
some limited choice in regards to the RAID controller and number of
drives, but that's about all in terms of hardware.

> I firmly believe lots of RAM is the answer to most IO issues until you
> start getting into large sets of active data (>50GB). 64GB ram is fairly
> cheap compared to ongoing maintenance of the 30+ drive system required
> to get decent throughput.

The current file system holding the user and email information indicates
that the current data is about 64GB (70K accounts; I'm not sure how many
are active, but 50% might be a good guess).  This seems to be somewhat of
a steady state, however.

--
Suchandra Thapa <s-thapa-11@alumni.uchicago.edu>

Re: performance optimizations

From
Rod Taylor
Date:
> > Most clients will be interested in say the last 7 days worth of data?
> > Great.. Start out with 4GB ram on a good Dual CPU -- Opterons seem to
> > work quite well -- and make sure the motherboard can hold double that in
> > memory for an upgrade sometime next year when you've become popular.
>
> Unfortunately, the hardware available is pretty much fixed in regards to
> the system.  I can play around with the raid configurations and have
> some limited choice in regards to the raid controller and number of
> drivers but that's about all in terms of hardware.

Good luck then. Unless the configuration takes into account incremental
additions in ram and disk, sustained growth could get very expensive. I
guess that depends on the business plan expectations.

This just puts more emphasis on offloading everything you can onto
machines that can multiply.

> The current file system holding the user and email information indicates
> the current data has about 64GB (70K accounts, I'm not sure how many are
> active but 50% might be good guess).  This seems to be somewhat of a
> steady state however.

35k clients checking their mail daily isn't so bad. Around 10 pages per
second peak load?
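
One way to arrive at a number in that range, as a back-of-envelope
sketch. The 70K accounts and 50% active share come from the thread; the
pages viewed per client per day and the peak-to-average ratio are pure
assumptions picked for illustration and should be replaced with
measured values.

```python
accounts = 70_000
active_share = 0.5                 # "50% might be a good guess"

# Assumptions, not figures from the thread:
pages_per_client_per_day = 10
peak_to_average_ratio = 2.5

active_clients = int(accounts * active_share)
pages_per_day = active_clients * pages_per_client_per_day
average_pages_per_sec = pages_per_day / (24 * 60 * 60)
peak_pages_per_sec = average_pages_per_sec * peak_to_average_ratio

print(f"{active_clients} active clients")
print(f"average {average_pages_per_sec:.1f} pages/s, "
      f"peak about {peak_pages_per_sec:.0f} pages/s")
```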