Thread: Performance Tuning, hardware-wise

Performance Tuning, hardware-wise

From: Frank Joerdens
I wonder why there is no mention at all in the documentation about hardware
considerations - aside from CPU and memory stuff, which is probably a) _way_
too obvious (a faster CPU will always make everything faster) and b) too
bound up with both financials and this vast zoo of different brands and
architectures that we have, which is a minefield of potential flamewars -
what I mean is simple, straightforward stuff such as disk layout?!

I just stuck a 2nd hard drive into a box that will serve as a web/database
server, and it seems glaringly obvious to me (maybe that's the reason why
there is no mention of it?!) that this is a good idea. Because: if the same
physical disk that contains the root
of the webserver also houses the PGDATA directory, then a call to a web page that is
enhanced with stuff from the database will have the disk almost simultaneously try to
access the static file and retrieve (a) row(s) from some table which is physically located
someplace else on the disk. What with hard disks being the slowest devices in any computer
(aside from, of course, tape, floppy, cdrom etc.) by orders of magnitude, the plan to have
different drives for database and web root surely seems a good one. I am just wondering
whether this is all mad rambling or decent common sense. I would also be keen to have some
sense of _how_ much I can hope to gain by such measures.

A more extreme idea: If I know that the total size of my database won't grow over, say, a
gigabyte, then why shouldn't I invest my money in RAM (little more than $500 these days,
for a G of RAM), create a ramdisk of 1 G (provided my architecture can house that much RAM
but that's beside the point), and mount /usr/local/pgsql/data right there?!
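
For concreteness, what I have in mind is roughly the sketch below (the tmpfs
mount assumes a kernel that has it, the size and paths are made up, and of
course everything on the ramdisk is gone after a crash or reboot, so you'd
have to restore from a dump at startup):

    # sketch only: ramdisk-backed PGDATA, contents are lost on reboot/crash
    mount -t tmpfs -o size=1024m tmpfs /usr/local/pgsql/data
    # restore the cluster into the ramdisk, then start the postmaster as usual
    cp -a /backup/pgsql/data/. /usr/local/pgsql/data/
    postmaster -D /usr/local/pgsql/data &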

Am I going mad?

- Frank

Re: Performance Tuning, hardware-wise

From: "Gordan Bobic"
[separate physical disks for web server and database]
> I am just wondering whether this is all mad rambling or decent common
> sense.

IMHO, it's decent common sense. That's probably why it isn't documented.

> I would also be keen to have some sense of _how_ much I can hope to gain
> by such measures.

I think this is just another example of people expecting to know exactly
how much a given measure will improve the performance of their unspecified
application.

THIS CANNOT BE DONE. Your mileage will vary. Use your common sense. If you
have a rather big, frequent random access select/insert/update database,
then disk speed is a very important issue, and if you can, you should put
it on a separate disk, if other things are also hammering the server's disk
subsystem.
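
As an illustration, moving the data directory onto its own spindle can be as
simple as the sketch below (the device name and mount point are assumptions
for illustration - adjust for your box, and do the copy with the postmaster
stopped):

    # sketch only: /dev/hdb1 and /mnt/pgdisk are made-up names
    mkfs -t ext2 /dev/hdb1                      # filesystem on the dedicated disk
    mkdir -p /mnt/pgdisk
    mount /dev/hdb1 /mnt/pgdisk
    cp -a /usr/local/pgsql/data/. /mnt/pgdisk/  # postmaster stopped while copying
    postmaster -D /mnt/pgdisk &                 # or mount the disk over the old path instead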

But the improvement is not necessarily so obvious. If you have CGI
generated pages, then the hit will be taken by the CPU and the database,
not necessarily by what httpd does. If your files are comparatively small,
as CGIs generally are, then the chances are that with enough memory, they
will be cached anyway. Especially if they are Perl CGIs running under
mod_perl. So, the disk hit for this will be minimal because they are
constantly used. Just something to consider...

> A more extreme idea: If I know that the total size of my database won't
> grow over, say, a gigabyte, then why shouldn't I invest my money in RAM
> (little more than $500 these days, for a G of RAM), create a ramdisk of 1 G
> (provided my architecture can house that much RAM but that's beside the
> point), and mount /usr/local/pgsql/data right there?!

Hmm... This sounds like something that I never understood completely. Back
in the Windows 3.1 days, some people were swearing by the idea of having a
swap file on a RAM disk. I never could figure that out, as more memory in
the main pool is a lot better than faster swap! Before you start flaming me
for mentioning Windows 3.1, this was 10 years ago when I hadn't discovered
UNIX yet. ;-)

Similarly, putting the database on a RAM disk can work, and no doubt, with
a ramdisk bigger than the database, you will see a vast improvement, if all
you ever run is the database, because the whole database will effectively
already be in the memory "cache". However, this memory cannot be recycled
and used for anything else. This is why more memory is always a good idea,
but leaving the OS to work out the caching is usually the best way for
dealing with an average case scenario. Linux by default uses all free
memory for caching. So, if you have 100 MB of programs running, and have
500 MB of RAM, it will by default allocate 400 MB to cache. In my modest
experience, I've found that letting the OS figure it out is usually the
best solution.
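
You can watch this with free; the figures below are made up, but the line to
look at is the "-/+ buffers/cache" one, which shows how much memory is really
in use once the cache is taken out:

    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:           503        498          5          0         20        380
    -/+ buffers/cache:         98        405
    Swap:          511          0        511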

> Am I going mad?

I don't think so; you're just over-thinking the issue. ;-)

Optimizing the hardware is simple. Start with something you can scrape
together from spares (within reason, of course). Run your application. If
your CPU load goes to nearly 100% for a long time, you need a faster CPU.
If your memory is always all used up and your machine starts swapping like
mad, you need more memory. If your disk is grinding to a halt, you need a
faster disk. Or any combination of the above. As is the case with a lot of
functionality-oriented development, it can be difficult to estimate
hardware requirements before you actually try to run the application.
Depending on what you do with Postgres, you may be happy with a 486 with 32
MB of RAM and a 500 MB disk. OTOH, a 16-way Alpha with 64 GB of RAM may be
insufficient for a different application that is also based on Postgres.
You need to know an awful lot about what it is you're trying to do before
you can reliably say what will improve performance most.
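
A handy way to watch all three at once is vmstat, e.g. (column names from the
stock procps vmstat):

    vmstat 5
    # one line every 5 seconds: watch "us"/"sy" vs "id" for CPU saturation,
    # "si"/"so" for swapping, and "bi"/"bo" for disk blocks in/out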


Regards.

Gordan


Re: Performance Tuning, hardware-wise

From: "Steve Wolfe"
> I wonder why there is no mention at all in the documentation about
> hardware considerations - aside from CPU and memory stuff, which is
> probably a) _way_ too obvious (a faster CPU will always make everything
> faster) and b) too bound up with both financials and this vast zoo of
> different brands and architectures that we have, which is a minefield of
> potential flamewars - what I mean is simple, straightforward stuff such
> as disk layout?!

    A truly bad disk layout can indeed cripple a database, but running Linux
and assuming you have enough RAM, the disk I/O has never been that much of a
problem for us.   I'll tell you about our setup, maybe it will help with
performance planning.

    When we were small, we used a single dual-processor machine with an IDE
drive, and ran the web server, CGI, and database all on the same machine.
Despite the efficiency of Postgres and Apache, as our load got significantly
heavier, that single box with twin 600's just simply couldn't cut it,
running at 100% CPU utilization most of the time.

    As for the drive in that machine, doing inserts on it was SLOW.   Slower
even than on our beater development machine.  I suppose I could have fiddled
with hdparm to increase the disk I/O, but that would have been a temporary
fix at best.  Our CGI applications were eating lots of CPU time, and we just
needed more processors.

     So, we moved to a large database machine to service a group of web
servers.  The database machine has four Xeon 700's in it, with 512 megs of
RAM.  For the disk, we did decide to go with a RAID array, for disk
redundancy as much as efficiency.  The fast disk array makes inserts go
tremendously fast, but for selects, well, it's still the same - you're
limited by CPU and memory.   Why not disk I/O?  Well, the entire dataset is
in the disk cache anyway, so there's very little disk I/O involved with
selects.

     Our data directory is 340 megs, and the machine has 512 megs of RAM,
and Postgres is just too efficient with RAM.   ; )  I start up the
postmaster with 3072 buffer blocks, and yesterday increased the amount of
memory each postmaster can use for sorting from the default of 4 megs to 32
megs, which did give a small speed increase, but the machine *still* has at
least 120 megs completely unused at all times.  Maybe I'll give each backend
more RAM for sorting. : )
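
For reference, that amounts to a startup line roughly like the one below
(7.x-style options - -B is the number of shared buffer blocks and the
backend's -S option is sort memory in kilobytes; check the docs for your
version, and the paths are just our local layout):

    # sketch of the startup line
    #   -B 3072       : 3072 shared buffer blocks (8 kB each, roughly 24 MB)
    #   -o '-S 32768' : per-backend sort memory in kB, i.e. 32 MB
    postmaster -i -D /usr/local/pgsql/data -B 3072 -o '-S 32768' &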

> A more extreme idea: If I know that the total size of my database won't
> grow over, say, a gigabyte, then why shouldn't I invest my money in RAM
> (little more than $500 these days, for a G of RAM), create a ramdisk of 1 G
> (provided my architecture can house that much RAM but that's beside the
> point), and mount /usr/local/pgsql/data right there?!

  Under Linux, RAM disks aren't much faster/better than regular disks, from
the benchmarks I've seen.  Besides, then a power-outage can really hose
you. : )

Steve



Re: Performance Tuning, hardware-wise

From: "Gordan Bobic"
>     As for the drive in that machine, doing inserts on it was SLOW.
> Slower even than on our beater development machine.  I suppose I could
> have fiddled with hdparm to increase the disk I/O, but that would have
> been a temporary fix at best.  Our CGI applications were eating lots of
> CPU time, and we just needed more processors.

If you needed more CPU power, then that's fair. However, bear in mind that
hdparm is NOT necessarily just a TEMPORARY fix. Using DMA modes can HALVE
the CPU utilization required for heavy disk I/O. It is also not uncommon to
increase the disk bandwidth by as much as four times by tuning the disks
with hdparm. I have seen tens, if not hundreds of Linux machines. I have
yet to see one that didn't benefit greatly from using hdparm to tune up
disk transfers. On my database development server, the CPU consumption on a
heavy SELECT/INSERT/UPDATE setup went from around 60% to around 40%, and
hdparm -t -T reported throughput going from about 4 MB/s on both buffer-cache
and buffered disk reads to about 50 MB/s and 20 MB/s respectively. It takes a
bit to get it right, but if you know your disk and motherboard spec, and
don't feel like adventuring into overclocking and overtuning, you are
pretty much guaranteed success on the first try. Then you just have to add
the tune-up to your startup scripts, preferably before the swap is enabled,
as in my experience, in a very limited number of cases, retuning the disk
after the swap has been started can cause some minor, random stability
problems.
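
To make that concrete, the sort of tune-up I mean looks like this (the device
name and the exact flags are illustrative - read the hdparm man page and know
your chipset/drive before enabling anything):

    hdparm -t -T /dev/hda        # measure first: -T cached reads, -t buffered disk reads
    hdparm -d1 -c1 -u1 /dev/hda  # enable DMA, 32-bit I/O and IRQ unmasking, if supported
    hdparm -t -T /dev/hda        # measure again and compare
    # if it helps, add the tuning line to a startup script, before swapon runs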

>      So, we moved to a large database machine to service a group of web
> servers.  The database machine has four Xeon 700's in it, with 512 megs
> of RAM.  For the disk, we did decide to go with a RAID array, for disk
> redundancy as much as efficiency.  The fast disk array makes inserts go
> tremendously fast, but for selects, well, it's still the same - you're
> limited by CPU and memory.  Why not disk I/O?  Well, the entire dataset
> is in the disk cache anyway, so there's very little disk I/O involved
> with selects.

Glad to hear that this worked for you, but some of us have a setup where
you have to randomly query an 8+ GB database. Having it all in cache just
isn't going to happen in that case, and lots of disk I/O bandwidth is the
only feasible answer (I don't consider an 8 GB RAM disk to be a feasible
option).

>      Our data directory is 340 megs, and the machine has 512 megs of RAM,
> and Postgres is just too efficient with RAM.   ; )  I start up the
> postmaster with 3072 buffer blocks, and yesterday increased the amount of
> memory each postmaster can use for sorting from the default of 4 megs to
> 32 megs, which did give a small speed increase, but the machine *still*
> has at least 120 megs completely unused at all times.  Maybe I'll give
> each backend more RAM for sorting. : )

You're lucky that you have a comparatively small data set. My data set is
about 20 times as big, and we were starting to experience crashes (well,
things would abort, rather than crash, as they are well behaved, but still)
because the machine would blow through the 256 MB of RAM and 256 MB of swap.
And it would grind to a halt long before that... 512 MB made a world of
difference... But as with all different applications, YMWV.

>   Under Linux, RAM disks aren't much faster/better than regular disks,
> from the benchmarks I've seen.

I can second this. The caching seems to be just too efficient for a vast
improvement in a real-world application... I don't know about benchmarking,
though; in my opinion and experience, benchmark numbers are often meaningless
when used to assess a real-world situation.

> Besides, then a power-outage can really hose you. : )

Well, that is entirely true, but technically, you ought to have a UPS if
you have a mission critical system. I have recently had a complete disk's
worth of data hosed due to power failure, as something went wrong and the
root inode got corrupted. Usefulness of backups is difficult to
overestimate...

HTH.

Gordan


RE: Performance Tuning, hardware-wise

From: "Willis, Ian (Ento, Canberra)"
What I would like to see is some simple documentation on what the optimal
disk setup is to maximise database throughput, and what the plans are in
this area.
Can I separate the parts of the DB onto separate disks? Is it easy to do
with an operational db? Can I separate data, indexes, logs, temp files etc.
onto separate disks? Where are the major areas of contention, what can be
done about them, which areas would have the most payoff, and which
optimisations create little gain?
Is it worth doing this sort of manual optimisation in the present era of
RAID sets, and if so, which RAID types would be most useful in the above
scenario?
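
The only low-tech answer I have seen so far is symlinking pieces of the data
directory onto other disks, along the lines of the sketch below (/disk2 is
made up, pg_xlog only appears once WAL arrives in 7.1, and the postmaster has
to be shut down first):

    # sketch only: move the WAL directory to another spindle and symlink it back
    mv /usr/local/pgsql/data/pg_xlog /disk2/pg_xlog
    ln -s /disk2/pg_xlog /usr/local/pgsql/data/pg_xlog
    # the same trick can be applied to individual directories under data/base

Is that really the state of the art, or is something better planned?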

