Question on hardware & server capacity

From: "Steve Wolfe"
  Well, our current database server is getting tremendously loaded, and
right now there isn't a clear-cut choice as to an upgrade path - at least
not within the commodity hardware market.

  The machine is a dual AthlonMP 2000, with 2 gigs of RAM.  The loads on
the machine are getting out of hand, and performance is noticeably slowed.
'top' shows the CPUs as being anywhere from 30% to 50% idle, with (on
average) 5-10 postmasters in the "non-idle" state.  'vmstat' shows bi/bo
pegged at zero (copious quantities of disk cache, fsync turned off),
interrupts fluctuating between 200 and 1,000 per second (avg. is approx
400), context switches between 1300 and 4500 (avg. is approx 2300).  I
logged some queries, and found that in an average second, the machine
forks off 10 new backends, and responds to 50 selects and 3 updates.

  My feelings are that the machine is being swamped by both the number of
context switches and the I/O, most likely the memory bandwidth.  I'm
working on implementing some connection pooling to reduce the number of
new backends forked off, but there's not much I can do about the sheer
volume (or cost) of queries.
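
  For what it's worth, a minimal sketch of the kind of pooling in question,
assuming a Python client with psycopg2 (the thread doesn't say what the
application is written in; the DSN and pool sizes are placeholders):

from psycopg2 import pool

# Keep a small set of long-lived backends instead of forking a new one per request.
db_pool = pool.SimpleConnectionPool(5, 20, "dbname=appdb user=app host=dbserver")

def run_select(sql, params=None):
    # Borrow a pooled connection, run one query, then hand the connection back.
    conn = db_pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    finally:
        db_pool.putconn(conn)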

  Now, if quad-Hammers were here, I'd simply throw hardware at it.
Unfortunately, they're not.  So far, about the only commodity-level answer
I can think of would be a dual P4 Xeon, with the 533 MHz bus, and
dual-channel DDR memory.  That would give each processor approximately
double the memory bandwidth over what we're currently running.

  I'm fairly sure that would at least help lower the load, but I'm not
sure by how much.  If anyone has run testing under similar platforms, I'd
love to hear of the performance difference.  If this is going to chop the
loads in half, I'll do it.  If it's only going to improve it by 10% or so,
I'm not going to waste the money.

Steve


Re: Question on hardware & server capacity

From: Tom Lane
"Steve Wolfe" <nw@codon.com> writes:
> I logged some queries, and found that in an average second, the machine
> forks off 10 new backends, and responds to 50 selects and 3 updates.

So an average backend only processes ~ 5 queries before exiting?

>   My feelings are that the machine is being swamped by both the number of
> context switches and the I/O, most likely the memory bandwidth.

I think you're getting killed by the lack of connection pooling.
Launching a new backend is moderately expensive: there's not just the
OS-level fork overhead, but significant cost to fill the catalog caches
to useful levels, etc.

7.3 has reduced some of those startup costs a little, so if you're still
on 7.2 then an update might help.  But I'd strongly recommend getting
connection re-use in place before you go off and buy hardware.
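
One rough way to see how much that startup overhead costs is to time N
fresh connect-and-query cycles against the same N queries over a single
reused connection; a minimal sketch, assuming a Python client with psycopg2
and a placeholder DSN:

import time
import psycopg2

DSN = "dbname=appdb user=app host=dbserver"   # placeholder connection string
N = 100

# N fresh connections, one trivial query each (a new backend forked every time).
start = time.time()
for _ in range(N):
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    cur.execute("SELECT 1")
    cur.fetchall()
    conn.close()
fresh = time.time() - start

# The same N queries over a single reused connection.
conn = psycopg2.connect(DSN)
cur = conn.cursor()
start = time.time()
for _ in range(N):
    cur.execute("SELECT 1")
    cur.fetchall()
reused = time.time() - start
conn.close()

print("fresh connections: %.2fs, reused connection: %.2fs" % (fresh, reused))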

            regards, tom lane

Re: Question on hardware & server capacity

From: "Steve Wolfe"
> So an average backend only processes ~ 5 queries before exiting?
>

> 7.3 has reduced some of those startup costs a little, so if you're still
> on 7.2 then an update might help.  But I'd strongly recommend getting
> connection re-use in place before you go off and buy hardware.

  I've been fooling around with some connection pooling, and it hasn't
made the sort of difference we're looking for.  Going from 3 queries per
backend to 100 queries per backend made only about a 20% difference.
While that's nothing to scoff at, we're looking for at least a 100%
improvement.  Either way, the connection pooling WILL be put in place, but
I'm certainly not counting on it preventing the need for a hardware
upgrade.

steve


Re: Question on hardware & server capacity

From: Hannu Krosing
On Thu, 02.01.2003 at 22:42, Steve Wolfe wrote:
>   Well, our current database server is getting tremendously loaded, and
> right now there isn't a clear-cut choice as to an upgrade path - at least
> not within the commodity hardware market.

Have you optimized your queries to the max?

Often one or two of the queries take most of the resources and starve
the others.

>   The machine is a dual AthlonMP 2000, with 2 gigs of RAM.  The loads on
> the machine are getting out of hand, and performance is noticeably slowed.
> 'top' shows the CPUs as being anywhere from 30% to 50% idle, with (on
> average) 5-10 postmasters in the "non-idle" state.  'vmstat' shows bi/bo
> pegged at zero (copious quantities of disk cache, fsync turned off),

Could there be some unnecessary thrashing between the OS and PG caches?
How could this be detected?

> interrupts fluctuating between 200 and 1,000 per second (avg. is approx
> 400), context switches between 1300 and 4500 (avg. is approx 2300).  I
> logged some queries, and found that in an average second, the machine
> forks off 10 new backends, and responds to 50 selects and 3 updates.

What are the average times for query responses?

Will running the same queries (the ones from the logs) serially run
faster/slower/at the same speed ?

Do you have some triggers on updates - I have occasionally found them to
be real performance killers.

Also - if memory bandwidth is the issue, you could tweak the parameters
so that PG will prefer index scans more often - there are rumors that
under heavy load it is often better to use more index scans because they
may use less memory/buffer space, even if they would be slower for
only one or two backends.
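
For example, something along these lines (psycopg2 assumed; the query and
table are just placeholders) lets you compare the plans the planner picks
as you nudge the cost settings:

import psycopg2

conn = psycopg2.connect("dbname=appdb")                # placeholder DSN
cur = conn.cursor()

query = "SELECT * FROM big_table WHERE customer_id = 42"   # placeholder query

for settings in ([],                               # default planner costs
                 ["SET random_page_cost = 2"],     # make index scans look cheaper
                 ["SET enable_seqscan = off"]):    # blunt test: forbid seq scans
    for s in settings:
        cur.execute(s)
    cur.execute("EXPLAIN " + query)
    print(settings or "defaults")
    for row in cur.fetchall():
        print("  " + row[0])
    cur.execute("RESET ALL")                       # back to defaults for the next round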

>   My feelings are that the machine is being swamped by both the number of
> context switches and the I/O, most likely the memory bandwidth.  I'm
> working on implementing some connection pooling to reduce the number of
> new backends forked off, but there's not much I can do about the sheer
> volume (or cost) of queries.

You could try to replicate the updates (one master - multiple slaves)
and distribute the selects. I guess this is what the current PostgreSQL
state of the art already lets you do with reasonable effort.
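
In the application itself this can be as simple as sending writes to the
master and spreading selects over the slaves; a rough sketch, assuming
psycopg2 and made-up host names:

import itertools
import psycopg2

master = psycopg2.connect("host=db-master dbname=appdb")          # takes all writes
replicas = [psycopg2.connect("host=db-slave%d dbname=appdb" % i)  # read-only copies
            for i in (1, 2)]
next_replica = itertools.cycle(replicas)

def execute(sql, params=None):
    # Inserts/updates/deletes go to the master; selects round-robin over the slaves.
    is_write = sql.lstrip().split(None, 1)[0].lower() in ("insert", "update", "delete")
    conn = master if is_write else next(next_replica)
    cur = conn.cursor()
    cur.execute(sql, params)
    if is_write:
        conn.commit()
        return None
    return cur.fetchall()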

> Now, if quad-Hammers were here, I'd simply throw hardware at it.
> Unfortunately, they're not.

Yes, it's BAD if your business grows faster than Moore's law ;-p

>   So far, about the only commodity-level answer
> I can think of would be a dual P4 Xeon, with the 533 MHz bus, and
> dual-channel DDR memory.  That would give each processor approximately
> double the memory bandwidth over what we're currently running.
>
>   I'm fairly sure that would at least help lower the load, but I'm not
> sure by how much.  If anyone has run testing under similar platforms, I'd
> love to hear of the performance difference.

How big is the dataset? What kinds of queries?

I could perhaps run some quick tests on a quad Xeon 1.40GHz, 2GB box before
it goes to production sometime early next week. It is a RedHat
AS2.1 box with rh-postgresql-7.2.3-1_as21.

# hdparm -tT /dev/sda

/dev/sda:
 Timing buffer-cache reads:   128 MB in  0.39 seconds =328.21 MB/sec
 Timing buffered disk reads:  64 MB in  1.97 seconds = 32.49 MB/sec

>  If this is going to chop the
> loads in half, I'll do it.  If it's only going to improve it by 10% or so,
> I'm not going to waste the money.

--
Hannu Krosing <hannu@tm.ee>

Re: Question on hardware & server capacity

From: "Steve Wolfe"
> Have you optimized your queries to max ?
>
> Often one or two of the queries take most of resources and starve
> others.

   I did log a good number of queries and analyze them, and 69% of the
queries issued are from one particular application, and they consume 78%
of the total "cost".  The developer is looking into optimizations, but it
doesn't look like there's going to be any low-hanging fruit.  It's simply
a complicated and frequently-used app.
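
  A sketch of one way to do that kind of breakdown (assuming Python,
duration logging turned on, and a made-up log format and file name - adjust
the regex to whatever your log actually looks like):

import re
from collections import defaultdict

# Assumed log line shape: "... duration: 12.34 ms  statement: SELECT ..."
LINE = re.compile(r"duration: ([\d.]+) ms\s+(?:statement|query): (.*)")

totals = defaultdict(float)
counts = defaultdict(int)

with open("postgresql.log") as log:                    # placeholder file name
    for line in log:
        m = LINE.search(line)
        if not m:
            continue
        ms, sql = float(m.group(1)), m.group(2)
        key = re.sub(r"\d+|'[^']*'", "?", sql)[:80]    # crude stripping of literals
        totals[key] += ms
        counts[key] += 1

# Top ten query shapes by total time spent.
for key, ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print("%10.1f ms  %6d calls  %s" % (ms, counts[key], key))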

> Could there be some unnecessary thrashing between the OS and PG caches?
> How could this be detected?

  The machine generally has a minimum of a hundred megs free, unused
memory, so I'm not terribly worried about memory thrashing.  I've
increased the various tuneable parameters (buffer blocks, sort mem, etc.)
to the point where performance increases stopped, then I doubled them all
for good measure.  I've already decided that the next machine will have at
least 4 gigs of RAM, just because RAM's cheap, and having too much is a
Good Thing.

> Do you have some triggers on updates - I have occasionally found them to
> be real performance killers.

  There are a few triggers, but not many - and the number of updates is
extremely low relative to the number of inserts.

> Yes, it's BAD if your business grows faster than Moore's law ;-p

  .. unfortunately, that's been the case.   Each year we've done slightly
more than double the traffic of the previous year - and at the same time,
as we unify all of our various data sources, the new applications that we
develop tend to make greater and greater demands on the database server.
There is always the option of the "big iron", but your
cost-per-transaction shoots through the roof.  Paying a 10x premium can
really hurt. : )

> How big is the dataset ? What kinds of queries ?

    Our ~postgres/data/base is currently 3.4 gigs.

> I could perhaps run some quick tests on quad Xeon 1.40GHz , 2GB before
> this box goes to production sometime early next week. It is a RedHat
> AS2.1 box with rh-postgresql-7.2.3-1_as21.

  I'd appreciate that!

steve


Re: Question on hardware & server capacity

From: "scott.marlowe"
On Fri, 3 Jan 2003, Steve Wolfe wrote:

> > Have you optimized your queries to max ?
> >
> > Often one or two of the queries take most of resources and starve
> > others.
>
>    I did log a good number of queries and analyze them, and 69% of the
> queries issued are from one particular application, and they consume 78%
> of the total "cost".  The developer is looking into optimizations, but it
> doesn't look like there's going to be any low-hanging fruit.  It's simply
> a complicated and frequently-used app.
>
> > Could there be some unnecessary thrashing between the OS and PG caches?
> > How could this be detected?
>
>   The machine generally has a minimum of a hundred megs free, unused
> memory, so I'm not terribly worried about memory thrashing.  I've
> increased the various tuneable parameters (buffer blocks, sort mem, etc.)
> to the point where performance increases stopped, then I doubled them all
> for good measure.  I've already decided that the next machine will have at
> least 4 gigs of RAM, just because RAM's cheap, and having too much is a
> Good Thing.

Actually, free memory doesn't mean a whole lot.  How much memory is being
used as cache by the kernel?  I've found that as long as the kernel is
caching more data than postgresql, performance is better than when
postgresql starts using more memory than the OS.  For example, on my boxes
at work, we have 1.5 gigs of RAM, and 256 megs are allocated to pgsql as
shared buffers.  The Linux kernel on those boxes has 100 megs of free mem and
690 megs cached.  The first time a heavy query runs there's a lag as the
dataset is read into memory, but then subsequent queries fly.

My experience has been that under Linux (2.4.9 kernel, RH7.2) the file
system caching is better performance-wise for very large amounts of data
(500 megs or more) than the postgresql shared buffers are.  I.e., it would
seem that when postgresql has a large amount of shared memory to keep
track of, it's quicker to just issue a request to the OS if the data is in
the file cache than it is to look it up in postgresql's own shared memory
buffers.  The knee for me is somewhere between 32 and 512 megs of memory
for postgresql, with roughly twice that or a little more going to the kernel
file cache.
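
You can see that effect with something as simple as timing the same heavy
query twice on an otherwise idle box (psycopg2 assumed; the query and DSN
below are placeholders):

import time
import psycopg2

conn = psycopg2.connect("dbname=appdb")                # placeholder DSN
cur = conn.cursor()

def timed(sql):
    start = time.time()
    cur.execute(sql)
    cur.fetchall()
    return time.time() - start

heavy = "SELECT * FROM big_table WHERE created > now() - interval '30 days'"
print("cold cache: %.2fs" % timed(heavy))              # data pulled in from disk
print("warm cache: %.2fs" % timed(heavy))              # served from OS / PG caches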

> Yes, it's BAD if your business grows faster than Moore's law ;-p
>
>   .. unfortunately, that's been the case.   Each year we've done slightly
> more than double the traffic of the previous year - and at the same time,
> as we unify all of our various data sources, the new applications that we
> develop tend to make greater and greater demands on the database server.
> There is always the option of the "big iron", but your
> cost-per-transaction shoots through the roof.  Paying a 10x premium can
> really hurt. : )

Can you distribute your dataset across multiple machines?  Or is it the
kind of thing that all needs to be in one big machine?

Well, good luck with all this.


Re: Question on hardware & server capacity

From: "Steve Wolfe"
> Actually, free memory doesn't mean a whole lot.  How much memory is being
> used as cache by the kernel?

  Generally, a gig or so.

> Can you distribute your dataset across multiple machines?  Or is it the
> kind of thing that all needs to be in one big machine?

  We're splitting the front-end across a number of machines, but all of
the various datasets are sufficiently intertwined that they all have to be
in the same database.  I'm going to fiddle around with some of the
available replication options and see if they're robust enough to put them
into production.

steve


Re: Question on hardware & server capacity

From: Josh Berkus
Steve,

>   We're splitting the front-end across a number of machines, but all of
> the various datasets are sufficiently intertwined that they all have to be
> in the same database.  I'm going to fiddle around with some of the
> available replication options and see if they're robust enough to put them
> into production.

2 other suggestions:

1. Both PostgreSQL Inc. and Command Prompt Inc. have some sort of pay-for HA
solution for Postgres.   Paying them may end up being cheaper than
improvising this yourself.

2. Zapatec Inc. has achieved impressive performance gains by putting the
database on a high-speed, HA gigabit NAS server and having a few "client
servers" handle incoming queries.     You may want to experiment along these
lines.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco