Thread: Dream Server?

Dream Server?

From
"Gavin M. Roy"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm currently running a 4 Gig 11 million row database and am looking
to build a "dream server" for it.   I am currently running a dual p3
1GHz system with 10k rpm scsi drives and 4 gigs of ram (which I have
configured pgsql to use all of) and I'm concerned about performance
once the db doubles in size, which should be in 6 mos to a year at
the latest.  First off, If money was no concern, what would you buy
as the ultimate postgresql server running linux?  Second off, on a
more technical note, does pgsql take advantage of multiple
processors.  If I had a 8 way 800 MHz Xeon would the machine blow
away a 2GHz P4?  How much is CPU a factor compared to memory?  Disk
speed?  I want to be able to do large volume selects on tables with
more than 5 million rows and not have the server blink at other
requests put in at the same time.

Any hints or suggestions/experience here would be appreciated.

Thanks,

Gavin

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBPGGlgn9xeBXAlKqsEQJPywCfenV2bHDKJ0czKLy2qbaoj+hiow8AoNaT
9gYOTvmzFR9+YIjA5MQwjSMN
=B6Ak
-----END PGP SIGNATURE-----


Re: Dream Server?

From
"Robert J. Sanford, Jr."
Date:
here's a link for a sql server performance tuning guide...

http://www.sql-server-performance.com/hardware_tuning.asp

some of the nuggets from it is:
   "When selecting your CPU for your server, select one
   with a large L2 cache. This is especially important if
   you have multiple-processor servers. Select at least a
   1MB L2 cache if you have one or two CPUs. If you have
   4 or more CPUs, get at a least 2MB L2 cache in each
   CPU. The greater the size of the L2 cache, the greater
   the server's CPU performance because it reduces the
   amount of wait time experienced by the CPU when reading
   and writing data to main memory."

   "From a performance perspective, it is better to have
   more smaller SCSI disk drives in an array than having
   fewer larger SCSI disk drives. Let's say that you need
   about 100GB of hard disk space in a RAID 5 array. There
   are several ways you can configure such an array, some
   of which offer more performance than others. Some
   configurations include:

   13 - 9GB Drives

   7 - 18GB Drives

   4 - 36GB Drives

   Each of the above configurations will provide about the
   same amount of storage space, but the more drives there
   are in the array, the faster the I/O will be (assuming
   that the controllers can handle all of the I/O traffic).
   This is because more drives offer more read/write heads
   that all can be working simultaneously, which speeds
   disk reads and writes."

   *****

   "Select the best I/O controller you can get. Top-notch
   controllers offload much of the I/O work onto its own
   local CPU, freeing up CPU time on the server to do other
   tasks. For the ultimate in I/O controllers, consider a
   fiber channel connection instead of a SCSI connection."

rjsjr

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Gavin M. Roy
> Sent: Wednesday, February 06, 2002 3:52 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Dream Server?
>
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'm currently running a 4 Gig 11 million row database and am looking
> to build a "dream server" for it.   I am currently running a dual p3
> 1GHz system with 10k rpm scsi drives and 4 gigs of ram (which I have
> configured pgsql to use all of) and I'm concerned about performance
> once the db doubles in size, which should be in 6 mos to a year at
> the latest.  First off, If money was no concern, what would you buy
> as the ultimate postgresql server running linux?  Second off, on a
> more technical note, does pgsql take advantage of multiple
> processors.  If I had a 8 way 800 MHz Xeon would the machine blow
> away a 2GHz P4?  How much is CPU a factor compared to memory?  Disk
> speed?  I want to be able to do large volume selects on tables with
> more than 5 million rows and not have the server blink at other
> requests put in at the same time.
>
> Any hints or suggestions/experience here would be appreciated.
>
> Thanks,
>
> Gavin
>
> -----BEGIN PGP SIGNATURE-----
> Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>
>
> iQA/AwUBPGGlgn9xeBXAlKqsEQJPywCfenV2bHDKJ0czKLy2qbaoj+hiow8AoNaT
> 9gYOTvmzFR9+YIjA5MQwjSMN
> =B6Ak
> -----END PGP SIGNATURE-----
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>

Re: Dream Server?

From
"Steve Wolfe"
Date:
> I'm currently running a 4 Gig 11 million row database and am looking
> to build a "dream server" for it.   I am currently running a dual p3
> 1GHz system with 10k rpm scsi drives and 4 gigs of ram (which I have
> configured pgsql to use all of) and I'm concerned about performance
> once the db doubles in size, which should be in 6 mos to a year at
> the latest.  First off, If money was no concern, what would you buy
> as the ultimate postgresql server running linux?

  I don't know if Linux runs on the new version of the Sun Starfires, does
it?

> Second off, on a
> more technical note, does pgsql take advantage of multiple
> processors.

  If you are running multiple connections, then yes, it does.  But having
an extra CPU around to handle kernel code, disk activity, network
activity, etc. does help even for a single connection.

> If I had a 8 way 800 MHz Xeon would the machine blow
> away a 2GHz P4?  How much is CPU a factor compared to memory?

   Both CPU and memory have to be taken into account.  It does you no good
to have a 10 GHz chip if your memory subsystem can't provide the data to
feed the chip, and likewise, having 10 GB/sec throughput from your memory
system does no good if you are running a 386!  If you're using extremely
large tables (which it sounds like you will be), then memory throughput is
a very critical factor.

  As for the 8-Xeon to 2-P4, in total throughput, yes, the Xeon should
blow away the P4, assuming that you were talking about overall throughput
with multiple connections, and the 8-way machine had a memory subsystem
that was up to the task.  If you're talking about a single query on a
single connection, then the P4 may very well beat the Xeon.  Ideally, your
SMP machine should require you to add DIMMS in groups equal to the number
of your processors - in this case, 8 DIMMS at a time.  Otherwise, you
simply can't give each chip the full bandwidth.  One interesting solution
would be to interleave 4 banks of DDR Ram to feed 8 Xeons, but I doubt
we'll ever see that in production.

> Disk
> speed?  I want to be able to do large volume selects on tables with
> more than 5 million rows and not have the server blink at other
> requests put in at the same time.

  If you have enough RAM to keep the entire database in disk cache, then
disk speed becomes much less of a factor, if you turn of fsync().  Like
I've said plenty of times before (and probably bored everyone to tears),
the lights on our DB machine only blink *occasionally*, even when the
machine is literally being slammed with database activity.  A hardware
RAID card with cache on the board and some moderately-decent drives can
give you a VERY fast, responsive disk subsystem.

  In a day or two, I'll be getting the parts to build a dual Athlon MP
1800+ machine, and I'm planning on putting PG on it and testing it against
our 4x700 MHz Xeon machine, to see how it fares.  Once I do, I'm planning
on making a full report to the list.  If anyone has a certain PG benchmark
that they'd like me to run, let me know.

steve




Re: Dream Server?

From
Medi Montaseri
Date:
Perhaps we should think wide not tall. As the pontiac commercial says, wider
is
better. Build a distributed database. Increasing height of a box does not
scale.
Amdahl proved it.

Perhaps someone can help with some links, it have seen references to it on
ha-linux groups.

Steve Wolfe wrote:

> > I'm currently running a 4 Gig 11 million row database and am looking
> > to build a "dream server" for it.   I am currently running a dual p3
> > 1GHz system with 10k rpm scsi drives and 4 gigs of ram (which I have
> > configured pgsql to use all of) and I'm concerned about performance
> > once the db doubles in size, which should be in 6 mos to a year at
> > the latest.  First off, If money was no concern, what would you buy
> > as the ultimate postgresql server running linux?
>
>   I don't know if Linux runs on the new version of the Sun Starfires, does
> it?
>
> > Second off, on a
> > more technical note, does pgsql take advantage of multiple
> > processors.
>
>   If you are running multiple connections, then yes, it does.  But having
> an extra CPU around to handle kernel code, disk activity, network
> activity, etc. does help even for a single connection.
>
> > If I had a 8 way 800 MHz Xeon would the machine blow
> > away a 2GHz P4?  How much is CPU a factor compared to memory?
>
>    Both CPU and memory have to be taken into account.  It does you no good
> to have a 10 GHz chip if your memory subsystem can't provide the data to
> feed the chip, and likewise, having 10 GB/sec throughput from your memory
> system does no good if you are running a 386!  If you're using extremely
> large tables (which it sounds like you will be), then memory throughput is
> a very critical factor.
>
>   As for the 8-Xeon to 2-P4, in total throughput, yes, the Xeon should
> blow away the P4, assuming that you were talking about overall throughput
> with multiple connections, and the 8-way machine had a memory subsystem
> that was up to the task.  If you're talking about a single query on a
> single connection, then the P4 may very well beat the Xeon.  Ideally, your
> SMP machine should require you to add DIMMS in groups equal to the number
> of your processors - in this case, 8 DIMMS at a time.  Otherwise, you
> simply can't give each chip the full bandwidth.  One interesting solution
> would be to interleave 4 banks of DDR Ram to feed 8 Xeons, but I doubt
> we'll ever see that in production.
>
> > Disk
> > speed?  I want to be able to do large volume selects on tables with
> > more than 5 million rows and not have the server blink at other
> > requests put in at the same time.
>
>   If you have enough RAM to keep the entire database in disk cache, then
> disk speed becomes much less of a factor, if you turn of fsync().  Like
> I've said plenty of times before (and probably bored everyone to tears),
> the lights on our DB machine only blink *occasionally*, even when the
> machine is literally being slammed with database activity.  A hardware
> RAID card with cache on the board and some moderately-decent drives can
> give you a VERY fast, responsive disk subsystem.
>
>   In a day or two, I'll be getting the parts to build a dual Athlon MP
> 1800+ machine, and I'm planning on putting PG on it and testing it against
> our 4x700 MHz Xeon machine, to see how it fares.  Once I do, I'm planning
> on making a full report to the list.  If anyone has a certain PG benchmark
> that they'd like me to run, let me know.
>
> steve
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
-------------------------------------------------------------------------
Medi Montaseri                               medi@CyberShell.com
Unix Distributed Systems Engineer            HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------




Re: Dream Server?

From
"Steve Wolfe"
Date:
----- Original Message -----

> Perhaps we should think wide not tall. As the pontiac commercial says,
wider
> is
> better. Build a distributed database. Increasing height of a box does
not
> scale.
> Amdahl proved it.
>
> Perhaps someone can help with some links, it have seen references to it
on
> ha-linux groups.

  That's something that I've dreamed about for some time.  My rack of
load-balanced web servers scales efficiently, easily, and cheaply.   I
need double the capacity?  I buy more machines, and plug them in.  If
there were a way of replicating PG data from one master to many slaves in
near-real-time, I could have a rack full of load-balanced database servers
right next to it - cheap, easy, and effective.

  Even though I've kept my mouth shut, I've wondered why more effort isn't
devoted to that.  There are a LOT of companies out there that fork money
over hand-over-fist trying to buy a single machine that can handle all of
their database usage, and as the size of the machine increases, the cost
per transaction seems to increase exponentially.  Sure, a million dollars
would get you an entry-level Starfire, with 16 processors, upgradeably to
64, with a couple of gigabytes/second throughput, or for a lot less money,
you could buy a number of smaller systems that, through copious amounts of
RAM, CPU cycles, and combined bandwidth, could spin circles around the
Starfire.

  Don't get the impression that I'm bad-mouthing the developpers - I'm
sure that they're taking care of priorities as best they can.

steve



Re: Dream Server?

From
Andrew Sullivan
Date:
On Wed, Feb 06, 2002 at 06:00:33PM -0700, Steve Wolfe wrote:

> need double the capacity?  I buy more machines, and plug them in.  If
> there were a way of replicating PG data from one master to many slaves in
> near-real-time, I could have a rack full of load-balanced database servers
> right next to it - cheap, easy, and effective.

If the idea here is to have only one master (where data goes in) but
many slaves (whence data comes when you read), then check out
eRserver.  PostgreSQL Inc's version is better than the one included
in contrib/.  We're using it to provide both redundancy and a
secondary source for reads, and it is performing without a problem.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110


Re: Dream Server?

From
"Gregory Wood"
Date:
>   That's something that I've dreamed about for some time.  My rack of
> load-balanced web servers scales efficiently, easily, and cheaply.   I
> need double the capacity?  I buy more machines, and plug them in.  If
> there were a way of replicating PG data from one master to many slaves in
> near-real-time, I could have a rack full of load-balanced database servers
> right next to it - cheap, easy, and effective.
>
>   Even though I've kept my mouth shut, I've wondered why more effort isn't
> devoted to that.

I'll second this. Replication is my/our number one (actually, aside from
schemas, the only) needed new feature for PostgreSQL. If we could do updates
to a master server and push all those changes to a bunch of slaves, my job
would be incredibly easier. As it stands, we've had to restructure our data
model so that we can spread information across a number of different
machines.

>   Don't get the impression that I'm bad-mouthing the developpers - I'm
> sure that they're taking care of priorities as best they can.

Here here. PostgreSQL seems to have come a long way, even in the relatively
short time I've been using it. I can't heap enough praise on those who give
their time to improving it. But as a user, it's important to let the
developers know what we want and/or need.

Greg


Re: Dream Server?

From
Medi Montaseri
Date:
The two tier architecture proposed here, consisting of a master and slave
seems to be the quickest way to get there.  However writes handled by a
slave needs to be reported to the master for rebroadcast to all slaves. This
would create a wave in the system, or poor integrity in rapid writes
environment.

Of course an "application aware" scenario, where application connects to the
master for all writes could ease the situation. It effectively removes one hup
from
the data path. ie instead of client-slave-master-slaves, it does
client-master-slaves.
And for a do it yourselfer, one can have a trigger on a system table for
auto-updates.

What I was thinking is perhaps to modify the "Planner/Optimizer" so that this
layer would in addition to its current functionality, would assume the role of
a distributor, dispatcher, scheduler (or whatever other name), setting up a
"Modified Plan Tree" for  the Executor.  This "Plan Tree" would include a Node
information as well (a Remote Executor, think RPC). There would also need to
be an Intra-Node Advisory Communication Channel (perhaps more than one)
for load balancing and Lock management. I call this an Asymetric architecture.

Another approach would be to have the "Planner" only do the lock management
(in additon to its normal things), I call this "Almost Shared Nothing", and let
DNS
determine who is next.

Another approach would  be to have a RAID (Redundant Array of Inexpensive
Databases, I know  I'm overloading RAID) Virtual Database (think Virtual File
System),
where an abstraction layer would determine where to put the tables; on node-1
or
node-2 or both. Then either thru a discovery phase (client says, where do I
connect
for this query), or in a transparent mode, the server would pipe up with the
owner of
the data for the given query and simply facilitates the data movement.

The more I write the more I see how complicated this is....
Sorry about my random thoughts.....it reminds me of "Some write to say what
they
think and some write to think what they want to say...."

Aaaahhh.... I better get back to work.... day dreaming again...

Gregory Wood wrote:

> >   That's something that I've dreamed about for some time.  My rack of
> > load-balanced web servers scales efficiently, easily, and cheaply.   I
> > need double the capacity?  I buy more machines, and plug them in.  If
> > there were a way of replicating PG data from one master to many slaves in
> > near-real-time, I could have a rack full of load-balanced database servers
> > right next to it - cheap, easy, and effective.
> >
> >   Even though I've kept my mouth shut, I've wondered why more effort isn't
> > devoted to that.
>
> I'll second this. Replication is my/our number one (actually, aside from
> schemas, the only) needed new feature for PostgreSQL. If we could do updates
> to a master server and push all those changes to a bunch of slaves, my job
> would be incredibly easier. As it stands, we've had to restructure our data
> model so that we can spread information across a number of different
> machines.
>
> >   Don't get the impression that I'm bad-mouthing the developpers - I'm
> > sure that they're taking care of priorities as best they can.
>
> Here here. PostgreSQL seems to have come a long way, even in the relatively
> short time I've been using it. I can't heap enough praise on those who give
> their time to improving it. But as a user, it's important to let the
> developers know what we want and/or need.
>
> Greg
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--
-------------------------------------------------------------------------
Medi Montaseri                               medi@CyberShell.com
Unix Distributed Systems Engineer            HTTP://www.CyberShell.com
CyberShell Engineering
-------------------------------------------------------------------------