Thread: hardware performance and some more
Hello. Some of my questions may not be related to this group; however, I know that some of them are directly related to this list.

First of all, I would like to ask: do any of you use PostgreSQL in a clustered environment? Or, to put the question differently, can we use PostgreSQL in a cluster environment? If we can, what clustering methods does PostgreSQL support? I would like to ask about two main clustering methods (let us assume we use 2 machines in the clustering system). In the first case we have two machines running in a cluster, but the second one does not run the database server until a failure of the first machine is observed; the Oracle people call this an active-passive configuration. Only one machine runs the database server at any given time, so in the case of failure there is some waiting time until the second machine comes up. In the second option both machines run the database server at the same time; Oracle supports this method using an additional product called Real Application Clusters (RAC), and calls it an active-active configuration.

The questions for this explanation are:
1 - Can we use PostgreSQL within a clustered environment?
2 - If the answer is yes, in which mode can we use PostgreSQL within a cluster: active-passive or active-active?

Now, the second question is related to the performance of the database. Assume we have a Dell PowerEdge 6650 with 4 x 2.8 GHz Xeon processors, each having 2 MB of cache, and a main memory of, let's say, 32 GB. We can either use a small SAN from EMC or put all disks into the machine with the required RAID configuration. We will install Red Hat Advanced Server 2.1 as the operating system and PostgreSQL as the database server. We have a database of 25 million records with an average length of 250 bytes per record, and there are 1000 operators accessing the database concurrently. The main operation on the database (about 95%) is SELECT rather than INSERT. Do you have any idea about the performance of the system?

best regards,

-Kasım
On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:

> The questions for this explanation are:
> 1 - Can we use PostgreSQL within a clustered environment?
> 2 - If the answer is yes, in which mode can we use PostgreSQL within a
> cluster: active-passive or active-active?

Coupled with the Linux-HA heartbeat service (see http://linux-ha.org), it *should* be possible to run PostgreSQL in active-passive clustering.

If PostgreSQL supported read-only databases, so that several nodes could read off a single disk but only one could update it, a sort of active-active would be possible as well. But PostgreSQL cannot have a read-only database. That would be a handy addition for such cases..

> Now, the second question is related to the performance of the database.
> Assume we have a Dell PowerEdge 6650 with 4 x 2.8 GHz Xeon processors,
> each having 2 MB of cache, and a main memory of, let's say, 32 GB. We
> can either use a small SAN from EMC or put all disks into the machine
> with the required RAID configuration.
>
> We will install Red Hat Advanced Server 2.1 as the operating system and
> PostgreSQL as the database server. We have a database of 25 million
> records with an average length of 250 bytes per record, and there are
> 1000 operators accessing the database concurrently. The main operation
> on the database (about 95%) is SELECT rather than INSERT. Do you have
> any idea about the performance of the system?

Assuming 325 bytes per tuple (250-byte field + 24-28 byte header + varchar overhead) gives 25 tuples per 8K page, there would be 8 GB of data. This configuration could fly with 12-16 GB of RAM -- after all the data has been read, that is. You can cut down on the other requirements as well. Maybe a 2x Opteron with 16 GB of RAM might be a better fit, but check how much CPU cache it has.

A grep -rwn across the data directory would fill the disk cache pretty well.. :-)

HTH

Bye
Shridhar

--
Egotism, n: Doing the New York Times crossword puzzle with a pen. Egotist, n: A person of low taste, more interested in himself than me. -- Ambrose Bierce, "The Devil's Dictionary"
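Spelled out, that estimate works out as follows. This is a rough check only, assuming the default 8 KB page size and ignoring index and free-space overhead; it can be run as a quick SELECT:

    -- Back-of-the-envelope check of the 8 GB estimate:
    SELECT 8192 / 325                   AS tuples_per_page,  -- ~25
           25000000 / (8192 / 325)      AS pages_needed,     -- ~1,000,000
           25000000 / (8192 / 325) * 8  AS total_kb;         -- ~8,000,000 KB ~= 8 GB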
On 24 Jul 2003 17:08 EEST you wrote:

> On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:
>
> > The questions for this explanation are:
> > 1 - Can we use PostgreSQL within a clustered environment?
> > 2 - If the answer is yes, in which mode can we use PostgreSQL within a
> > cluster: active-passive or active-active?
>
> Coupled with the Linux-HA heartbeat service (see http://linux-ha.org),
> it *should* be possible to run PostgreSQL in active-passive clustering.
>
> If PostgreSQL supported read-only databases, so that several nodes could
> read off a single disk but only one could update it, a sort of
> active-active would be possible as well. But PostgreSQL cannot have a
> read-only database. That would be a handy addition for such cases..

So in a master-and-slave configuration we can use the system within a clustering environment.

> Assuming 325 bytes per tuple (250-byte field + 24-28 byte header +
> varchar overhead) gives 25 tuples per 8K page, there would be 8 GB of
> data. This configuration could fly with 12-16 GB of RAM -- after all the
> data has been read, that is. You can cut down on the other requirements
> as well. Maybe a 2x Opteron with 16 GB of RAM might be a better fit, but
> check how much CPU cache it has.

We do not have a memory problem or disk problems. As I have seen on the list, the best way to use the disks is RAID 10 for data and RAID 1 for the OS. We can put in as much memory as we require.

Now the question: if we have 100 searches per second, and each search needs 30 SQL statements, what will the performance of the system be in terms of response time? Let us say we have two machines as described above in a cluster.

> A grep -rwn across the data directory would fill the disk cache pretty
> well.. :-)
>
> HTH
>
> Bye
> Shridhar
> Now, the second question is related to the performance of the database.
> Assume we have a Dell PowerEdge 6650 with 4 x 2.8 GHz Xeon processors,
> each having 2 MB of cache, and a main memory of, let's say, 32 GB. We
> can either use a small SAN from EMC or put all disks into the machine
> with the required RAID configuration.
>
> We will install Red Hat Advanced Server 2.1 as the operating system and
> PostgreSQL as the database server. We have a database of 25 million
> records with an average length of 250 bytes per record, and there are
> 1000 operators accessing the database concurrently. The main operation
> on the database (about 95%) is SELECT rather than INSERT. Do you have
> any idea about the performance of the system?

I have a very similar installation: a Dell PE6600 with dual 2.0 GHz Xeons/2MB cache, 4 GB memory, a 6-disk RAID-10 for data, and a 2-disk RAID-1 for RH Linux 8. My database has over 60 million records averaging 200 bytes per tuple. I have a large nightly data load, then very complex multi-table join queries all day with a few INSERT transactions. While I do not have 1000 concurrent users (more like 30 for me), my processors and disks seem to be idle the vast majority of the time - this machine is overkill. So I think you will have no problem with your hardware, and could probably easily get away with only two processors. Someday, if you can determine with certainty that the CPU is a bottleneck, drop in the 3rd and 4th processors (and $10,000). And save yourself money on the RAM as well - it's incredibly easy to put in more if you need it. If you really want to spend money, set up the fastest disk arrays you can imagine.

I cannot emphasize enough: allocate a big chunk of time for tuning your database and learning from this list. I migrated from Microsoft SQL Server. Out of the box, PostgreSQL was horrible for me, and even after significant tuning it crawled on certain queries (compared to MSSQL). The list helped me find a data type mismatch in a JOIN clause, and since then the performance of PostgreSQL has blown the doors off of MSSQL. Since I only gave myself a couple of days to do tuning before the db had to go into production, I almost had to abandon PostgreSQL and revert to MS. My problems were solved in the nick of time, but I really wish I had made more time for tuning.

Running strong in production for 7 months now with PostgreSQL 7.3, and eagerly awaiting 7.4!

Roman Fail
POS Portal, Inc.
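The datatype-mismatch trap Roman mentions shows up in joins and in literal comparisons alike. A common 7.x-era example, sketched with hypothetical table and column names:

    -- Suppose subscriber_id is a BIGINT (int8) column with a btree index.
    -- In 7.x the bare literal is taken as int4, the int8 = int4 comparison
    -- does not match the index, and the planner falls back to a
    -- sequential scan:
    SELECT * FROM subscribers WHERE subscriber_id = 5551234;

    -- Casting (or quoting) the literal to the column's type restores the
    -- index scan:
    SELECT * FROM subscribers WHERE subscriber_id = 5551234::bigint;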
On 24 Jul 2003 18:44 EEST you wrote:

> I have a very similar installation: a Dell PE6600 with dual 2.0 GHz
> Xeons/2MB cache, 4 GB memory, a 6-disk RAID-10 for data, and a 2-disk
> RAID-1 for RH Linux 8. My database has over 60 million records averaging
> 200 bytes per tuple. I have a large nightly data load, then very complex
> multi-table join queries all day with a few INSERT transactions. While I
> do not have 1000 concurrent users (more like 30 for me), my processors
> and disks seem to be idle the vast majority of the time - this machine
> is overkill. So I think you will have no problem with your hardware, and
> could probably easily get away with only two processors.

I have some time before going to production; therefore, I can wait for the beta and the release of version 7.4. As I have seen from your comments, you have 30 clients reaching the database. Assuming a maximum of 5 searches per client, that is at most 3 searches per second. In my case there will be around 100 searches per second, so that is where the main bottleneck comes from. And finally, the rate of insert operations is about 0.1% (1 in every thousand). I started looking into my limitations a few days ago; I would like to find out whether I can solve my problem with PostgreSQL or not.

> I cannot emphasize enough: allocate a big chunk of time for tuning your
> database and learning from this list. [...] My problems were solved in
> the nick of time, but I really wish I had made more time for tuning.
>
> Running strong in production for 7 months now with PostgreSQL 7.3, and
> eagerly awaiting 7.4!
>
> Roman Fail
> POS Portal, Inc.
| First of all, I would like to ask: do any of you use PostgreSQL in a
| clustered environment? Or, to put the question differently, can we use
| PostgreSQL in a cluster environment? If we can, what clustering methods
| does PostgreSQL support?

You could do active-active, but it would require work on your end. I did a recent check on all the Postgres replication packages and they all seem to be single master -> single/many slaves. Updating on more than 1 server looks to be problematic. I run an active-active setup now, but I had to develop my own custom replication strategy.

As background, we develop & host web-based apps that use Postgres as the DB engine. Since our clients access our servers over the internet, uptime is a big issue. Hence, we have two server farms: one colocated in San Francisco and the other in Sterling, VA. In addition to redundancy, we also wanted to spread the load across the servers. To do this, we went with the expedient method of 1-minute DNS zonemaps: if both servers are up, 70% of traffic is sent to the faster farm and 30% to the other. Both servers are constantly monitored, and if one goes down, a new zonemap is pushed out listing only the servers that are up.

The first step in making this work was converting all integer keys to character keys. By making keys into characters, we could prepend a server location code, so ID 100 generated at SF would not conflict with ID 100 generated in Sterling. Instead, they would be marked as S00000100 and V00000100. Another benefit is the increase in possible key combinations from being able to use alpha characters (36^(n-1) versus 10^n).

At this time, the method we use is a periodic sweep of all updated records. In every table, we add extra fields to mark the date/time a record was last inserted/updated/deleted. All records touched since the last resync are extracted, zipped up, PGP-encrypted and then posted on an FTP server. Files are then transferred between servers, and records are unpacked and inserted/updated. Some checks are needed to determine what takes precedence if users updated the same record on both servers, but otherwise it's a straightforward process.

As far as I can tell, the performance impact seems to be minimal. There's a periodic storm of replication updates in cases where there are mass updates since the last resync. But if you have mostly reads and few writes, you shouldn't see this situation. The biggest performance impact seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

I'm thinking over strategies to get more "real-time" replication working. I suppose I could just make the resync program run more often, but that's a bit inelegant. Perhaps I could capture every update/delete/insert/alter statement from the postgres logs, parse them out to commands, and then zip/encrypt every command as a separate item to be processed. Or add triggers to every table where updated records are pushed to a custom "updated log". The biggest problem is of course locks -- especially at the application level. I'm still thinking over what to do here.
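A minimal sketch of the two techniques described above - location-prefixed character keys and a timestamp-based sweep. The table, sequence, and column names are hypothetical, and 'S' stands in for the per-server location code:

    -- Location-prefixed character keys: 'S' + a zero-padded sequence
    -- value, so keys generated at different sites can never collide.
    CREATE SEQUENCE order_id_seq;
    SELECT 'S' || lpad(nextval('order_id_seq')::text, 8, '0') AS new_key;

    -- Each replicated table carries a last-touched timestamp...
    CREATE TABLE orders (
        order_id   varchar(9) PRIMARY KEY,
        payload    text,
        updated_at timestamp NOT NULL DEFAULT now()
    );

    -- ...and the periodic sweep extracts everything touched since the
    -- last successful resync, to be zipped, encrypted, and shipped.
    -- (Deletes would have to be flagged rather than physically removed
    -- so the sweep can see them; omitted here.)
    SELECT * FROM orders WHERE updated_at > '2003-07-24 00:00:00';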
On Thu, 2003-07-24 at 13:25, Kasim Oztoprak wrote:
> On 24 Jul 2003 17:08 EEST you wrote:
> > On 24 Jul 2003 at 15:54, Kasim Oztoprak wrote:
[snip]
> We do not have a memory problem or disk problems. As I have seen on the
> list, the best way to use the disks is RAID 10 for data and RAID 1 for
> the OS. We can put in as much memory as we require.
>
> Now the question: if we have 100 searches per second, and each search
> needs 30 SQL statements, what will the performance of the system be in
> terms of response time? Let us say we have two machines as described
> above in a cluster.

That's 3000 SQL statements per second, 180 thousand per minute!!!!
What the heck is this database doing!!!!!

A quad-CPU Opteron sure is looking useful right about now... Or a quad-CPU AlphaServer ES45 running Linux, if 4x Opterons aren't available.

How complicated are each of these SELECT statements?

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net              |
| Jefferson, LA  USA                                               |
|                                                                  |
| "I'm not a vegetarian because I love animals, I'm a vegetarian   |
|  because I hate vegetables!"                                     |
|    unknown                                                       |
+-----------------------------------------------------------------+
On 24 Jul 2003 23:25 EEST you wrote:

> That's 3000 SQL statements per second, 180 thousand per minute!!!!
> What the heck is this database doing!!!!!
>
> A quad-CPU Opteron sure is looking useful right about now... Or a
> quad-CPU AlphaServer ES45 running Linux, if 4x Opterons aren't
> available.
>
> How complicated are each of these SELECT statements?

This is a kind of directory assistance application. The select statements are actually not very complex: the database contains 25 million subscriber records, and the operators search for subscriber numbers or addresses. There are not many update operations; the update ratio is approximately 0.1%.

I will use at least 4 machines, each with 4 CPUs (2.8 GHz Xeon processors) and a suitable amount of memory.

I hope this will overcome the problem. Are there any similar implementations out there?
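At 3000 statements per second, each statement presumably has to be a cheap indexed lookup. A sketch of what one of those lookups might look like, with hypothetical table, column, and index names:

    -- A btree index makes each lookup a handful of page reads instead of
    -- a scan over 25 million rows:
    CREATE INDEX subscribers_no_idx ON subscribers (subscriber_no);

    SELECT name, address
      FROM subscribers
     WHERE subscriber_no = '3125551234';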
On 25 Jul 2003 at 16:38, Kasim Oztoprak wrote:

> This is a kind of directory assistance application. The select
> statements are actually not very complex: the database contains 25
> million subscriber records, and the operators search for subscriber
> numbers or addresses. There are not many update operations; the update
> ratio is approximately 0.1%.
>
> I will use at least 4 machines, each with 4 CPUs (2.8 GHz Xeon
> processors) and a suitable amount of memory.

Are you going to duplicate the data?

If you are going to have 3000 SQL statements per second, I would suggest:

1. Get a quad CPU. You probably need that horsepower.
2. Use prepared statements and stored procedures to avoid parsing overhead.

I doubt you would need a cluster of machines, though. If you run it through a pilot program, that will give you an idea of whether or not you need a cluster..

Bye
Shridhar

--
Default, n.: The hardware's, of course.
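For illustration, SQL-level prepared statements (available since PostgreSQL 7.3) let the server parse and plan a query once per connection and then reuse the plan. The statement, table, and column names below are hypothetical:

    -- Parsed and planned once:
    PREPARE find_subscriber (varchar) AS
        SELECT name, address FROM subscribers WHERE subscriber_no = $1;

    -- Each subsequent call skips the parse/plan overhead:
    EXECUTE find_subscriber('3125551234');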
On 24 Jul 2003 at 9:42, William Yu wrote:

> As far as I can tell, the performance impact seems to be minimal.
> There's a periodic storm of replication updates in cases where there are
> mass updates since the last resync. But if you have mostly reads and few
> writes, you shouldn't see this situation. The biggest performance impact
> seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

Can you use WAL-based replication? I don't have a URL handy, but there are replication projects which transmit WAL files to another server when they fill up.

OTOH, I was thinking of a simple replication scheme. If postgresql provided a hook where it calls an external library routine for each heap insert in WAL, there could be a simple multi-slave replication system. One wouldn't have to wait till a WAL file fills up. Of course, it's up to the library to make sure that it does not hold postgresql commits for so long that it hampers performance. There would also need to be a receiving hook which would directly heap-insert the data on the other node. But if the external library is threaded, will that work well with postgresql?

Just a thought. If it works, load-balancing could be a lot easier and near-realtime..

Bye
Shridhar

--
We fight only when there is no other choice. We prefer the ways of peaceful contact. -- Kirk, "Spectre of the Gun", stardate 4385.3
On Fri, 2003-07-25 at 11:38, Kasim Oztoprak wrote:

> This is a kind of directory assistance application. The select
> statements are actually not very complex: the database contains 25
> million subscriber records, and the operators search for subscriber
> numbers or addresses. There are not many update operations; the update
> ratio is approximately 0.1%.
>
> I will use at least 4 machines, each with 4 CPUs (2.8 GHz Xeon
> processors) and a suitable amount of memory.
>
> I hope this will overcome the problem. Are there any similar
> implementations out there?

Since PG doesn't have active-active clustering, that's out, but since the database will be very static, why not have, say, 8 machines, each with its own copy of the database? (Since there are so few updates, you feed the updates to a little Perl app that then makes the changes on each machine.) (A round-robin load balancer would do the trick in utilizing them all.)

Also, with lots of machines, you could get away with less expensive hardware, say a 2 GHz CPU, 1 GB RAM and a 40 GB IDE drive. Then, if one goes down for some reason, you've only lost a small portion of your capacity, and replacing a part will be very inexpensive. And if volume increases, just add more USD 1000 machines...

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net              |
| Jefferson, LA  USA                                               |
|                                                                  |
| "I'm not a vegetarian because I love animals, I'm a vegetarian   |
|  because I hate vegetables!"                                     |
|    unknown                                                       |
+-----------------------------------------------------------------+
On 25 Jul 2003 17:13 EEST you wrote:

> Are you going to duplicate the data?
>
> If you are going to have 3000 SQL statements per second, I would
> suggest:
>
> 1. Get a quad CPU. You probably need that horsepower.
> 2. Use prepared statements and stored procedures to avoid parsing
>    overhead.
>
> I doubt you would need a cluster of machines, though. If you run it
> through a pilot program, that will give you an idea of whether or not
> you need a cluster..

I will try to cluster them, and I can duplicate the data if I need to. In the case of an update, I will then propagate it through to each copy.

What exactly do you mean by a pilot program?

-Kasım
On 25 Jul 2003 at 18:41, Kasim Oztoprak wrote:

> What exactly do you mean by a pilot program?

Get a quad-CPU box, load the data, and ask only 10 operators to test the system.. Beta testing, basically..

Bye
Shridhar

--
The man on top walks a lonely street; the "chain" of command is often a noose.
Folks,

> Since PG doesn't have active-active clustering, that's out, but since
> the database will be very static, why not have, say, 8 machines, each
> with its own copy of the database? (Since there are so few updates,
> you feed the updates to a little Perl app that then makes the changes
> on each machine.) (A round-robin load balancer would do the trick
> in utilizing them all.)

Another approach I've seen work is to have several servers connect to one SAN or NAS where the data lives. Only one server is enabled to handle "write" requests; all the rest are read-only. This does mean having dispatching middleware that parcels out requests among the servers, but it works very well for the Java-based company that's using it.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On Fri, 2003-07-25 at 11:13, Josh Berkus wrote:
> Folks,
>
> > Since PG doesn't have active-active clustering, that's out, but since
> > the database will be very static, why not have, say, 8 machines, each
> > with its own copy of the database? (Since there are so few updates,
> > you feed the updates to a little Perl app that then makes the changes
> > on each machine.) (A round-robin load balancer would do the trick
> > in utilizing them all.)
>
> Another approach I've seen work is to have several servers connect to
> one SAN or NAS where the data lives. Only one server is enabled to
> handle "write" requests; all the rest are read-only. This does mean
> having dispatching middleware that parcels out requests among the
> servers, but it works very well for the Java-based company that's using
> it.

Wouldn't the cache on the read-only databases get out of sync with the true on-disk data?

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron.l.johnson@cox.net              |
| Jefferson, LA  USA                                               |
|                                                                  |
| "I'm not a vegetarian because I love animals, I'm a vegetarian   |
|  because I hate vegetables!"                                     |
|    unknown                                                       |
+-----------------------------------------------------------------+