Thread: Scalability
Hello,

When large databases, complex queries and frequent access are combined, the database backend may become a performance bottleneck. This is quite obvious.

Do I assume correctly that CPU will be the earliest limiting factor for a database (if we have a reasonable amount of memory in the box)?

If yes, one could deal with it by switching to a more powerful machine, OR one could distribute the database between several equivalent machines, as I imagine. The latter makes sense if the data can be partitioned so that consistency and dependencies can be worked around; that is, I can move two subsets of the data (say A and B) to two different boxes if rows in these subsets are very unlikely to be combined in a single query. Then I can have my application code access the correct database instance based on some simple rule (a sketch of such a dispatch rule follows this message), for example: users with IDs 0-1000000 have their messages stored on box A and all others on box B. Or: articles for newsgroups with hash code == 0 -> box A, others -> box B. Do we get linear scalability that way? I guess so. Does it sound cool? Yes. But does it cost less than upgrading a single server? I'm not sure.

Is anyone out there using such a load balancing scheme with PostgreSQL? Are there any foreseeable problems that would make it impractical?

Thanks -
JPL
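To make the dispatch rule above concrete, here is a minimal sketch in Python of what the client-side routing could look like. Everything in it is hypothetical: the DSN strings, the "messages" table, the shard_for_user helper and the 1000000 cutoff are made-up examples, and psycopg2 stands in for whatever driver the application actually uses.

import psycopg2

# Hypothetical connection strings for the two boxes described above.
SHARDS = {
    "box_a": "host=box-a dbname=msgs user=app",
    "box_b": "host=box-b dbname=msgs user=app",
}

def shard_for_user(user_id):
    # The simple range rule from the post: IDs 0-1000000 live on
    # box A, everyone else on box B.
    return "box_a" if user_id <= 1000000 else "box_b"

def fetch_messages(user_id):
    # Connect only to the box that holds this user's rows; the
    # partitioning assumes no query ever combines rows from both.
    conn = psycopg2.connect(SHARDS[shard_for_user(user_id)])
    try:
        cur = conn.cursor()
        cur.execute("SELECT id, body FROM messages WHERE user_id = %s",
                    (user_id,))
        return cur.fetchall()
    finally:
        conn.close()

The scheme scales roughly linearly only as long as the rule really keeps cross-box queries rare; any query that must touch both subsets has to be merged in the application.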
(Frequent Access)
If you just have lots of queries in parallel, try replication, and pick a random server for each connection (a sketch of this follows below).

(Complex Queries)
If you absolutely, positively need one query to be executed across all nodes in the cluster because one machine would just take too long no matter how big, then you may have to shell out for Oracle...

(Splitting the Data)
Alternatively, you could put separate tables on separate machines, or wait until the "schema" support gets advanced enough to let you do full cross-database SQL queries... No idea how far off this is in PostgreSQL...

Personally, I find that CPU is rarely the bottleneck. It is usually either lack of memory or the speed of the disks...
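To illustrate the "pick a random server for each connection" idea from the first point above, here is a minimal sketch in Python. It assumes a pool of identical read-only replicas kept in sync by some replication mechanism; the host names and the connect_to_random_replica helper are hypothetical, and psycopg2 again serves only as an example driver.

import random
import psycopg2

# Hypothetical pool of identical read-only replicas.
REPLICAS = [
    "host=replica1 dbname=appdb user=app",
    "host=replica2 dbname=appdb user=app",
]

def connect_to_random_replica():
    # Each new connection lands on a randomly chosen replica, which
    # spreads parallel read-only queries across the pool.
    return psycopg2.connect(random.choice(REPLICAS))

conn = connect_to_random_replica()
cur = conn.cursor()
cur.execute("SELECT count(*) FROM messages")
print(cur.fetchone()[0])
conn.close()

Note that this only helps read traffic; writes still have to go wherever the replication mechanism expects them.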
Jan Ploski <jpljpl@gmx.de> writes:

> Do I assume correctly that CPU will be the earliest limiting factor
> for a database (if we have a reasonable amount of memory in the box)?

Depends on your query mix. If you have a lot of updates, then I'd recommend spending plenty of money on fast disk drives and good disk controller boards. If it's mostly SELECTs, then you don't need such fast drives, as long as you have enough RAM to cache a good chunk of the active database. In that case the CPU probably would be the limiting factor.

> Is anyone out there using such a load balancing scheme with PostgreSQL?

Hardware is so cheap these days that I'd think it makes more sense just to throw more hardware at the problem... I think client-driven load balancing will be a pain in the neck to maintain. (There has been some talk of actual server support for load balancing, but it's still a long way off.)

regards, tom lane
In article <006001c0ac91$4ebe3ea0$800010ac@localdomain>, "Gordan Bobic" <gordan@freeuk.com> wrote:

> (Complex Queries)
> If you absolutely, positively need one query to be executed across all
> nodes in the cluster because one machine would just take too long no
> matter how big, then you may have to shell out for Oracle...

or DB2 EEE (Enterprise Extended Edition). It's cheaper than Oracle, too.

Gordon.

--
It doesn't get any easier, you just go faster.
        -- Greg LeMond