Thread: scaling a database
This is a fairly standard DBA-type question, but I can't seem to find the answer in the Postgres mailing list archives or on any other DBA-centric site out there.
We are growing a database on PostgreSQL/Linux fairly quickly and are concerned about reaching resource limits. More specifically, we currently have about 20,000 "users" in the system, and one table contains an average of 50 rows per user (around 1 million rows). This table summarizes data for each user and contains 10 integer columns, 2 varchar(50) columns, and 2 boolean columns. Fortunately, this data is read-only, so the table is indexed on every relevant column. Our entire database is around 800 MB at this point.
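For a sense of scale, here is a rough back-of-envelope size estimate for that table (a sketch: the average varchar length and the per-row overhead are assumptions, not measured figures):

```python
# Back-of-envelope size estimate for the summary table described above.
# The average varchar length and the per-row overhead are assumptions;
# actual PostgreSQL tuple headers, padding, and storage behavior differ.
n_rows = 1_000_000              # ~50 rows for each of ~20,000 users
int_bytes = 10 * 4              # 10 integer columns, 4 bytes each
varchar_bytes = 2 * 25          # 2 varchar(50) columns, assume ~25 bytes avg
bool_bytes = 2 * 1              # 2 boolean columns, 1 byte each
overhead = 32                   # assumed per-row header + padding

bytes_per_row = int_bytes + varchar_bytes + bool_bytes + overhead
heap_mb = n_rows * bytes_per_row / 1024 ** 2
print(f"~{bytes_per_row} bytes/row -> ~{heap_mb:.0f} MB of heap data")
```

With an index on every relevant column, a total on-disk footprint several times the heap size is plausible, which is roughly consistent with the 800 MB figure for the whole database.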
What I am wondering is: will we reach some kind of limit if we want to grow the database to accommodate 1 million or even 10 million users? Right now we have a dedicated dual-processor server with 1 GB of RAM that PostgreSQL runs exclusively on. What should we expect to cause problems as the database size grows? What are the standard solutions? Would you generally throw more hardware at the problem? Would you partition the database into multiple instances, each with a subset of the users and their data? Any information that someone can point me to would be greatly appreciated.
> What I am wondering is: will we reach some kind of limit if we want to
> grow the database to accommodate 1 million or even 10 million users?
> Right now we have a dedicated dual-processor server with 1 GB of RAM
> that PostgreSQL runs exclusively on. What should we expect to cause
> problems as the database size grows? What are the standard solutions?

Well, it all depends on your definitions. If you're expecting 1 million or 10 million *simultaneous* users, you're going to be hard-pressed to accomplish that with ANY DBMS, and you're going to have to throw fistfuls of hardware at it. Even if each simultaneous user issued only one query per minute, you're still talking about roughly 16,700 to 167,000 queries per second, which is quite a bit.

If by "user" you mean someone who has a username/password in the database and could potentially execute a query, then you shouldn't have any problem with 10 million of them. PostgreSQL can handle very large databases and tables efficiently. In that case, the capacity of the server would depend on how many of those users are actually logged in and issuing queries at any one time, and on the nature of those queries.

To give much more detailed advice, we would really need a more detailed description of your data and server usage.

steve
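The arithmetic behind those figures is simple to check (a sketch; the one-query-per-minute rate is the illustrative assumption above, not a measured workload):

```python
# Sustained query rate if every user issued one query per minute:
# users / 60 queries per second.
for users in (1_000_000, 10_000_000):
    qps = users / 60
    print(f"{users:>10,} users -> ~{qps:,.0f} queries/sec")
```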
Good point. I should have elaborated on that. The users would not all be connecting concurrently. We store data for all of them, but only 10 of the 20,000 users in the system connect concurrently right now (obviously we hope this increases). That is a separate issue that we will address, but right now I am interested in knowing whether anyone has ever had to partition a database or a table as it grew because of resource limits.

I am basically looking for examples or case studies to learn from. I realize that our application will be unique and that a valid answer to my question is "it depends", but I am interested in hearing whether there are other measures required besides increasing the processing power, memory, or disk space allocated to PostgreSQL.

----- Original Message -----
From: "Andrew Sullivan" <andrew@libertyrms.info>
To: "Martin Dillard" <martin@edusoftinc.com>
Sent: Monday, February 25, 2002 2:17 PM
Subject: Re: [GENERAL] scaling a database

> On Mon, Feb 25, 2002 at 01:32:52PM -0800, Martin Dillard wrote:
>
> > What I am wondering is will we reach some kind of limit if we
> > wanted to grow the database to accommodate 1 million or even 10
> > million users?
>
> You don't say what the concurrency of users is, nor how they are
> connecting. If you tried 10 million users through psql, I'd be
> amazed that it could work. But other applications are different.
>
> --
> Andrew Sullivan                         87 Mowat Avenue
> Liberty RMS                             Toronto, Ontario Canada
> <andrew@libertyrms.info>                M6K 3E3
> +1 416 646 3304 x110
"Martin Dillard" <martin@edusoftinc.com> writes:
> Good point. I should have elaborated on that. The users would not all
> be connecting concurrently. We store data for all of them but only 10
> connect concurrently right now out of the 20,000 users in the system.
>
> I am basically looking for examples or case studies to learn from. I
> am interested in hearing if there are other measures required besides
> increasing the processing power, memory, or disk space allocated to
> PostgreSQL.

A list of *hard* limitations for PostgreSQL can be found here:

    http://www.us.postgresql.org/users-lounge/limitations.html

    Maximum size for a database            unlimited (60 GB databases exist)
    Maximum size for a table               64 TB on all operating systems
    Maximum size for a row                 unlimited in 7.1 and later
    Maximum size for a field               1 GB in 7.1 and later
    Maximum number of rows in a table      unlimited
    Maximum number of columns in a table   1600
    Maximum number of indexes on a table   unlimited

Of course, "unlimited" here really means limited by available disk space and memory/swap space, and performance may suffer when these values get unusually large.

As you can see, PostgreSQL doesn't have any built-in restrictions that would preclude its use with even extremely large datasets. However, my guess is that what you are really looking for is the degree to which "performance may suffer when these values get unusually large" holds in practice. From my own experience, PostgreSQL does just fine with tables that have millions of rows. This is especially true in cases like yours, where the biggest of your tables are read-only.
My biggest problems to date have to do with vacuuming large tables (between 10 and 20 million rows). If you don't update the tables, on the other hand, there is no need to vacuum. Jason
On Mon, 25 Feb 2002, Martin Dillard wrote:
> I am basically looking for examples or case studies to learn from. I
> realize that our application will be unique and that a valid answer
> to my question is "it depends" but I am interested in hearing if
> there are other measures required besides increasing the processing
> power, memory, or disk space allocated to PostgreSQL.

Take a very good look at your DB design, OS settings, and buffer settings.

One thing we did was to split a couple of tables in two, even though the data was totally unique. The key issue was that a small part of the data was used probably 80% of the time, while the remainder was probably 60%+ of the table's size but used only 20% of the time (pulling numbers out of a hat, but just to give you an idea). These splits seem to have helped -- we didn't measure how much, but enough to be noticeable. Also make sure you have properly normalized your data.

The name of the game is I/O. The more you can change your data/layouts to minimize I/O, the better performance you will have. Our tables combined hold about 10 million records; our biggest table is probably in the 8 million record area.
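The hot/cold split pays off because narrower rows pack more densely into disk pages, so the common-case scans read fewer of them. A sketch with made-up numbers (the 200-byte and 80-byte row widths are assumptions, not figures from this thread):

```python
# Why splitting a wide table by access frequency reduces I/O: rows that
# hold only the frequently-read columns fit more per 8 KB page, so the
# 80%-case scans touch far fewer pages. Row widths are illustrative.
PAGE = 8192                     # PostgreSQL's default block size
rows = 8_000_000                # roughly the biggest table mentioned

def pages_needed(row_bytes):
    rows_per_page = PAGE // row_bytes
    return -(-rows // rows_per_page)        # ceiling division

full = pages_needed(200)        # assumed full row width
hot = pages_needed(80)          # assumed width of the hot-column subset
print(f"full rows: {full:,} pages; hot rows only: {hot:,} pages")
```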