Thread: scaling a database
This is a fairly standard DBA-type question, but I can't seem to find the answer in the Postgres mailing list archives or on any other DBA-centric site out there.
We are growing a database on PostgreSQL/Linux fairly quickly and are concerned about reaching resource limits. More specifically, we currently have about 20,000 "users" in the system, and one table contains an average of 50 rows per user (around 1 million rows). This table summarizes data for each user and contains 10 integer columns, 2 varchar(50) columns, and 2 boolean columns. Fortunately, this data is read-only, so the table is indexed on every relevant column. Our entire database is around 800 MB at this point.
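For a sense of scale, here is a rough back-of-envelope size estimate for that table (a sketch: the average varchar length and the per-row overhead are assumptions, not measured figures):

```python
# Back-of-envelope size estimate for the summary table described above.
# The average varchar length and the per-row overhead are assumptions;
# actual PostgreSQL tuple headers, padding, and storage behavior differ.
n_rows = 1_000_000              # ~50 rows for each of ~20,000 users
int_bytes = 10 * 4              # 10 integer columns, 4 bytes each
varchar_bytes = 2 * 25          # 2 varchar(50) columns, assume ~25 bytes avg
bool_bytes = 2 * 1              # 2 boolean columns, 1 byte each
overhead = 32                   # assumed per-row header + padding

bytes_per_row = int_bytes + varchar_bytes + bool_bytes + overhead
heap_mb = n_rows * bytes_per_row / 1024 ** 2
print(f"~{bytes_per_row} bytes/row -> ~{heap_mb:.0f} MB of heap data")
```

With an index on every relevant column, a total on-disk footprint several times the heap size is plausible, which is roughly consistent with the 800 MB figure for the whole database.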
What I am wondering is: will we reach some kind of limit if we want to grow the database to accommodate 1 million or even 10 million users? Right now we have a dedicated dual-processor server with 1 GB of RAM that PostgreSQL runs exclusively on. What should we expect to cause problems as the database size grows? What are the standard solutions? Would you generally throw more hardware at the problem? Would you partition the database into multiple instances, each with a subset of the users and their data? Any information that someone can point me to would be greatly appreciated.
> What I am wondering is: will we reach some kind of limit if we want to
> grow the database to accommodate 1 million or even 10 million users?
> Right now we have a dedicated dual-processor server with 1 GB of RAM
> that PostgreSQL runs exclusively on. What should we expect to cause
> problems as the database size grows? What are the standard solutions?

Well, it all depends on your definitions. If you're expecting 1 million or 10 million *simultaneous* users, you're going to be hard-pressed to accomplish that with ANY DBMS, and you're going to have to throw fistfuls of hardware at it. Even if each simultaneous user issued only one query per minute, you're still talking about roughly 16,700 to 167,000 queries per second, which is quite a bit.

If by "user" you mean someone who has a username/password in the database and could potentially execute a query, then you shouldn't have any problem with 10 million of them. PostgreSQL can handle very large databases and tables efficiently. In that case, the capacity of the server would depend on how many of those users are actually logged in and issuing queries at any one time, and on the nature of those queries.

To give much more detailed advice, we would really need a more detailed description of your data and server usage.

steve
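The arithmetic behind those figures is simple to check (a sketch; the one-query-per-minute rate is the illustrative assumption above, not a measured workload):

```python
# Sustained query rate if every user issued one query per minute:
# users / 60 queries per second.
for users in (1_000_000, 10_000_000):
    qps = users / 60
    print(f"{users:>10,} users -> ~{qps:,.0f} queries/sec")
```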
Good point. I should have elaborated on that. The users would not all be connecting concurrently. We store data for all of them, but only 10 of the 20,000 users in the system connect concurrently right now (obviously we hope this increases). That is a separate issue that we will address, but right now I am interested in knowing whether anyone has ever had to partition a database or a table as it grew because of resource limits.

I am basically looking for examples or case studies to learn from. I realize that our application will be unique and that a valid answer to my question is "it depends", but I am interested in hearing whether there are other measures required besides increasing the processing power, memory, or disk space allocated to PostgreSQL.

----- Original Message -----
From: "Andrew Sullivan" <andrew@libertyrms.info>
To: "Martin Dillard" <martin@edusoftinc.com>
Sent: Monday, February 25, 2002 2:17 PM
Subject: Re: [GENERAL] scaling a database

> On Mon, Feb 25, 2002 at 01:32:52PM -0800, Martin Dillard wrote:
>
> > What I am wondering is will we reach some kind of limit if we
> > wanted to grow the database to accommodate 1 million or even 10
> > million users?
>
> You don't say what the concurrency of users is, nor how they are
> connecting. If you tried 10 million users through psql, I'd be
> amazed that it could work. But other applications are different.
>
> --
> Andrew Sullivan                         87 Mowat Avenue
> Liberty RMS                             Toronto, Ontario Canada
> <andrew@libertyrms.info>                M6K 3E3
> +1 416 646 3304 x110
"Martin Dillard" <martin@edusoftinc.com> writes:
> Good point. I should have elaborated on that. The users would not all
> be connecting concurrently. We store data for all of them but only 10
> connect concurrently right now out of the 20,000 users in the system.
>
> I am basically looking for examples or case studies to learn from. I
> am interested in hearing if there are other measures required besides
> increasing the processing power, memory, or disk space allocated to
> PostgreSQL.

A list of *hard* limitations for PostgreSQL can be found here:

    http://www.us.postgresql.org/users-lounge/limitations.html

    Maximum size for a database            unlimited (60 GB databases exist)
    Maximum size for a table               64 TB on all operating systems
    Maximum size for a row                 unlimited in 7.1 and later
    Maximum size for a field               1 GB in 7.1 and later
    Maximum number of rows in a table      unlimited
    Maximum number of columns in a table   1600
    Maximum number of indexes on a table   unlimited

Of course, "unlimited" here really means limited by available disk space and memory/swap space, and performance may suffer when these values get unusually large.

As you can see, PostgreSQL doesn't have any built-in restrictions that would preclude its use with even extremely large datasets. However, my guess is that what you are really looking for is the degree to which "performance may suffer when these values get unusually large" holds in practice. From my own experience, PostgreSQL does just fine with tables that have millions of rows. This is especially true in cases like yours, where the biggest of your tables are read-only.
My biggest problems to date have to do with vacuuming large tables (between 10 and 20 million rows). If you don't update the tables, on the other hand, there is no need to vacuum. Jason
On Mon, 25 Feb 2002, Martin Dillard wrote:
> I am basically looking for examples or case studies to learn from. I
> realize that our application will be unique and that a valid answer
> to my question is "it depends" but I am interested in hearing if
> there are other measures required besides increasing the processing
> power, memory, or disk space allocated to PostgreSQL.

Take a very good look at your DB design, OS settings, and buffer settings.

One thing we did was to split a couple of tables in two, even though the data was totally unique. The key issue was that a small part of the data was used probably 80% of the time, while the remainder was probably 60%+ of the table's size but used only 20% of the time (pulling numbers out of a hat, but just to give you an idea). These splits seem to have helped -- we didn't measure how much, but enough to be noticeable. Also make sure you have properly normalized your data.

The name of the game is I/O. The more you can change your data/layouts to minimize I/O, the better performance you will have. Our tables combined hold about 10 million records; our biggest table is probably in the 8 million record area.
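The hot/cold split pays off because narrower rows pack more densely into disk pages, so the common-case scans read fewer of them. A sketch with made-up numbers (the 200-byte and 80-byte row widths are assumptions, not figures from this thread):

```python
# Why splitting a wide table by access frequency reduces I/O: rows that
# hold only the frequently-read columns fit more per 8 KB page, so the
# 80%-case scans touch far fewer pages. Row widths are illustrative.
PAGE = 8192                     # PostgreSQL's default block size
rows = 8_000_000                # roughly the biggest table mentioned

def pages_needed(row_bytes):
    rows_per_page = PAGE // row_bytes
    return -(-rows // rows_per_page)        # ceiling division

full = pages_needed(200)        # assumed full row width
hot = pages_needed(80)          # assumed width of the hot-column subset
print(f"full rows: {full:,} pages; hot rows only: {hot:,} pages")
```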