Thread: Experience with many schemas vs many databases
Hello everyone,
I have hit a wall on completing a solution I am working on. Originally, the app used a db per user (on MyIsam)....the solution did not fair so well in reliability and performance. I have been increasingly interested in Postgres lately.
Currently, I have about 30-35k users/databases. The general table layout is the same....only the data is different. I don't need to share data across databases. Very similar to a multi-tenant design.
Here are a few questions I have:
1. Could postgres support this many DBs? Are there any weird things that happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups, vacuum? Weird issues?
Any incite is greatly appreciated.
Thanks.
Frank
undisclosed user wrote: > Hello everyone, > > I have hit a wall on completing a solution I am working on. > Originally, the app used a db per user (on MyIsam)....the solution did > not fair so well in reliability and performance. I have been > increasingly interested in Postgres lately. > > Currently, I have about 30-35k users/databases. The general table > layout is the same....only the data is different. I don't need to > share data across databases. Very similar to a multi-tenant design. 35000 users with separate databases isn't going to scale well on ANY conventional system I'm familiar with
Frank,
I had the same questioning a while ago and another thing that made me think was the amount of data per user.
In the end, I decided on using a single DB and single schema and add a clause to split everything by each customer (customer_id).
I then added an index on that column and my code became simpler and fast enough.
This also allowed me to work with some other aggregates that provided very useful "global" statistics.
--
Jorge Godoy <jgodoy@gmail.com>
I had the same questioning a while ago and another thing that made me think was the amount of data per user.
In the end, I decided on using a single DB and single schema and add a clause to split everything by each customer (customer_id).
I then added an index on that column and my code became simpler and fast enough.
This also allowed me to work with some other aggregates that provided very useful "global" statistics.
--
Jorge Godoy <jgodoy@gmail.com>
On Sun, Nov 15, 2009 at 04:28, undisclosed user <lovetodrinkpepsi@gmail.com> wrote:
Hello everyone,I have hit a wall on completing a solution I am working on. Originally, the app used a db per user (on MyIsam)....the solution did not fair so well in reliability and performance. I have been increasingly interested in Postgres lately.Currently, I have about 30-35k users/databases. The general table layout is the same....only the data is different. I don't need to share data across databases. Very similar to a multi-tenant design.Here are a few questions I have:1. Could postgres support this many DBs? Are there any weird things that happen when the postgres is used this way?2. Is the schema method better? Performance, maintainability, backups, vacuum? Weird issues?Any incite is greatly appreciated.Thanks.Frank
undisclosed user wrote: > I have hit a wall on completing a solution I am working on. Originally, > the app used a db per user (on MyIsam)....the solution did not fair so > well in reliability and performance. I have been increasingly interested > in Postgres lately. > > Currently, I have about 30-35k users/databases. The general table layout > is the same....only the data is different. I don't need to share data > across databases. Very similar to a multi-tenant design. > > Here are a few questions I have: > > 1. Could postgres support this many DBs? Are there any weird things that > happen when the postgres is used this way? As John indicated, not any traditional environment that will handle that well.. > 2. Is the schema method better? Performance, maintainability, backups, > vacuum? Weird issues? I would rather use schemas to logically group tables together. Insert a user_id column in the tables and ensure each user can only see the rows he has access to via query design to limit user access. Something in the line of: CREATE OR REPLACE VIEW SomeTableQuery AS SELECT * FROM SomeTable WHERE user_id = current_user; Where SomeTable has a column user_id that defaults to current_user. Johan Nel Pretoria, South Africa.
undisclosed user <lovetodrinkpepsi@gmail.com> writes: > I have hit a wall on completing a solution I am working on. Originally, the > app used a db per user (on MyIsam)....the solution did not fair so well in > reliability and performance. I have been increasingly interested in Postgres > lately. > Currently, I have about 30-35k users/databases. The general table layout is > the same....only the data is different. I don't need to share data across > databases. Very similar to a multi-tenant design. Use multiple schemas, not multiple databases. If you had it working in mysql then what you were using was more nearly schemas than databases anyway --- it's unfortunate that the two systems use the same word "database" for what are really different structures. regards, tom lane
On Sun, Nov 15, 2009 at 1:28 AM, undisclosed user <lovetodrinkpepsi@gmail.com> wrote: > Hello everyone, > I have hit a wall on completing a solution I am working on. Originally, the > app used a db per user (on MyIsam)....the solution did not fair so well in > reliability and performance. I have been increasingly interested in Postgres > lately. > Currently, I have about 30-35k users/databases. The general table layout is > the same....only the data is different. I don't need to share data across > databases. Very similar to a multi-tenant design. > Here are a few questions I have: > 1. Could postgres support this many DBs? Are there any weird things that > happen when the postgres is used this way? > 2. Is the schema method better? Performance, maintainability, backups, > vacuum? Weird issues? Use schema. Here's a pro tip: if you have any sql or pl/pgsql functions you can use the same function body across all the schema as long as you discard the plans when you want to move from schema to schema. I'm curious if those suggesting there is a practical upper limit of the number of schema postgres can handle have any hard information to back that up... merlin
On Sun, Nov 15, 2009 at 11:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > > Use schema. Here's a pro tip: if you have any sql or pl/pgsql > functions you can use the same function body across all the schema as > long as you discard the plans when you want to move from schema to > schema. I too vote for schemas. > I'm curious if those suggesting there is a practical upper limit of > the number of schema postgres can handle have any hard information to > back that up... The real limit is performance of certain things over the catalogs, not the number of schemas, but how many objects are in the db seem to impact me more, and that's only with slony. Everything else runs fine with ~40k objects in my db.
undisclosed user wrote: > Currently, I have about 30-35k users/databases. The general table > layout is the same....only the data is different. I don't need to > share data across databases. Very similar to a multi-tenant design. Do these users make their own arbitrary SQL queries? Or is all the coding canned, and they are simply running applications? in the latter case, I would definitely suggest using a single database and schema, and one set of tables and having CustomerID be a field that you index in these tables.
If I were to switch to a single DB/single schema format shared among all users , how can I backup each user individually?
Frank
On Sat, Nov 14, 2009 at 10:28 PM, undisclosed user <lovetodrinkpepsi@gmail.com> wrote:
Hello everyone,I have hit a wall on completing a solution I am working on. Originally, the app used a db per user (on MyIsam)....the solution did not fair so well in reliability and performance. I have been increasingly interested in Postgres lately.Currently, I have about 30-35k users/databases. The general table layout is the same....only the data is different. I don't need to share data across databases. Very similar to a multi-tenant design.Here are a few questions I have:1. Could postgres support this many DBs? Are there any weird things that happen when the postgres is used this way?2. Is the schema method better? Performance, maintainability, backups, vacuum? Weird issues?Any incite is greatly appreciated.Thanks.Frank
undisclosed user wrote: > If I were to switch to a single DB/single schema format shared among > all users , how can I backup each user individually? depending on how many tables, etc, I suppose you could use a seperate series of SELECT statements ... but if this is a requirement, it certainly puts constraints on how you organize your data. without a much deeper knowlege of your application, data, and requirements, its kind of hard to give any sort of recommendations. you mentioned myISAM, so I gather this data isn't at all transactional, nor is relational integrity a priority.
On Nov 15, 1:07 pm, lovetodrinkpe...@gmail.com (undisclosed user) wrote: > If I were to switch to a single DB/single schema format shared among all > users , how can I backup each user individually? > > Frank > <snip> I would love to understand why that would be a requirement. I would much prefer backing up one database/schema. If I needed to restore a single user I then have options such as restoring to my test database then extracting that single user's data, though it is still unclear to me why that would be a requirement. If it is so single users can get a backup their own data, I would do it using a generic script or function. Since your users would see their own views only that should work fine. Loyal
The app is very similar to wordpress MU. Each user has the same schema but different data. The app uses the same codebase for every user. Users do not have direct access to data. Currently, the DB is 90% r / 10% w and about 80GB MyISAM. Most of the queries are simple (75%)...the rest are joins (25%). I am using myisam but I have too many concurrency and table crash issues... Mysql Fulltext search is horrible and causes a lot of lockups....tsearch2 seems like a good solution for us.
Basically, I want:
1. Good concurrency / decent performance
2. Data integrity
3. Fast Search
4. Ability to backup per user
Backing up data by user is required for my solution. A lot of times, users screw up and they want to rollback to a previous state.
If I were to do a database per user, the backup/restore would be very straight-forward. I believe backup/restore procedure is similar for schemas (let me know if I am wrong here)? If I were to do a single schema/database, is it possible to get data per user and back it up? Select user rows, copy to a temp table/db, backup?
Thanks,
Frank
On Sun, Nov 15, 2009 at 1:11 PM, John R Pierce <pierce@hogranch.com> wrote:
undisclosed user wrote:depending on how many tables, etc, I suppose you could use a seperate series of SELECT statements ...If I were to switch to a single DB/single schema format shared among all users , how can I backup each user individually?
but if this is a requirement, it certainly puts constraints on how you organize your data. without a much deeper knowlege of your application, data, and requirements, its kind of hard to give any sort of recommendations. you mentioned myISAM, so I gather this data isn't at all transactional, nor is relational integrity a priority.
On Sun, Nov 15, 2009 at 3:45 PM, undisclosed user <lovetodrinkpepsi@gmail.com> wrote: > Backing up data by user is required for my solution. A lot of times, users > screw up and they want to rollback to a previous state. In that case, being able to revert the state of an application should be part of the application (and database) design and not rely on database: history tables, etc... -- Peter Hunsberger
On Nov 15, 3:45 pm, lovetodrinkpe...@gmail.com (undisclosed user) wrote: > The app is very similar to wordpress MU. Each user has the same schema but > different data. > <snip> > 4. Ability to backup per user > > Backing up data by user is required for my solution. A lot of times, users > screw up and they want to rollback to a previous state. > <snip> I would still lean toward single schema, but that is just me. To make that work, I would need the following capabilities: 1. Ability to backup the entire database 2. Ability to restore the entire database elsewhere 3. Ability to restore a single user 1. The software exists as you know. BTW, the software also exists from PG if you wish to make one user per schema. You have to have the space to back it up. 80 GB might come out much larger in fact I am certain it would. 2. This requires that you have a test database you can restore to with enough room for the full data set plus the external file for the restore. The software is already available from PG. 3. This one depends on how automated you need the solution. If it has to be highly automated (read repeatable and less prone to error) that will require you to write the code to extract the data for a single user from the backup, delete the data for the single user, and restore the backup for the single user. 3a. If the occurrence can have less automation, you can use a tool to do most of the heavy lifting for you. I use phpPgAdmin. It can do the selects you need then allow you to do the deletes. It can extract the data for the user from the backup either in copy or distinct insert statements. Finally, if you want to get a start on the code, phpPgAdmin will in essence do that for you. When you do a query, it will usually give you an option to "edit" the SQL. This gives you copy/paste access to the code the tool has written. Loyal