Thread: Experience with many schemas vs many databases

Experience with many schemas vs many databases

From
undisclosed user
Date:
Hello everyone,

I have hit a wall on completing a solution I am working on. Originally, the app used a db per user (on MyIsam)....the solution did not fair so well in reliability and performance. I have been increasingly interested in Postgres lately. 

Currently, I have about 30-35k users/databases. The general table layout is the same....only the data is different. I don't need to share data across databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups, vacuum? Weird issues?


Any incite is greatly appreciated.

Thanks.
Frank

Re: Experience with many schemas vs many databases

From
John R Pierce
Date:
undisclosed user wrote:
> Hello everyone,
>
> I have hit a wall on completing a solution I am working on.
> Originally, the app used a db per user (on MyIsam)....the solution did
> not fair so well in reliability and performance. I have been
> increasingly interested in Postgres lately.
>
> Currently, I have about 30-35k users/databases. The general table
> layout is the same....only the data is different. I don't need to
> share data across databases. Very similar to a multi-tenant design.

35000 users with separate databases isn't going to scale well on ANY
conventional system I'm familiar with


Re: Experience with many schemas vs many databases

From
Jorge Godoy
Date:
Frank,

I had the same questioning a while ago and another thing that made me think was the amount of data per user.

In the end, I decided on using a single DB and single schema and add a clause to split everything by each customer (customer_id).

I then added an index on that column and my code became simpler and fast enough.

This also allowed me to work with some other aggregates that provided very useful "global" statistics.

--
Jorge Godoy     <jgodoy@gmail.com>


On Sun, Nov 15, 2009 at 04:28, undisclosed user <lovetodrinkpepsi@gmail.com> wrote:
Hello everyone,

I have hit a wall on completing a solution I am working on. Originally, the app used a db per user (on MyIsam)....the solution did not fair so well in reliability and performance. I have been increasingly interested in Postgres lately. 

Currently, I have about 30-35k users/databases. The general table layout is the same....only the data is different. I don't need to share data across databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups, vacuum? Weird issues?


Any incite is greatly appreciated.

Thanks.
Frank

Re: Experience with many schemas vs many databases

From
Johan Nel
Date:
undisclosed user wrote:
> I have hit a wall on completing a solution I am working on. Originally,
> the app used a db per user (on MyIsam)....the solution did not fair so
> well in reliability and performance. I have been increasingly interested
> in Postgres lately.
>
> Currently, I have about 30-35k users/databases. The general table layout
> is the same....only the data is different. I don't need to share data
> across databases. Very similar to a multi-tenant design.
>
> Here are a few questions I have:
>
> 1. Could postgres support this many DBs? Are there any weird things that
> happen when the postgres is used this way?
As John indicated, not any traditional environment that will handle that
well..

> 2. Is the schema method better? Performance, maintainability, backups,
> vacuum? Weird issues?
I would rather use schemas to logically group tables together.  Insert a
user_id column in the tables and ensure each user can only see the rows he
has access to via query design to limit user access.  Something in the
line of:

CREATE OR REPLACE VIEW SomeTableQuery AS
   SELECT * FROM SomeTable WHERE user_id = current_user;

Where SomeTable has a column user_id that defaults to current_user.

Johan Nel
Pretoria, South Africa.


Re: Experience with many schemas vs many databases

From
Tom Lane
Date:
undisclosed user <lovetodrinkpepsi@gmail.com> writes:
> I have hit a wall on completing a solution I am working on. Originally, the
> app used a db per user (on MyIsam)....the solution did not fair so well in
> reliability and performance. I have been increasingly interested in Postgres
> lately.

> Currently, I have about 30-35k users/databases. The general table layout is
> the same....only the data is different. I don't need to share data across
> databases. Very similar to a multi-tenant design.

Use multiple schemas, not multiple databases.  If you had it working in
mysql then what you were using was more nearly schemas than databases
anyway --- it's unfortunate that the two systems use the same word
"database" for what are really different structures.

            regards, tom lane

Re: Experience with many schemas vs many databases

From
Merlin Moncure
Date:
On Sun, Nov 15, 2009 at 1:28 AM, undisclosed user
<lovetodrinkpepsi@gmail.com> wrote:
> Hello everyone,
> I have hit a wall on completing a solution I am working on. Originally, the
> app used a db per user (on MyIsam)....the solution did not fair so well in
> reliability and performance. I have been increasingly interested in Postgres
> lately.
> Currently, I have about 30-35k users/databases. The general table layout is
> the same....only the data is different. I don't need to share data across
> databases. Very similar to a multi-tenant design.
> Here are a few questions I have:
> 1. Could postgres support this many DBs? Are there any weird things that
> happen when the postgres is used this way?
> 2. Is the schema method better? Performance, maintainability, backups,
> vacuum? Weird issues?


Use schema.  Here's a pro tip: if you have any sql or pl/pgsql
functions you can use the same function body across all the schema as
long as you discard the plans when you want to move from schema to
schema.

I'm curious if those suggesting there is a practical upper limit of
the number of schema postgres can handle have any hard information to
back that up...

merlin

Re: Experience with many schemas vs many databases

From
Scott Marlowe
Date:
On Sun, Nov 15, 2009 at 11:54 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
> Use schema.  Here's a pro tip: if you have any sql or pl/pgsql
> functions you can use the same function body across all the schema as
> long as you discard the plans when you want to move from schema to
> schema.

I too vote for schemas.

> I'm curious if those suggesting there is a practical upper limit of
> the number of schema postgres can handle have any hard information to
> back that up...

The real limit is performance of certain things over the catalogs, not
the number of schemas, but how many objects are in the db seem to
impact me more, and that's only with slony.  Everything else runs fine
with ~40k objects in my db.

Re: Experience with many schemas vs many databases

From
John R Pierce
Date:
undisclosed user wrote:
> Currently, I have about 30-35k users/databases. The general table
> layout is the same....only the data is different. I don't need to
> share data across databases. Very similar to a multi-tenant design.

Do these users make their own arbitrary SQL queries?    Or is all the
coding canned, and they are simply running applications?   in the latter
case, I would definitely suggest using a single database and schema, and
one set of tables and having CustomerID be a field that you index in
these tables.







Re: Experience with many schemas vs many databases

From
undisclosed user
Date:
If I were to switch to a single DB/single schema format shared among all users , how can I backup each user individually?

Frank

On Sat, Nov 14, 2009 at 10:28 PM, undisclosed user <lovetodrinkpepsi@gmail.com> wrote:
Hello everyone,

I have hit a wall on completing a solution I am working on. Originally, the app used a db per user (on MyIsam)....the solution did not fair so well in reliability and performance. I have been increasingly interested in Postgres lately. 

Currently, I have about 30-35k users/databases. The general table layout is the same....only the data is different. I don't need to share data across databases. Very similar to a multi-tenant design.

Here are a few questions I have:

1. Could postgres support this many DBs? Are there any weird things that happen when the postgres is used this way?
2. Is the schema method better? Performance, maintainability, backups, vacuum? Weird issues?


Any incite is greatly appreciated.

Thanks.
Frank

Re: Experience with many schemas vs many databases

From
John R Pierce
Date:
undisclosed user wrote:
> If I were to switch to a single DB/single schema format shared among
> all users , how can I backup each user individually?

depending on how many tables, etc, I suppose you could use a seperate
series of SELECT statements ...
but if this is a requirement, it certainly puts constraints on how you
organize your data.   without a much deeper knowlege of your
application, data, and requirements, its kind of hard to give any sort
of recommendations.   you mentioned myISAM, so I gather this data isn't
at all transactional, nor is relational integrity a priority.









Re: Experience with many schemas vs many databases

From
Loyal
Date:
On Nov 15, 1:07 pm, lovetodrinkpe...@gmail.com (undisclosed user)
wrote:
> If I were to switch to a single DB/single schema format shared among all
> users , how can I backup each user individually?
>
> Frank
> <snip>

I would love to understand why that would be a requirement.  I would
much
prefer backing up one database/schema.  If I needed to restore a
single user
I then have options such as restoring to my test database then
extracting
that single user's data, though it is still unclear to me why that
would be a
requirement.  If it is so single users can get a backup their own
data, I would
do it using a generic script or function.  Since your users would see
their own
views only that should work fine.

Loyal

Re: Experience with many schemas vs many databases

From
undisclosed user
Date:
The app is very similar to wordpress MU. Each user has the same schema but different data. The app uses the same codebase for every user. Users do not have direct access to data. Currently, the DB is 90% r / 10% w and about 80GB MyISAM. Most of the queries are simple (75%)...the rest are joins (25%). I am using myisam but I have too many concurrency and table crash issues...  Mysql Fulltext search is horrible and causes a lot of lockups....tsearch2 seems like a good solution for us. 

Basically, I want:
1. Good concurrency / decent performance
2. Data integrity
3. Fast Search
4. Ability to backup per user

Backing up data by user is required for my solution. A lot of times, users screw up and they want to rollback to a previous state. 

If I were to do a database per user, the backup/restore would be very straight-forward. I believe backup/restore procedure is similar for schemas (let me know if I am wrong here)? If I were to do a single schema/database, is it possible to get data per user and back it up? Select user rows, copy to a temp table/db, backup? 

Thanks,
Frank



On Sun, Nov 15, 2009 at 1:11 PM, John R Pierce <pierce@hogranch.com> wrote:
undisclosed user wrote:
If I were to switch to a single DB/single schema format shared among all users , how can I backup each user individually?

depending on how many tables, etc, I suppose you could use a seperate series of SELECT statements ...
but if this is a requirement, it certainly puts constraints on how you organize your data.   without a much deeper knowlege of your application, data, and requirements, its kind of hard to give any sort of recommendations.   you mentioned myISAM, so I gather this data isn't at all transactional, nor is relational integrity a priority.









Re: Experience with many schemas vs many databases

From
Peter Hunsberger
Date:
On Sun, Nov 15, 2009 at 3:45 PM, undisclosed user
<lovetodrinkpepsi@gmail.com> wrote:
> Backing up data by user is required for my solution. A lot of times, users
> screw up and they want to rollback to a previous state.

In that case, being able to revert the state of an application should
be part of the application (and database) design and not rely on
database: history tables, etc...

--
Peter Hunsberger

Re: Experience with many schemas vs many databases

From
Loyal
Date:
On Nov 15, 3:45 pm, lovetodrinkpe...@gmail.com (undisclosed user)
wrote:
> The app is very similar to wordpress MU. Each user has the same schema but
> different data.
> <snip>

> 4. Ability to backup per user
>
> Backing up data by user is required for my solution. A lot of times, users
> screw up and they want to rollback to a previous state.
> <snip>

I would still lean toward single schema, but that is just me.  To make
that work, I would need the following capabilities:
1. Ability to backup the entire database
2. Ability to restore the entire database elsewhere
3. Ability to restore a single user

1. The software exists as you know.  BTW, the software also exists
from PG if you wish to make one user per schema.  You have to have the
space to back it up.  80 GB might come out much larger in fact I am
certain it would.
2. This requires that you have a test database you can restore to with
enough room for the full data set plus the external file for the
restore.  The software is already available from PG.
3. This one depends on how automated you need the solution.  If it has
to be highly automated (read repeatable and less prone to error) that
will require you to write the code to extract the data for a single
user from the backup, delete the data for the single user, and restore
the backup for the single user.
3a. If the occurrence can have less automation, you can use a tool to
do most of the heavy lifting for you.  I use phpPgAdmin.  It can do
the selects you need then allow you to do the deletes.  It can extract
the data for the user from the backup either in copy or distinct
insert statements.  Finally, if you want to get a start on the code,
phpPgAdmin will in essence do that for you.  When you do a query, it
will usually give you an option to "edit" the SQL.  This gives you
copy/paste access to the code the tool has written.

Loyal