Thread: Multi tenancy : schema vs databases
Hi I would like to know which technique is better for supporting multi-tenancy= applications, going upto hundreds or even thousands of tenants. 1 - One database with difference schemas (one schema per tenant) or 2 - One database per tenant. The points to be considered are: 1 - which is more light weight from resources point of view. 2 - which is easier for backup/restore 3 - Which is better from security p.o.v It is possible that this question has been asked before. Is there a link t= o earlier posts on the subject. Thanks.
On Fri, Sep 30, 2016 at 5:18 AM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote:
Regards,
Hi
I would like to know which technique is better for supporting multi-tenancy=
applications, going upto hundreds or even thousands of tenants.
1 - One database with difference schemas (one schema per tenant)
or
2 - One database per tenant.
Did you mean one database with-in a postgresql cluster ?
The points to be considered are:
1 - which is more light weight from resources point of view.
2 - which is easier for backup/restore
3 - Which is better from security p.o.v
A schema per tenant would probably be a good idea to go with. Since, you are saying there could be thousands of tenants, going for single-database-per-tenant could possibly end up in a very bad and complex database design.
One point to consider would be that, how different could be the backup/restore, security or any other database policies for different tenants.
Regards,
Venkata B N
Database Consultant / Architect
________________________________________ From: Venkata B Nagothi <nag1010@gmail.com> Sent: Thursday, September 29, 2016 17:25 To: Rakesh Kumar Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Multi tenancy : schema vs databases On Fri, Sep 30, 2016 at 5:18 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com>> wrote: Hi I would like to know which technique is better for supporting multi-tenancy= applications, going upto hundreds or even thousands of tenants. 1 - One database with difference schemas (one schema per tenant) or 2 - One database per tenant. Did you mean one database with-in a postgresql cluster ? Yes. Say something like this within a PG cluster db4978 db6234 ... 100s of such databases.
On 9/29/2016 2:25 PM, Venkata B Nagothi wrote: > Since, you are saying there could be thousands of tenants, going for > single-database-per-tenant could possibly end up in a very bad and > complex database design. > worse, it would also require each tenant to have unique connections, making connection pooling a nightmare. depending on the nature of the application, its data isolation requirements, and how much per-tenant customization there is, assuming the customers('tenants') aren't directly accessing SQL, I could see many scenarios with ONE database+schema, and 'tenant' is just a field that qualifies queries. From a pure performance standpoint, this likely woudl be the most efficient, as 1000s of schemas with 100s of tables each == 100s of 1000s of tables, which means massive bloat of the postgres catalog, and also makes caching less effective. -- john r pierce, recycling bits in santa cruz
On Fri, Sep 30, 2016 at 10:16 AM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote:
________________________________________
From: Venkata B Nagothi <nag1010@gmail.com>
Sent: Thursday, September 29, 2016 17:25
To: Rakesh Kumar
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Multi tenancy : schema vs databases
On Fri, Sep 30, 2016 at 5:18 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook. com>> wrote:
Hi
I would like to know which technique is better for supporting multi-tenancy=
applications, going upto hundreds or even thousands of tenants.
1 - One database with difference schemas (one schema per tenant)
or
2 - One database per tenant.
Did you mean one database with-in a postgresql cluster ?
Yes. Say something like this within a PG cluster
db4978
db6234
...
100s of such databases.
That would make things worst if you are going for one database per tenant. As said by John just now, it would end up in an very complex and bad design contributing to very poor performance and high maintenance overhead.
A schema per tenant would be a good idea and its hard to say without knowing the data isolation levels you require for each tenant.
Regards,
Venkata B N
Database consultant / Architect
________________________________________ From: Venkata B Nagothi <nag1010@gmail.com> Sent: Friday, September 30, 2016 02:48 To: Rakesh Kumar Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Multi tenancy : schema vs databases On Fri, Sep 30, 2016 at 10:16 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com>> wrote: ________________________________________ From: Venkata B Nagothi <nag1010@gmail.com<mailto:nag1010@gmail.com>> Sent: Thursday, September 29, 2016 17:25 To: Rakesh Kumar Cc: pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org> Subject: Re: [GENERAL] Multi tenancy : schema vs databases On Fri, Sep 30, 2016 at 5:18 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com><mailto:rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com>>> wrote: Hi I would like to know which technique is better for supporting multi-tenancy= applications, going upto hundreds or even thousands of tenants. 1 - One database with difference schemas (one schema per tenant) or 2 - One database per tenant. Did you mean one database with-in a postgresql cluster ? Yes. Say something like this within a PG cluster db4978 db6234 ... 100s of such databases. That would make things worst if you are going for one database per tenant. As said by John just now, it would end up in anvery complex and bad design contributing to very poor performance and high maintenance overhead. A schema per tenant would be a good idea and its hard to say without knowing the data isolation levels you require for eachtenant. ======== We require complete data isolation. Absolutely nothing should be shared between two tenants. WHy would multiple dbs be any worse than multiple schemas in performance?
On 9/30/2016 2:06 AM, Rakesh Kumar wrote: > We require complete data isolation. Absolutely nothing should be shared between two tenants. > > WHy would multiple dbs be any worse than multiple schemas in performance? complete? use 1000s of seperate VM instances, one per tennant. ok, thats ridiculous, isn't it. so now its time to find a compromise. -- john r pierce, recycling bits in santa cruz
Via schemata if the tenants represent sub entities of the same organization. This gives the top level mgmt the ability to have a consolidated view of the whole organization. On 30/09/2016 12:06, Rakesh Kumar wrote: > > ________________________________________ > From: Venkata B Nagothi <nag1010@gmail.com> > Sent: Friday, September 30, 2016 02:48 > To: Rakesh Kumar > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Multi tenancy : schema vs databases > > On Fri, Sep 30, 2016 at 10:16 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com>> wrote: > > > ________________________________________ > From: Venkata B Nagothi <nag1010@gmail.com<mailto:nag1010@gmail.com>> > Sent: Thursday, September 29, 2016 17:25 > To: Rakesh Kumar > Cc: pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org> > Subject: Re: [GENERAL] Multi tenancy : schema vs databases > > On Fri, Sep 30, 2016 at 5:18 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com><mailto:rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.com>>> wrote: > > Hi > > I would like to know which technique is better for supporting multi-tenancy= > applications, going upto hundreds or even thousands of tenants. > > 1 - One database with difference schemas (one schema per tenant) > or > 2 - One database per tenant. > > Did you mean one database with-in a postgresql cluster ? > > Yes. Say something like this within a PG cluster > > db4978 > db6234 > ... > 100s of such databases. > > That would make things worst if you are going for one database per tenant. As said by John just now, it would end up inan very complex and bad design contributing to very poor performance and high maintenance overhead. > A schema per tenant would be a good idea and its hard to say without knowing the data isolation levels you require foreach tenant. > ======== > > We require complete data isolation. Absolutely nothing should be shared between two tenants. > > WHy would multiple dbs be any worse than multiple schemas in performance? > > > > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
> ok, thats ridiculous, isn't it. so now its time to find a compromise. You don't understand how sales people pitch our products. We deal with financial data and our customers are extremely sensitive to even imagining that their data will co-reside with that of their competitors who also are our customers. A typical fear mongering Q from them "what if due to a bug in your s/w, our competitors end up looking at our data" or something like that. That's why schema level vs db level discussion. Just a reminder, I started this thread to learn more on the technical drawbacks of choosing either option. For example, in SQL Server, having multiple databases in an instance does not mean more significantly pressure on resources (as compared to multiple schemas). In DB2 it does since many resources like cache (buffers) are db specific.
you could also use a hybrid approach :
- have a systematic tenant_id field in your tables, allowing for 1 db / 1 schema multi-tenants
- give your application the ability to set the schema path for a tenant, so it will locate the tenant schema if it has a decidated schema
- maybe go to the extreme to be able to specialize the db per tenant
- ..
this would allow you to easily re-organize your tenants to find the best compromise depending on their status (small tenants, huge tenant, security freaks tenants, ..).
if going with schema based tenants, make sure you have administrative tasks to check the diffs between the schemas because if 1000s schemas diverge it will be bring technical debt down the line.
On Fri, Sep 30, 2016 at 11:47 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
Via schemata if the tenants represent sub entities of the same organization.
This gives the top level mgmt the ability to have a consolidated view of the whole organization.
On 30/09/2016 12:06, Rakesh Kumar wrote:
________________________________________
From: Venkata B Nagothi <nag1010@gmail.com>
Sent: Friday, September 30, 2016 02:48
To: Rakesh Kumar
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Multi tenancy : schema vs databases
On Fri, Sep 30, 2016 at 10:16 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.co m>> wrote:
________________________________________
From: Venkata B Nagothi <nag1010@gmail.com<mailto:nag1010@gmail.com>>
Sent: Thursday, September 29, 2016 17:25
To: Rakesh Kumar
Cc: pgsql-general@postgresql.org<mailto:pgsql-general@postgresql .org>
Subject: Re: [GENERAL] Multi tenancy : schema vs databases
On Fri, Sep 30, 2016 at 5:18 AM, Rakesh Kumar <rakeshkumar464@outlook.com<mailto:rakeshkumar464@outlook.co m><mailto:rakeshkumar464@outlo ok.com<mailto:rakeshkumar464@ outlook.com>>> wrote:
Hi
I would like to know which technique is better for supporting multi-tenancy=
applications, going upto hundreds or even thousands of tenants.
1 - One database with difference schemas (one schema per tenant)
or
2 - One database per tenant.
Did you mean one database with-in a postgresql cluster ?
Yes. Say something like this within a PG cluster
db4978
db6234
...
100s of such databases.
That would make things worst if you are going for one database per tenant. As said by John just now, it would end up in an very complex and bad design contributing to very poor performance and high maintenance overhead.
A schema per tenant would be a good idea and its hard to say without knowing the data isolation levels you require for each tenant.
========
We require complete data isolation. Absolutely nothing should be shared between two tenants.
WHy would multiple dbs be any worse than multiple schemas in performance?
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Fri, Sep 30, 2016 at 5:11 AM, John R Pierce <pierce@hogranch.com> wrote: > On 9/30/2016 2:06 AM, Rakesh Kumar wrote: >> >> We require complete data isolation. Absolutely nothing should be shared >> between two tenants. >> >> WHy would multiple dbs be any worse than multiple schemas in performance? > > > complete? use 1000s of seperate VM instances, one per tennant. > Well, VM's don't always provide 100% isolation, so separate hardware, with each on its own VLAN seems right to me. And then make sure your switch doesn't leak across VLANs. Your requirements need refinement, at the least :)
On Fri, Sep 30, 2016 at 6:06 AM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote: > A typical fear mongering Q from > them "what if due to a bug in your s/w, our competitors end up looking at our data" or > something like that. That's why schema level vs db level discussion. So... if your software isolates customers based on a "customer_id" field in a table, how is that less secure than isolating a customer based on the schema? Ie, you've just moved the customer_id field from a column to part of the table name itself. One step up from that, what keeps your software from selecting the wrong customer_id database name too? See, it is all just *where* you put the distinguishing name... For your resource question, having multiple DBs require each to have its own handle within your app. When you have many simultaneous users, you have lots and lots of file handles to manage. You want to minimize the number of connections to the postgres server itself.
Rakesh,
As long as one application knows how to connect to more than 1 tenant, there will always be the possibility that a software bug in your application causes one tenant to access another tenant's data. I think this is why you're getting people asking you to refine your requirements. There quite simply is no way to "guarantee" one tenant can't see another tenant when they are sharing resources.
As Vick mentioned, whether you do this by database, schema, or a column in a table, your application has to decide which tenant to connect to and when. You could make the argument that using a column such as customer_id increases the surface area of the potential failures, but generally speaking - regardless of the solution you go with - your best investment would be in a testing infrastructure for your application that ensures data remains tenant-isolated.
I know none of this answers your most recent question, about the technical tradeoffs between schema per tenant or database per tenant, but I think it's still relevant to your original question. Both of those solutions will be considerably harder to manage than a shared-everything infrastructure. They'll certainly both require more resources, and they introduce the problem for other issues (inconsistent schema and/or table definitions across multiple tenants being one of the most problematic and difficult to prevent).
Best,
Colin
Worth noting: in the shared everything infrastructure, it's far more likely that a bug in your software results in one tenant seeing all data across all tenants, as opposed to data for one wrong tenant. The good news(?) here is that these kinds of bugs are generally very easy to spot before they ever make it out to customers. In any case, the answer is almost certainly going to be testing.
On Fri, Sep 30, 2016 at 10:34 AM Vick Khera <vivek@khera.org> wrote:
On Fri, Sep 30, 2016 at 5:11 AM, John R Pierce <pierce@hogranch.com> wrote:
> On 9/30/2016 2:06 AM, Rakesh Kumar wrote:
>>
>> We require complete data isolation. Absolutely nothing should be shared
>> between two tenants.
>>
>> WHy would multiple dbs be any worse than multiple schemas in performance?
>
>
> complete? use 1000s of seperate VM instances, one per tennant.
>
Well, VM's don't always provide 100% isolation, so separate hardware,
with each on its own VLAN seems right to me. And then make sure your
switch doesn't leak across VLANs.
Your requirements need refinement, at the least :)
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Fri, Sep 30, 2016 at 6:06 AM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote: > A typical fear mongering Q from > them "what if due to a bug in your s/w, our competitors end up looking at our data" or > something like that. That's why schema level vs db level discussion. I've been reading this discussion with great interest, to see what other Postgres experts think. :-) I've almost always taken the customer_id approach, and I prefer it---but I also agree it is easier to make mistakes, because you have to include that condition in your code everywhere. With per-schema or per-database, you can manage access simply by handing out connections. If isolation is your goal, one drawback with one-database-many-schemas is that there is no way to prevent users from listing all the schemas in the database. In psql this is `\dn`, but you can also do it with SQL against the system catalog. You can forbid querying the tables in the schema, but anyone can see that the schema itself is there. So that would leak some information---at the very least the number of customers you have. You haven't said whether users will go through an application tier or have direct SQL access, but if it's the latter, this is something to be aware of. I believe it is possible to prevent, but only by taking away access from important catalog tables that would also break `\dt` or `\d foo`. (I would love to be corrected btw!) Also you can't use RLS against the system catalog, so there's no solution there. Good luck! Paul
> I've been reading this discussion with great interest, to see what other > Postgres experts think. :-) I am bit disappointed that most of the replies are questioning why we are doing what we are doing. Once again, we (db designers) have no choice in that. What I would like to know that which one is better :- multiple db vs multiple schema. Read few interesting arguments and noted that connection pooling works better with multiple schemas than dbs. Anything else? thanks
On Fri, Sep 30, 2016 at 2:06 AM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote:
We require complete data isolation. Absolutely nothing should be shared between two tenants.
Then you need different clusters per tenant. Otherwise, the WAL records of different tenants are inextricably mingled together.
Cheers,
Jeff
* Jeff Janes (jeff.janes@gmail.com) wrote: > On Fri, Sep 30, 2016 at 2:06 AM, Rakesh Kumar <rakeshkumar464@outlook.com> > wrote: > > We require complete data isolation. Absolutely nothing should be shared > > between two tenants. > > Then you need different clusters per tenant. Otherwise, the WAL records of > different tenants are inextricably mingled together. Different clusters are also required to have independent file-level backups, independent roles, independent tablespaces, etc. It's also far easier to move a single cluster from one system to another to adjust for growth than to try and move an individual schema or database. Thanks! Stephen
Attachment
>Then you need different clusters per tenant. Otherwise, the WAL records > of different tenants are inextricably mingled together. Yes we are aware of it .This part is OK as it is not deemed as user table data.
>> I've been reading this discussion with great interest, to see what >> other Postgres experts think. :-) > >I am bit disappointed that most of the replies are questioning why we are doing what we are doing. Once again, we (db designers)have no choice in that. What I would like to know that which one is better :- multiple db vs multiple schema. Read few interesting arguments and noted that connection pooling works better with multiple schemas than dbs. Anythingelse? I've been curious to see what the others would tell you too. :) There's a lot about the admin side I can't advise you on,but I'll take a shot anyway from an overall-app view... Your situation sounds somewhat similar to my previous job. There, we hosted multiple customers on the same physical server.We also used Mysql, so your question wouldn't have applied there. But translating that situation to if they had usedPostgres, I think I'd have told them to do 1 DB and many schemas because of the resource sharing. That would have workedfor them because the DBs were completely internal; i.e. the customer could not get to the DB directly -- the customercould only see the data thru our app. Given that, then each DB server would have hosted between 1-50 customers (dependingon their size). The difficult spot with Postgres (AFAICT) is that if your customer has direct access to the DB, then 1 DB to many schemawould break your security requirements. We had a situation at my present job recently where one of customers wantedaccess to our log tables. The first idea was to grant them select-only privs to the logging schema thinking that wouldbe safe enough as they couldn't get to the main data schema. However, in testing, we found that wasn't good enough asit allowed them to at least look at table designs even if they couldn't get to the data. That was bad so in a sense Postgresfailed us (to the experts if there is a way to do this, I'd love to know how to do that). We considered creatinga separate DB for the logging data, but decided that would make things too difficult and we didn't want to "waste"server resources in that way. We ended up writing a small app that the customer could query and it read the log filesfor them, ensuring security was maintained. This is why others are asking you about your [security] requirements. BTW, if you go the 1 DB and many schema way, be sure you fully understand "search_path". I don't know if that's helpful to you or not, but hopefully it was at least a little. Kevin
I don't know if that's helpful to you or not, but hopefully it was at least a little. === yes it was. thanks
On Thu, Sep 29, 2016 at 12:18 PM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote:
Hi
I would like to know which technique is better for supporting multi-tenancy=
applications, going upto hundreds or even thousands of tenants.
1 - One database with difference schemas (one schema per tenant)
or
2 - One database per tenant.
The points to be considered are:
1 - which is more light weight from resources point of view.
Multiple databases used to be more resource intensive, but that was mostly fixed in 9.3. Now I don't believe there is all that much difference while running (databases are much more heavy to create in the first place).
As others have said, different databases makes connection pooling less efficient, which could be very important to you, or could be irrelevant.
2 - which is easier for backup/restore
That depends on how you want to backup restore. If you use log archiving, it will make no difference. If you use sql dumps, then do you want to make it easier or harder to backup and restore all clients together, or all of them separately? Are you restoring because your whole system failed, or because one client did something wrong and needs just their data rolled back?
3 - Which is better from security p.o.v
Any implementation can be done wrong in a way that causes security problems. Do your clients authenticate directly to the database, or to the app server?
Cheers,
Jeff
>Are you restoring because your whole system failed, or because one client did something >wrong and needs just their data rolled back? Chances of restoring just for one client will probably be 99% of use cases. > Do your clients authenticate directly to the database, or to the app server? thru app server. thanks.
On 10/1/2016 11:39 AM, Jeff Janes wrote: > > As others have said, different databases makes connection pooling less > efficient, which could be very important to you, or could be irrelevant. 1000 apps running at once each with their own DB and 10s of connections == 10s of 1000s of database connections, which means 10s of 1000s of database processes. 1000 apps only connecting as needed for each transaction == database server forks for every connection, expensive. -- john r pierce, recycling bits in santa cruz
On 10/1/2016 12:52 PM, Rakesh Kumar wrote:
Do your clients authenticate directly to the database, or to the app server?thru app server.
do you run a separate instance of the app for each tenant, or is there one app that identifies the tenant and handles them accordingly ?
-- john r pierce, recycling bits in santa cruz
On Sat, Oct 1, 2016 at 4:52 PM, John R Pierce <pierce@hogranch.com> wrote:
On 10/1/2016 12:52 PM, Rakesh Kumar wrote:Do your clients authenticate directly to the database, or to the app server?thru app server.do you run a separate instance of the app for each tenant, or is there one app that identifies the tenant and handles them accordingly ?
-- john r pierce, recycling bits in santa cruz
I would like to comment on the multiple schema vs databases situation.
First of all, 1000's of databases is insanity and just asking for trouble. Next, 1000's of schemas is a nightmare to maintain. I understand the requirement for client data to be "isolated", but in reality, data is never really separated. Once it's on the server, any good hacker with a knowledge of SQL can find it. So, IMHO, the best solution is to isolate by a client ID in the tables of one database. Then make sure you have sufficient and correct security on those tables.
--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
>do you run a separate instance of the app for each tenant, or is there one app that identifies the >tenant and handles them accordingly ? Each tenant will have different app server. there will be persistent connection for each tenant.
On 30/09/2016 18:45, Rakesh Kumar wrote: >> I've been reading this discussion with great interest, to see what other >> Postgres experts think. :-) > I am bit disappointed that most of the replies are questioning why we are > doing what we are doing. Once again, we (db designers) have no choice > in that. What I would like to know that which one is better :- multiple db > vs multiple schema. Read few interesting arguments and noted that > connection pooling works better with multiple schemas than dbs. Anything else? Since you missed it, i write it once again : same organization (e.g. holding) -> schemas different organizations -> DBs > > thanks > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
On Sat, Oct 01, 2016 at 07:21:47PM -0400, Melvin Davidson wrote: > *I would like to comment on the multiple schema vs databases situation. > First of all, 1000's of databases is insanity and just asking for trouble. > Next, 1000's of schemas is a nightmare to maintain. I understand the > requirement for client data to be "isolated", but in reality, data is never > really separated. Once it's on the server, any good hacker with a knowledge > of SQL can find it. So, IMHO, the best solution is to isolate by a client > ID in the tables of one database. Then make sure you have sufficient and > correct security on those tables.* The concern was raised that if tenant data is separated only by client_id within one and the same table a simple client_id related error in the app would expose another tenants data. Would not RLS help in avoiding this sort of thing ? Tie RLS based row visibility to the logged in user and hand out different accounts to tenants. That way, the app cannot see anything beyond what the user types into the application in terms of credentials. The app can't really do it wrong -- the user provides credentials and the database provides data based on those credentials. Karsten -- GPG key ID E4071346 @ eu.pool.sks-keyservers.net E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346