Thread: strategies for segregating client data when using PostgreSQL in a web app
From
Menelaos PerdikeasSemantix
Date:
I would like to know what are the best practices / common patterns (or pointers to such) for using PostgreSQL in the context of a "big" web application with substantial data per user.
Namely, we are designing an ERP / accounting / business intelligence web application where each client company will have its own substantial data. The application will be fronted by a JBoss Application Server with PostgreSQL at the back. We are targeting a few thousand clients at the maximum, and that only after 2-3 years at the earliest. I understand that there could be several approaches. In what follows I am assuming we are running only one PostgreSQL server instance (process), or perhaps a few (3-4) in a cluster, but I don't suppose that affects the options below much. So, I see the following options:
[1] use just one database and schema, and logically segregate companies' data by having all tables carry a client_id column as part of their primary key.
[2] use multiple databases (in the same server instance) and only the public schema in each of them for the customer's data.
[3] use one database and multiple schemas to separate the different customers' data.
(options [2] and [3] in particular seem practically indistinguishable to me).
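For concreteness, here is a minimal DDL sketch of what options [1] and [3] might look like (table, column, and schema names are invented for illustration):

```sql
-- Option [1]: one shared schema; every table carries client_id,
-- and client_id is part of every primary key.
CREATE TABLE invoice (
    client_id  integer NOT NULL,
    invoice_no integer NOT NULL,
    total      numeric(12,2),
    PRIMARY KEY (client_id, invoice_no)
);

-- Option [3]: one database, one schema per client; each client gets
-- identically structured tables, and no client_id column is needed.
CREATE SCHEMA client_acme;
CREATE TABLE client_acme.invoice (
    invoice_no integer PRIMARY KEY,
    total      numeric(12,2)
);
```

With option [3] the application selects a client simply by setting `search_path` on its connection, while option [1] must add a `WHERE client_id = ...` predicate to every query.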
What are the trade-offs in terms of:
[1] enforcing security and access separation
[2] administering the database and responding to inquiries like "please reset my company's data to the image of yesterday cause we messed up some tables" or "we are taking our business elsewhere, can we please have a dump of our data?" or "we would like a weekly DVD with our data".
[3] backup / restore and partitioning
[4] potential for connection pooling at the Application Server.
Any information, even pointers for further study will be greatly appreciated.
-- Menelaos.
Re: strategies for segregating client data when using PostgreSQL in a web app
From
Chris Angelico
Date:
On Sat, Aug 4, 2012 at 6:05 AM, Menelaos PerdikeasSemantix <mperdikeas.semantix@gmail.com> wrote:
> [1] use just one database and schema and logically segregate companies' data
> by having all tables have a client_id column as part of their primary key.
> [2] use multiple databases (in the same server instance) and only the public
> schema in each of them for the customer's data.
> [3] use one database and multiple schemas to separate the different
> customers' data.
>
> (the [2] and [3] in particular seem practically indistinguishable to me).

At my work, we started with a MySQL setup involving one schema (what MySQL calls a "database") for global data and one for each customer's. After moving to Postgres, we solidified that as schemas in a database, and since then have moved quite a bit of data into the global-data schema - so we're in a hybrid of [1] and [3].

> What are the trade-offs in terms of:
>
> [1] enforcing security and access separation

If your customers aren't able to write SQL themselves, this is easy in any scheme. Otherwise, [2] and [3] become far easier (but [1] is still possible; it just requires some careful work with views).

> [2] administering the database and responding to inquiries like "please
> reset my company's data to the image of yesterday cause we messed up some
> tables" or "we are taking our business elsewhere, can we please have a dump
> of our data?" or "we would like a weekly DVD with our data".

Ouch. This is only possible, AT ALL, on condition that nobody's data affects anybody else's. But assuming that, I'd recommend [2] or [3].

> [3] backup / restore and partitioning

Definitely [3]; you can back the whole lot up easily.

> [4] potential for connection pooling at the Application Server.

Never done this, can't speak to it.

On the whole, I would advise option 3. You get good separation, and you can do schema upgrades with short outages for each customer rather than a long outage for everyone. Though this could work the other way too - it might be more convenient to go for option 1, guaranteeing that every customer's tables follow the same structure.

ChrisA
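The "careful work with views" that makes option [1] workable might look roughly like the sketch below. The `app.client_id` setting name and the `client_role` role are invented for illustration; the idea is that the application sets a per-session variable after checking out a connection, and clients only ever see a filtered view:

```sql
-- The application sets this right after acquiring a connection:
-- SET app.client_id = '42';

CREATE VIEW my_invoices AS
    SELECT invoice_no, total
    FROM   invoice
    WHERE  client_id = current_setting('app.client_id')::integer;

-- Grant access to the view only, never to the base table:
REVOKE ALL ON invoice FROM client_role;
GRANT SELECT ON my_invoices TO client_role;
```

This is only a sketch: getting it right for writes as well as reads, and making sure no code path can query the base tables directly, is exactly the "careful work" being referred to.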
Re: strategies for segregating client data when using PostgreSQL in a web app
From
Rory Campbell-Lange
Date:
On 03/08/12, Menelaos PerdikeasSemantix (mperdikeas.semantix@gmail.com) wrote:
> I would like to know what are the best practices / common patterns (or
> pointers to such) for using PostgreSQL in the context of a "big" web
> application with substantial data per user.
...
> [1] use just one database and schema and logically segregate companies' data
> by having all tables have a client_id column as part of their primary key.
> [2] use multiple databases (in the same server instance) and only the public
> schema in each of them for the customer's data.
> [3] use one database and multiple schemas to separate the different
> customers' data.

We have about 75 clients with 10-50GB of data each, running two main services. Early on we decided to give each client their own database. This has the following advantages:

- backup per client is really easy
- moving databases between db servers is easy
- a sequential pg_dump over all the dbs in a server runs pretty quickly
- testing on a subset of dbs works well

Problems:

- the team needs to get really good at running upgrades across all dbs
- if you need to do cross-db work you need to start looking towards cross-database tools or messaging, e.g. AMQP

Apart from the issues of dealing with cross-database queries (if you require them), I firmly believe this is the way to go. If you are using relatively small databases like us, you can scale when you need to, and use commodity hardware instead of needing something exotic. Put this together with the new streaming replication stuff, and you have a robust and easily scalable solution.
Re: strategies for segregating client data when using PostgreSQL in a web app
From
"David Johnston"
Date:
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Menelaos PerdikeasSemantix
Sent: Friday, August 03, 2012 4:05 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] strategies for segregating client data when using PostgreSQL in a web app

[snip]

========================================================================

One approach I have been considering is:

1) Primary Database for "global" information
2) Per-Client Databases for "local" information, plus a local cache of whatever "global" information is needed
2a) Read-Only Client Database Slaves
3) Auxiliary Database to store client data that wants to be consolidated with other clients

There are a lot of considerations and trade-offs that need to be evaluated:

- level of client access
- level of third-party access
- desirability of on-premise deployment
- your resources and timeframe

Generally I would suggest identifying the different types of data/owners and keeping their tables in separate schemas. Additionally, I would allow for the possibility of multiple "clients" having data in the same physical tables. Beyond that, decide what kind of client/internal meta-data is going to be necessary to organize "modules".

David J.
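The global/local split described above can also be approximated inside a single database using layered schemas and search_path; the schema and table names below are invented for illustration:

```sql
CREATE SCHEMA global;        -- shared reference data, visible to everyone
CREATE SCHEMA client_acme;   -- one client's private tables

-- With this search_path, an unqualified table reference resolves to the
-- client's own schema first, then falls back to the shared one:
SET search_path = client_acme, global, public;

-- e.g. SELECT * FROM currencies;
-- finds global.currencies, unless the client has its own
-- client_acme.currencies overriding it
```

This gives each client a private namespace while keeping shared data in one place, avoiding the cross-database query problem that the per-database approaches run into.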