Thread: strategies for segregating client data when using PostgreSQL in a web app

strategies for segregating client data when using PostgreSQL in a web app

From
Menelaos PerdikeasSemantix
Date:
I would like to know the best practices / common patterns (or pointers to such) for using PostgreSQL in the context of a "big" web application with substantial data per user.

Namely, we are designing an ERP / accounting / business intelligence Web Application where each client company will have its own substantial data. The application will be fronted by a JBoss Application Server with PostgreSQL at the back. We are targeting a few thousand clients at most, and that only after 2-3 years at the earliest. I understand that there could be several approaches. In the following I am assuming we are running only one PostgreSQL server instance (process), or perhaps a few (3-4) in a cluster, but I don't suppose that much affects the options below. So, I see the following options:

[1] use just one database and schema, and logically segregate companies' data by having all tables carry a client_id column as part of their primary key.
[2] use multiple databases (in the same server instance), with only the public schema in each of them holding a customer's data.
[3] use one database and multiple schemas to separate the different customers' data.

([2] and [3] in particular seem practically indistinguishable to me.)
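For concreteness, here is roughly how I picture each option in DDL
(the invoices table and the client names are made up just for
illustration):

-- [1] one schema; every table carries the client as part of its key
CREATE TABLE invoices (
    client_id  integer NOT NULL,
    invoice_id integer NOT NULL,
    amount     numeric,
    PRIMARY KEY (client_id, invoice_id)
);

-- [2] one database per client company (same server instance)
CREATE DATABASE client_acme;

-- [3] one schema per client company inside a single database
CREATE SCHEMA client_acme;
CREATE TABLE client_acme.invoices (
    invoice_id integer PRIMARY KEY,
    amount     numeric
);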

What are the trade-offs in terms of:

[1] enforcing security and access separation
[2] administering the database and responding to inquiries like "please reset my company's data to the image of yesterday because we messed up some tables", "we are taking our business elsewhere, can we please have a dump of our data?", or "we would like a weekly DVD with our data".
[3] backup / restore and partitioning
[4] potential for connection pooling at the Application Server.

Any information, even pointers for further study will be greatly appreciated.
-- Menelaos.

Re: strategies for segregating client data when using PostgreSQL in a web app

From
Chris Angelico
Date:
On Sat, Aug 4, 2012 at 6:05 AM, Menelaos PerdikeasSemantix
<mperdikeas.semantix@gmail.com> wrote:
> [1] use just one database and schema, and logically segregate
> companies' data by having all tables carry a client_id column as part
> of their primary key.
> [2] use multiple databases (in the same server instance), with only
> the public schema in each of them holding a customer's data.
> [3] use one database and multiple schemas to separate the different
> customers' data.
>
> ([2] and [3] in particular seem practically indistinguishable to me.)

At my work, we started with a MySQL setup involving one schema (what
MySQL calls a "database") for global data and one per customer. After
moving to Postgres, we solidified that as schemas within a single
database, and since then have moved quite a bit of data into the
global-data schema - so we're in a hybrid of [1] and [3].
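Roughly, each customer session in that hybrid starts with something
like this (schema names made up):

SET search_path TO client_acme, global_data;
-- unqualified names like "invoices" now resolve to client_acme first,
-- while shared lookup tables fall back to global_data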

> What are the trade-offs in terms of:
>
> [1] enforcing security and access separation

If your customers aren't able to write SQL themselves, this is easy
under any scheme. Otherwise, [2] and [3] become far easier (though [1]
is still possible; it just requires some careful work with views).
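For example, something like the following sketch - it assumes the
application sets a per-session variable identifying the client, and
all names are illustrative:

-- application does e.g.:  SET app.client_id = '42';
CREATE VIEW my_invoices AS
    SELECT invoice_id, amount
    FROM invoices
    WHERE client_id = current_setting('app.client_id')::integer;
-- then GRANT access on the view only, and REVOKE it on the base table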

> [2] administering the database and responding to inquiries like "please
> reset my company's data to the image of yesterday cause we messed up some
> tables" or "we are taking our business elsewhere, can we please have a dump
> of our data?" or "we would like a weekly DVD with our data".

Ouch. This is only possible, AT ALL, on condition that nobody's data
affects anybody else's. But assuming that, I'd recommend [2] or [3].

> [3] backup / restore and partitioning

Definitely [3]; you can back the whole lot up easily.

> [4] potential for connection pooling at the Application Server.

Never done this, can't speak to it.

On the whole, I would advise option [3]. You get good separation, and
you can do schema upgrades with short outages for each customer rather
than one long outage for everyone. Though this could cut the other way
too - it might be more convenient to go for option [1], guaranteeing
that every customer's tables follow the same structure.
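i.e. something along these lines for each customer in turn
(hypothetical names):

-- take client_acme offline briefly and upgrade just their schema
BEGIN;
ALTER TABLE client_acme.invoices ADD COLUMN due_date date;
COMMIT;
-- ...then repeat for the next customer in their own maintenance window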

ChrisA

Re: strategies for segregating client data when using PostgreSQL in a web app

From
Rory Campbell-Lange
Date:
On 03/08/12, Menelaos PerdikeasSemantix (mperdikeas.semantix@gmail.com) wrote:
> I would like to know the best practices / common patterns (or
> pointers to such) for using PostgreSQL in the context of a "big" web
> application with substantial data per user.
...
> [1] use just one database and schema, and logically segregate
> companies' data by having all tables carry a client_id column as part
> of their primary key.
> [2] use multiple databases (in the same server instance), with only
> the public schema in each of them holding a customer's data.
> [3] use one database and multiple schemas to separate the different
> customers' data.

We have about 75 clients with 10-50GB of data each, running two main
services. Early on we decided to give each client their own database.

This has the following advantages:
- backup per client is really easy
- moving databases between db servers is easy
- a sequential pg_dump over all the dbs in a server runs pretty quickly
- testing on a subset of dbs works well

Problems:
- the team needs to get really good at running upgrades across all dbs
- if you need to do cross-db work you need to start looking at
  cross-database tools (see the sketch below) or messaging (e.g. AMQP)
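For what it's worth, a cross-db query via dblink looks something like
this (the connection string and column list are illustrative):

CREATE EXTENSION dblink;  -- once, in the database issuing the queries

SELECT t.invoice_id, t.amount
FROM dblink('dbname=client_acme',
            'SELECT invoice_id, amount FROM invoices')
     AS t(invoice_id integer, amount numeric);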

Apart from the issue of dealing with cross-database queries (if you
require them), I firmly believe this is the way to go. If you are
using relatively small databases as we are, you can scale when you
need to and use commodity hardware instead of needing something
exotic. Put this together with the new streaming replication stuff,
and you have a robust and easily scalable solution.



Re: strategies for segregating client data when using PostgreSQL in a web app

From
"David Johnston"
Date:
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Menelaos
PerdikeasSemantix
Sent: Friday, August 03, 2012 4:05 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] strategies for segregating client data when using
PostgreSQL in a web app


One approach I have been considering is:

1) Primary database for "global" information
2) Per-client databases for "local" information, each maintaining a
local cache of whatever "global" information is needed
2a) Read-only slaves of the per-client databases
3) Auxiliary database to store client data that needs to be
consolidated with other clients' data

There are a lot of considerations and trade-offs that need to be
evaluated:

- Level of client access
- Level of third-party access
- Desirability of on-premise deployment
- Your resources and timeframe

Generally I would suggest identifying the different types of
data/owners and keeping their tables in separate schemas.
Additionally, I would allow for the possibility of multiple "clients"
having data in the same physical tables.
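A sketch of what I mean (schema and table names are hypothetical):

CREATE SCHEMA global;       -- reference data every client shares
CREATE SCHEMA client_data;  -- rows from many clients co-located

CREATE TABLE client_data.documents (
    client_id integer NOT NULL,
    doc_id    integer NOT NULL,
    body      text,
    PRIMARY KEY (client_id, doc_id)
);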

Beyond that, decide what kind of client/internal meta-data you will
need to keep in order to organize "modules".

David J.





