I would like to know the best practices / common patterns (or pointers to such) for using PostgreSQL in the context of a "big" web application with substantial data per user.
Namely, we are designing an ERP / accounting / business intelligence web application where each client company will have its own substantial data. The application will be fronted by a JBoss Application Server with PostgreSQL at the back. We are targeting a few thousand clients at most, and that only after 2-3 years at the earliest.
I understand that there could be several approaches. In what follows I am assuming we are running only one PostgreSQL server instance (process), or perhaps a few (3-4) in a cluster, but I don't suppose that affects the options below much. So, I see the following options:
[1] use just one database and one schema, and logically segregate each company's data by having every table carry a client_id column as part of its primary key.
[2] use multiple databases (in the same server instance), keeping each customer's data in the public schema of its own database.
[3] use one database with multiple schemas, one schema per customer, to separate the different customers' data.
(Options [2] and [3] in particular seem practically indistinguishable to me; the sketch below is my attempt to make the difference concrete.)
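For concreteness, here is a rough sketch of how I imagine the application would reach one tenant's data under each option. This is only an illustration, not code from our project: the invoices table, the client names and the credentials are all made up.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class TenantAccessSketch {

        // Option [1]: one database, one schema; every table carries a client_id
        // column, so every query filters on it.
        static ResultSet invoicesSharedSchema(Connection conn, int clientId) throws SQLException {
            PreparedStatement ps =
                conn.prepareStatement("SELECT * FROM invoices WHERE client_id = ?");
            ps.setInt(1, clientId);
            return ps.executeQuery();   // resource handling omitted for brevity
        }

        // Option [2]: one database per client; the tenant is chosen by the JDBC URL.
        static Connection connectToClientDatabase(String clientDbName) throws SQLException {
            return DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/" + clientDbName, "appuser", "secret");
        }

        // Option [3]: one database, one schema per client; the tenant is chosen by
        // pointing an existing connection at that schema via search_path.
        static ResultSet invoicesPerSchema(Connection conn, String clientSchema) throws SQLException {
            try (Statement st = conn.createStatement()) {
                st.execute("SET search_path TO " + clientSchema); // schema name from our own registry, not user input
            }
            return conn.createStatement().executeQuery("SELECT * FROM invoices");
        }
    }

If I read this right, the only difference between [2] and [3] from the application's side is whether the tenant is baked into the connection URL or switched per connection with search_path.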
What are the trade-offs in terms of:
[1] enforcing security and access separation
[2] administering the database and responding to requests like "please reset my company's data to yesterday's image because we messed up some tables", "we are taking our business elsewhere, can we please have a dump of our data?", or "we would like a weekly DVD with our data" (see the sketch after this list).
[3] backup / restore and partitioning
[4] potential for connection pooling at the Application Server.
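To make points [2] and [4] above more concrete, this is the kind of thing I have in mind. Again, it is only a sketch under my own assumptions: a pooled javax.sql.DataSource supplied by the application server, made-up schema/database names, and pg_dump available on the machine running the code.

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import javax.sql.DataSource;

    public class TenantOperationsSketch {

        // Options [1] and [3]: a single pooled DataSource against the one database;
        // under [3] the tenant is selected per checkout by switching search_path,
        // so all clients can share the same pool.
        private final DataSource sharedPool;

        // Option [2]: the database name is part of the connection itself, so each
        // client would seem to need its own DataSource / pool.
        private final Map<String, DataSource> perClientPools = new ConcurrentHashMap<>();

        public TenantOperationsSketch(DataSource sharedPool) {
            this.sharedPool = sharedPool;
        }

        // Option [3]: borrow from the shared pool, then point the connection at the
        // client's schema.
        public Connection connectionForSchema(String clientSchema) throws SQLException {
            Connection conn = sharedPool.getConnection();
            try (Statement st = conn.createStatement()) {
                st.execute("SET search_path TO " + clientSchema); // schema name from our tenant registry, not user input
            }
            return conn;
        }

        // "Can we please have a dump of our data?" under option [3]: dump just that
        // client's schema. Under option [2] the --schema flag would be dropped and the
        // last argument would be the client's own database; under option [1] there is
        // no single pg_dump invocation, the rows would have to be filtered by client_id.
        public Process dumpClientSchema(String dbName, String clientSchema, String outFile)
                throws IOException {
            return new ProcessBuilder(
                    "pg_dump", "--schema=" + clientSchema, "--file=" + outFile, dbName)
                .inheritIO()
                .start();
        }
    }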
Any information, even pointers for further study, will be greatly appreciated.
-- Menelaos.