Re: Table partitioning for cloud service? - Mailing list pgsql-general
From | Christopher Browne |
---|---|
Subject | Re: Table partitioning for cloud service? |
Date | |
Msg-id | CAFNqd5XP0rZQoV4s-shdgunjhUur1pxc2HwJFM=4USAguMNpQw@mail.gmail.com |
In response to | Re: Table partitioning for cloud service? (Israel Brewster <israel@brewstersoft.com>) |
Responses |
Re: Table partitioning for cloud service?
Re: Table partitioning for cloud service? |
List | pgsql-general |
On Thu, 21 May 2020 at 11:53, Israel Brewster <israel@brewstersoft.com> wrote:
> On May 21, 2020, at 7:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
>> On 5/21/20 8:29 AM, Israel Brewster wrote:
>>
>>> I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally” get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
>>
>> ^^^^^^^^^^^^ I think that is called a schema:)
>
> Ok. That’s probably an option. Although it looks from a cursory perusal that for that to work, I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permissions to create users.
>
>> Or set up a separate database for each in the cluster.
>
> Same as above - no permissions.
>
> At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.
There's a whole lot of "that depends" to this.
If there is not much data shared across customers, then it's a pretty good answer to create a database for each one. This is especially good if they are only occasionally connected.
If there is a LOT of shared data, then "life gets more complicated."
It's a decently well-documented problem; I just searched for "multitenant database design", which turned up a number of decent (not Postgres-specific) answers.
Some common patterns include:
- A database per tenant (on Postgres, that means that PGDATABASE and/or connection URIs change for each tenant)
- A schema per tenant (on Postgres, that means each time a tenant is added, you need "CREATE SCHEMA" to establish the tenancy and "CREATE TABLE" for each table in that tenancy, and connections use "set search_path=tenantname;" to select data from the right tenant)
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)
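To make the three patterns a bit more concrete, here is a rough Python sketch that only builds the connection URI or SQL text each pattern implies; it does not talk to a database. The `orders` table, the tenant name, the host, and the helper function names are all made up for illustration, and the `%s` placeholder assumes a psycopg-style driver. Note that on Postgres the command that establishes a namespace is `CREATE SCHEMA`.

```python
import re
from typing import List

# Assumption: tenant names are limited to simple lowercase identifiers so they
# can be embedded in SQL text. A real application should rely on its driver's
# identifier-quoting support rather than this hypothetical check.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def check_ident(name: str) -> str:
    """Reject anything that is not a plain identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def database_per_tenant_uri(host: str, tenant: str) -> str:
    """Database per tenant: only the database part of the URI changes."""
    return f"postgresql://{host}/{check_ident(tenant)}"


def schema_per_tenant_setup(tenant: str) -> List[str]:
    """Schema per tenant: run once when a tenant is added."""
    t = check_ident(tenant)
    return [
        f"CREATE SCHEMA {t};",
        # Hypothetical application table, repeated inside every tenant schema.
        f"CREATE TABLE {t}.orders (id bigint PRIMARY KEY, item text NOT NULL);",
    ]


def schema_per_tenant_session(tenant: str) -> str:
    """Schema per tenant: run at connection start to pick the tenant."""
    return f"SET search_path = {check_ident(tenant)};"


def table_based_query() -> str:
    """Table-based tenancy: shared tables, every query filters on tenant_id."""
    return "SELECT id, item FROM orders WHERE tenant_id = %s;"
```

With the first two patterns the tenant is chosen once per connection; with the third, every query has to carry the tenant filter, so a forgotten `WHERE tenant_id = ...` is exactly the "accidental" leak the original question worries about.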
An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and to use a Foreign Data Wrapper so that each tenant's database can access it: <https://www.postgresql.org/docs/12/postgres-fdw.html>
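As a sketch of what the Foreign Data Wrapper setup might look like, the following Python helper builds the statements one would run inside each tenant's database to reach a shared database through postgres_fdw. The server name `shared_srv`, the local schema name `shared`, and the host and credentials in the test values are assumptions for illustration only. Creating the extension and the server generally needs elevated privileges, which a hosted service may or may not grant.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def shared_data_fdw_setup(host: str, dbname: str, user: str, password: str) -> List[str]:
    """Statements to run in a tenant database so it can read the shared one."""
    return [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        # Hypothetical server name pointing at the shared database.
        "CREATE SERVER shared_srv FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host {sql_literal(host)}, dbname {sql_literal(dbname)});",
        # Credentials the tenant database uses when connecting to the shared one.
        "CREATE USER MAPPING FOR CURRENT_USER SERVER shared_srv "
        f"OPTIONS (user {sql_literal(user)}, password {sql_literal(password)});",
        # The local schema must exist before foreign tables are imported into it.
        "CREATE SCHEMA shared;",
        "IMPORT FOREIGN SCHEMA public FROM SERVER shared_srv INTO shared;",
    ]
```

After these run, tables from the shared database's `public` schema would appear in the tenant database as foreign tables under `shared`, for example `shared.some_table`.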
There are lots of tradeoffs involved in each case; each of the above patterns has merits and demerits, particularly as the number of tenants scales and when you discover there are both tiny and large tenants with differing requirements.
You need to look at it from various perspectives:
- How do application schema changes get handled as the application evolves?
- What are the security concerns about data sharing across tenants?
- What issues come up when managing storage across tenants? (Some approaches are easier to cope with than others)
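On the schema-change question, one way to see the difference between the patterns is to count the statements a single change expands into. The sketch below is illustrative only: the `orders` table, the tenant names, and the helper names are hypothetical, and tenant names are assumed to have been validated already.

```python
from typing import List


def migration_per_schema(tenants: List[str], ddl_template: str) -> List[str]:
    """Schema per tenant: one copy of the DDL for every tenant schema."""
    # Assumption: tenant names were validated as plain identifiers elsewhere.
    return [ddl_template.format(schema=t) for t in tenants]


def migration_shared_table(ddl_template: str) -> List[str]:
    """Table-based tenancy: the shared table is altered exactly once."""
    return [ddl_template.format(schema="public")]


template = "ALTER TABLE {schema}.orders ADD COLUMN note text;"
per_schema = migration_per_schema(["acme", "globex"], template)
shared = migration_shared_table(template)
```

With a schema (or a database) per tenant, the migration tooling has to loop over every tenant and cope with a partial failure halfway through; with table-based tenancy there is one table to change, but the change hits all tenants at once.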
If you don't have a fair bit of technical expertise locally, then sophisticated choices will cause you problems that you won't be able to solve.
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"