Thread: Table partitioning for cloud service?

Table partitioning for cloud service?

From
Israel Brewster
Date:
I’m working on my first cloud service, which will be backed by a PostgreSQL database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally” get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data, but would also increase maintenance effort, since if I needed to add a field I would have to add it to every table. On the other hand, keeping everything in the same set of tables would mean having to be VERY careful with my SQL to make sure no customer could access another’s data.

How is this typically done?

---
Israel Brewster
BrewsterSoft Development
Home of EZPunch and Lyrics Presenter



Re: Table partitioning for cloud service?

From
Adrian Klaver
Date:
On 5/21/20 8:29 AM, Israel Brewster wrote:
> I’m working on my first cloud service, which will be backed by a 
> postgresql database. Currently I only have a single customer, but of 
> course I want to design with the possibility of multiple customers in 
> mind. In that vein, I’m wondering what is “typical” in terms of 
> designing the DB structure to make sure that one customer doesn’t 
> “accidentally" get data for another customer? At the moment I am leaning 
> towards giving each customer their own set of tables, with a unique 
> prefix for each. This would provide a “hard” separation for the data, 
^^^^^^^^^^^^ I think that is called a schema:)
Or set up a separate database for each in the cluster.

> but would also increase maintenance efforts, as if I needed to add a 
> field I would have to add it to every table. On the other hand, keeping 
> everything in the same set of tables would mean having to be VERY 
> careful with my SQL to make sure no customer could access another’s data.
> 
> How is this typically done?
> 
> ---
> Israel Brewster
> BrewsterSoft Development
> http://www.brewstersoft.com
> Home of EZPunch and Lyrics Presenter
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Table partitioning for cloud service?

From
Israel Brewster
Date:
On May 21, 2020, at 7:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally" get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)

Ok. That’s probably an option. Although from a cursory perusal it looks like, for that to work, I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permission to create users.

Or set up a separate database for each in the cluster.

Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.

---
Israel Brewster
BrewsterSoft Development
Home of EZPunch and Lyrics Presenter




Re: Table partitioning for cloud service?

From
Adrian Klaver
Date:
On 5/21/20 8:53 AM, Israel Brewster wrote:
>> On May 21, 2020, at 7:36 AM, Adrian Klaver <adrian.klaver@aklaver.com 
>> <mailto:adrian.klaver@aklaver.com>> wrote:
>>
>> On 5/21/20 8:29 AM, Israel Brewster wrote:
>>> I’m working on my first cloud service, which will be backed by a 
>>> postgresql database. Currently I only have a single customer, but of 
>>> course I want to design with the possibility of multiple customers in 
>>> mind. In that vein, I’m wondering what is “typical” in terms of 
>>> designing the DB structure to make sure that one customer doesn’t 
>>> “accidentally" get data for another customer? At the moment I am 
>>> leaning towards giving each customer their own set of tables, with a 
>>> unique prefix for each. This would provide a “hard” separation for 
>>> the data,
>> ^^^^^^^^^^^^ I think that is called a schema:)
> 
> Ok. That’s probably an option. Although it looks from a cursory perusal 
> that for that to work, I would have to have separate DB users with 
> different permissions. Which would be fine, except that I don’t have 
> permissions to create users.
> 
>> Or set up a separate database for each in the cluster.
> 
> Same as above - no permissions.
> 
> At the moment, I am running on Heroku, which gives me a postgresql 
> database, but not full control over the cluster. I may need to move to 
> something more robust, if having completely separate databases is the 
> best option. I was hoping to avoid SysAdmin stuff as much as possible, 
> and focus on the software side, but obviously some sys admin is required.

You can't use this?:

https://devcenter.heroku.com/articles/heroku-postgresql-credentials



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Table partitioning for cloud service?

From
Christopher Browne
Date:
On Thu, 21 May 2020 at 11:53, Israel Brewster <israel@brewstersoft.com> wrote:
On May 21, 2020, at 7:36 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally" get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)

Ok. That’s probably an option. Although it looks from a cursory perusal that for that to work, I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permissions to create users.

Or set up a separate database for each in the cluster.

Same as above - no permissions.

At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.

There's a whole lot of "that depends" to this.

If there is not much data shared across customers, then it's a pretty good answer to create a database for each one.  This is especially good if they are only occasionally connected.

If there is a LOT of shared data, then "life gets more complicated." 

It's a decently well-documented problem out there; I just searched for "multitenant database design", which turned up a number of decent (not-Postgres-specific) answers.


Some common patterns include:
- A database per tenant (on Postgres, that means that PGDATABASE and/or connection URIs change for each tenant)
- A schema per tenant (on Postgres, that means each time a tenant is added, you need "CREATE SCHEMA" to establish the tenancy and "CREATE TABLE" for each table in that tenancy, and connections use "set search_path=tenantname;" to select data from the right tenant)
- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)
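The schema-per-tenant variant can be sketched in a few lines of DDL (the tenant name "acme" and the table here are invented for illustration):

```sql
-- Run once per new tenant: its own schema plus a copy of each table.
CREATE SCHEMA acme;
CREATE TABLE acme.invoices (
    id      bigserial   PRIMARY KEY,
    amount  numeric     NOT NULL,
    created timestamptz NOT NULL DEFAULT now()
);

-- Per connection/session: point unqualified names at the tenant's schema.
SET search_path = acme;
SELECT count(*) FROM invoices;   -- resolves to acme.invoices
```

The application code then stays identical across tenants; only the search_path (or the connection string) changes.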

An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and to use a Foreign Data Wrapper so each tenant's database can access it <https://www.postgresql.org/docs/12/postgres-fdw.html>.
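A minimal sketch of that arrangement with postgres_fdw, run inside each tenant's database (the host, database, credentials, and table names are made up):

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Describe where the shared database lives and how to log in.
CREATE SERVER shared_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shared-db.example.com', dbname 'shared');
CREATE USER MAPPING FOR CURRENT_USER SERVER shared_srv
    OPTIONS (user 'app', password 'secret');

-- A local name for the remote table; queries against it go over the wire.
CREATE FOREIGN TABLE shared_products (
    id   bigint,
    name text
) SERVER shared_srv
  OPTIONS (schema_name 'public', table_name 'products');
```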

There are lots of tradeoffs involved in each case; each of the above patterns has merits and demerits particularly as the number of tenants scales, as well as when you discover there are both tiny and large tenants with differing requirements.

You need to look at it from various perspectives:
- How do application schema changes get handled as the application evolves?
- What are the security concerns about data sharing across tenants?
- What issues come up when managing storage across tenants?  (Some approaches are easier to cope with than others)

If you don't have a fair bit of technical expertise locally, then sophisticated choices will cause you problems that you won't be able to solve.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Table partitioning for cloud service?

From
Israel Brewster
Date:


On May 21, 2020, at 7:57 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 5/21/20 8:53 AM, Israel Brewster wrote:
On May 21, 2020, at 7:36 AM, Adrian Klaver <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:

On 5/21/20 8:29 AM, Israel Brewster wrote:
I’m working on my first cloud service, which will be backed by a postgresql database. Currently I only have a single customer, but of course I want to design with the possibility of multiple customers in mind. In that vein, I’m wondering what is “typical” in terms of designing the DB structure to make sure that one customer doesn’t “accidentally" get data for another customer? At the moment I am leaning towards giving each customer their own set of tables, with a unique prefix for each. This would provide a “hard” separation for the data,
^^^^^^^^^^^^ I think that is called a schema:)
Ok. That’s probably an option. Although it looks from a cursory perusal that for that to work, I would have to have separate DB users with different permissions. Which would be fine, except that I don’t have permissions to create users.
Or set up a separate database for each in the cluster.
Same as above - no permissions.
At the moment, I am running on Heroku, which gives me a postgresql database, but not full control over the cluster. I may need to move to something more robust, if having completely separate databases is the best option. I was hoping to avoid SysAdmin stuff as much as possible, and focus on the software side, but obviously some sys admin is required.

You can't use this?:

https://devcenter.heroku.com/articles/heroku-postgresql-credentials

Wasn’t aware of that. I *did* mention this is my first cloud project. Done plenty of DB/web/application development, but not cloud/multi-customer. Thanks for the pointer.


---
Israel Brewster
BrewsterSoft Development
http://www.brewstersoft.com
Home of EZPunch and Lyrics Presenter


Re: Table partitioning for cloud service?

From
Adam Brusselback
Date:
>  An interesting answer, if there needs to be shared data, is for the shared data to go in its own database, and use a Foreign Data Wrapper to have each tenants' database access it <https://www.postgresql.org/docs/12/postgres-fdw.html>  

For my application I went the schema-per-tenant route, but I have a need for a single login which will work for all tenants you've been given access to. Not all tenants are required to be on the same database host, so I broke that piece out into its own database and used postgres_fdw to make it seem local to each tenant.

So I've had first-hand experience with this for the past ~5 years, and this approach has serious tradeoffs. Queries that need to access the remote table can just fall on their face sometimes. You will also need to deal with practically every connection spawning one or more new connections, which stay open and consume resources, the first time a query is issued that accesses foreign data.

As an optimization I just worked on for my database earlier this week, I decided to logically replicate that table from my main authentication database into each cluster, and I replaced all references to the FDW with the logically replicated table for read-only queries. All write queries still hit the FDW as before.
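That replication step looks roughly like this (the publication, table, and connection details are hypothetical):

```sql
-- On the central authentication database:
CREATE PUBLICATION auth_pub FOR TABLE users;

-- On each tenant cluster: a matching local table plus a subscription
-- that keeps it in sync; read-only queries can then stay local.
CREATE SUBSCRIPTION auth_sub
    CONNECTION 'host=auth-db.example.com dbname=auth user=repl'
    PUBLICATION auth_pub;
```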

This was acceptable for my use case, and drastically improved performance for some queries where I had previously had to use a CTE to force materialization to get acceptable performance due to the nature of going over the FDW for that data.

It's a very cool tool, just be careful about how it can impact performance if you don't measure for your specific use case.

Re: Table partitioning for cloud service?

From
Michael Lewis
Date:
On Thu, May 21, 2020 at 11:41 AM Adam Brusselback <adambrusselback@gmail.com> wrote:
As an optimization I just worked on for my database earlier this week, I decided to logically replicate that table from my main authentication database into a each cluster, and I replaced all references to the FDW for read-only queries to use the logically replicated table. All write queries still hit the FDW as before.

Perhaps you considered this, but if you had not wanted to deal with the administration side of replication, and the centralized data is changed infrequently and from only one application/source, then updating the source and then refreshing a materialized view on each local DB that pulls in the foreign data could be a good option. There is some chance of stale data, since the refresh must be triggered, but for some use cases it may be the simplest setup.
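As a sketch, assuming a foreign table `shared_products` has already been set up through postgres_fdw:

```sql
-- Pull the foreign data into a local snapshot; reads never cross the FDW.
CREATE MATERIALIZED VIEW products_local AS
    SELECT id, name FROM shared_products;

-- A unique index lets REFRESH ... CONCURRENTLY run without
-- blocking readers of the view.
CREATE UNIQUE INDEX ON products_local (id);

-- Trigger this whenever the central copy changes.
REFRESH MATERIALIZED VIEW CONCURRENTLY products_local;
```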

Re: Table partitioning for cloud service?

From
Sándor Daku
Date:
Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne <cbbrowne@gmail.com> wrote:
On Thu, 21 May 2020 at 11:53, Israel Brewster <israel@brewstersoft.com> wrote:

- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)

The database/schema-per-tenant solution can be tedious when you want to modify something in the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation, but with a slight twist. One of the biggest issues of this solution is that if you forget to add the tenant_id to the WHERE clause, you are going to reveal one tenant's data to another.
I came up with the solution that the database user has no privileges for accessing the base tables. Instead, I generate views for each tenant, and they can access their own data in the underlying table through these views. Now if I forget to address the right tenant in my client code (it still happens sometimes) and try to directly access the base tables, I get a strongly worded reminder from the server.
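A minimal sketch of that setup (table, role, and tenant id invented for illustration):

```sql
-- All tenants' rows live in one table; the application role
-- gets no direct access to it.
CREATE TABLE invoices (
    tenant_id integer NOT NULL,
    id        bigserial,
    amount    numeric NOT NULL,
    PRIMARY KEY (tenant_id, id)
);
REVOKE ALL ON invoices FROM app_user;

-- Generated once per tenant; security_barrier stops the planner from
-- evaluating user-supplied functions before the tenant filter.
CREATE VIEW tenant42_invoices WITH (security_barrier) AS
    SELECT id, amount FROM invoices WHERE tenant_id = 42;
GRANT SELECT ON tenant42_invoices TO app_user;
```

Writes through such a view need a bit more care (a trigger or something similar to supply tenant_id), which is part of the per-tenant maintenance cost.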


Regards,
Sándor    


Re: Table partitioning for cloud service?

From
Israel Brewster
Date:
On May 21, 2020, at 12:12 PM, Sándor Daku <daku.sandor@gmail.com> wrote:
Hi,

On Thu, 21 May 2020 at 18:14, Christopher Browne <cbbrowne@gmail.com> wrote:
On Thu, 21 May 2020 at 11:53, Israel Brewster <israel@brewstersoft.com> wrote:

- Table-based tenancy (e.g. - each table has a "tenant_id" and queries need to specify the tenant)

The database/schema-per-tenant solution can be tedious when you want to modify something in the structure and you have numerous tenants.
Therefore I used the "tables with tenant_id" version in a similar situation, but with a slight twist. One of the biggest issues of this solution is that if you forget to add the tenant_id to the WHERE clause, you are going to reveal one tenant's data to another.
I came up with the solution that the database user has no privileges for accessing the base tables. Instead, I generate views for each tenant, and they can access their own data in the underlying table through these views. Now if I forget to address the right tenant in my client code (it still happens sometimes) and try to directly access the base tables, I get a strongly worded reminder from the server.

Nice solution! I think I may go to something like that once I upgrade to a cloud solution that lets me add multiple users to the DB (the free tier of Heroku does not). In the meantime, while I just have the single customer, I can fake it easily enough.

Are there any shortcuts for referencing the proper views, or do you just append/prepend something to every table reference in your SQL? One nice thing about the database/schema approach is that I can just specify the search_path (or database) in the connection command, and then all the table references remain the same for all tenants. It also helps avoid the situation you mentioned where you forget to address the right tenant, since you only have to do it in one place. Of course, as you said, it can be tedious when you want to modify the structure. I’ll have to think about that a bit more.

---
Israel Brewster
BrewsterSoft Development
Home of EZPunch and Lyrics Presenter



Re: Table partitioning for cloud service?

From
Ravi Krishna
Date:
>
> The database/schema per tenant solution can be tedious when you want to modify something on the structure and you
> have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation but with a slight twist. One of the
> biggest issues of this solution is that if you forget to add the tenant_id to the where clause you are going to reveal
> one tenant's data to another.
> I came up with the solution that the database user has no privileges for accessing the base tables. Instead of that
> I generate views for each tenant and they can access their own data in the underlying table through these views. Now if
> I forget to address the right tenant in my client code (it still happens sometimes) and try to directly access the base
> tables I get a strongly worded reminder from the server.

1. If you have 50 tables and, say, 100 tenants, we are talking about 5,000 views. I am not sure
    whether that is any more elegant than having 100 schemas.
2. In your approach you can't do a phased DDL upgrade. It is typical to do rolling upgrades
    in multi-tenant databases, starting with the least risky tenant.




Re: Table partitioning for cloud service?

From
Adrian Klaver
Date:
On 5/21/20 1:23 PM, Israel Brewster wrote:
> On May 21, 2020, at 12:12 PM, Sándor Daku <daku.sandor@gmail.com 
> <mailto:daku.sandor@gmail.com>> wrote:
>> Hi,
>>
>> On Thu, 21 May 2020 at 18:14, Christopher Browne <cbbrowne@gmail.com 
>> <mailto:cbbrowne@gmail.com>> wrote:
>>
>>     On Thu, 21 May 2020 at 11:53, Israel Brewster
>>     <israel@brewstersoft.com <mailto:israel@brewstersoft.com>> wrote:
>>
>>>
>>     - Table-based tenancy (e.g. - each table has a "tenant_id" and
>>     queries need to specify the tenant)
>>
>>
>> The database/schema per tenant solution can be tedious when you want 
>> to modify something on the structure and you have numerous tenants.
>> Therefore I used the "tables with tenant_id" version in a similar 
>> situation but with a slight twist. One of the biggest issue of this 
>> solution is that if you forget to add the tenant_id to the where 
>> clause you are going to reveal one tenant's data to another.
>> I came up with the solution that the database user have no privileges 
>> for accessing the base tables. Instead of that I generate views for 
>> each tenant and they can access their own data in the underlying table 
>> through these views. Now if forget to address the right tenant in my 
>> client code(it still happens sometimes) and try to directly access the 
>> base tables I get a strongly worded reminder from the server.
> 
> Nice solution! I think I may go to something like that once I upgrade to 
> a cloud solution that lets me add multiple users to the DB (the free 
> tier of Heroku does not). In the meantime, while I just have the single 
> customer, I can fake it easily enough.
> 
> Is there any shortcuts for referencing the proper views, or do you just 
> append/prepend something to every table reference in your SQL? One nice 
> thing about the database/schema approach is that I can just specify the 
> search_path (or database) in the connection command, and then all the 
> table references remain the same for all tenants. Also helps avoid the 
> situation you mentioned where you forget to address the right tenant, 
> since you only have to do it in one place. Of course, as you said, it 
> can be tedious when you want to modify the structure. I’ll have to think 
> about that a bit more.

If you want to take the tedium out of it, take a look at Sqitch:

https://sqitch.org/

Then all you have to do is create the change once and deploy to the 
targets.



-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Table partitioning for cloud service?

From
Edson Richter
Date:


------ Original message ------
From: Ravi Krishna
Date: Thu, May 21, 2020, 18:50
To:
Cc: PostgreSQL Mailing Lists
Subject: Re: Table partitioning for cloud service?

> 
> The database/schema per tenant solution can be tedious when you want to modify something on the structure and you have numerous tenants.
> Therefore I used the "tables with tenant_id" version in a similar situation but with a slight twist. One of the biggest issue of this solution is that if you forget to add the tenant_id to the where clause you are going to reveal one tenant's data to another.
> I came up with the solution that the database user have no privileges for accessing the base tables. Instead of that I generate views for each tenant and they can access their own data in the underlying table through these views. Now if forget to address the right tenant in my client code(it still happens sometimes) and try to directly access the base tables I get a strongly worded reminder from the server.

1. If you have 50 tables and, say, 100 tenants, we are talking about 5,000 views. I am not sure whether that is any more elegant than having 100 schemas.
2. In your approach you can't do a phased DDL upgrade. It is typical to do rolling upgrades in multi-tenant databases, starting with the least risky tenant.


Be careful to plan your backup/recovery strategy. How do you plan to recover one customer from backup without interfering with the others?
What will be your disaster recovery strategy? Have you considered replication?
Large databases take longer to back up.
Keeping schemas in sync can be solved easily.
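For the schema-per-tenant layout, per-customer backup and restore can be sketched with pg_dump's schema filter (the database and schema names here are hypothetical):

```shell
# Dump only one tenant's schema, in custom format.
pg_dump -n acme -Fc -f acme.dump mydb

# Restore that tenant alone, leaving the other schemas untouched.
pg_restore -d mydb --clean --if-exists acme.dump
```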

Regards,

Edson