Thread: Proposal for disk quota feature

Proposal for disk quota feature

From
Hubert Zhang
Date:
Hi all,
We want to introduce disk quota feature into Postgres.

Why disk quota
In a multi-tenant environment, there is a requirement to limit the disk quota that database/schema/table can be written or a user can consume for different organizations.
Meanwhile, other databases such as Oracle, Teradata, DB2 have already supported disk quota feature.

Heikki has already implemented disk quota feature in Postgres as an extension pg_quota. We plan to enhance disk quota feature based on Heikki's implementation.

Scope

The scope of disk quota feature is to limit the disk usage of database/schema/table objects and users.

Here table means heap table, ao table, index table, partition table and associated table( toast table, visible table, large object etc.). Schema disk quota is the disk quota of all the tables in this schema. Database disk quota is the disk quota of all the tables in this database.

User's quota is the size of all the tables whose owner are this user.Out of Scope: Note that spill files, xlogs, clogs and logs are not considered for database object level disk quota at this stage.



Design
We propose disk quota with the following components:

1. Quota Setting Store is where the disk quota setting to be stored and accessed. DBA or object owner uses SQL queries to configure the disk quota for each database objects.


2. Quota Change Detector is the monitor of size change of database objects in Postgres. It will write change information to shared memory to notify Quota Size Checker. The implementation of Change Detector could be hooks in smgr_extend/smgr_unlink/smgr_truncate when modifying the size of a heap table. The hooks will write to shared memory in a batched way to reduce the impact on OLTP performance.

3. Quota Size Checker is implemented as a worker process. It maintains the current disk usage for each database objects and users, and compare them with settings in Quota Setting Store. If it detects the disk usage hit the quota redzone(either upon or below), it will notify Quota Enforcement Operator.


4. Quota Enforcement Operator has two roles: one is to check the disk quota permission before queries are executed(QueryBeforeRun Check), the other is to cancel the running queries when it reaches the disk quota limit dynamically(QueryRunning Check). Quota Enforcement Operator uses shared memory to store the enforcement information to guarantee a quick check.



To implement the right proof of concept, we want to receive feedback from the community from the following aspects:
Q1. User Interface: when setting a disk quota, 
    Option 1 is to use insert into quota.config(user1, 10G)
    Option 2 is to use UDF select set_quota("role","user1",10G)
    Option 3 is to use native SQL syntax create disk quota on ROLE user1 10G, or create disk quota on SCHEMA s1 25G;
Q2. Quota Setting Store using user table or catalog?
    Option 1 is to create a schema called quota for each database and write quota settings into quota.config table, only DBA could modify it. This corresponds to Q1.option1
    Option 2 is to store quota settings into the catalog. For Schema and Table write them to database level catalog. For database or user, write them to either database level or global catalog.  

I personally prefer Q1's option3 and Q2's option2, since it makes disk quota more like a native feature. We could support the quota worker process implementation as an extension for now, but in the long term, disk quota is like a fundamental feature of a database and should be a native feature just like other databases. So store quota conf into catalog and supply native syntax is better.

Open Problem
We prepare to implement Quota Size Checker as a worker process. Worker process needs to connect to a database to build the disk usage map and quota map(e.g. a user/shcema's disk usage on a given database) But one worker process can only bind to one database(InitPostgres(dbname)) at the initialization stage. It results in we need a separate worker process for each database. The solution to this problem is not straightforward, here are some ideas:
1 To make worker process could retrieve and cache information from all the databases.  As Tom Lane pointed out that it needs to flush all the database specific thing, like relcache syscache etc.
2 Keep one worker process for each database. But using a parent/global quota worker process to manage the lifecycle of database level worker processes. It could handle the newly created database(avoid restart database) and save resource when a database is not used. But this needs to change worker process to be hierarchical. Postmaster becomes the grandfather of database level worker processes in this case.

Any better ideas on it?



--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Chapman Flack
Date:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

-Chap


Re: Proposal for disk quota feature

From
Pavel Stehule
Date:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap


Re: Proposal for disk quota feature

From
Hubert Zhang
Date:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Pavel Stehule
Date:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang

Attachment

Re: Proposal for disk quota feature

From
Hubert Zhang
Date:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Pavel Stehule
Date:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Hubert Zhang
Date:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);

3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Pavel Stehule
Date:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Hubert Zhang
Date:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Jeremy Finzel
Date:


On Fri, Sep 21, 2018 at 9:21 AM Hubert Zhang <hzhang@pivotal.io> wrote:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

FWIW, we are super interested in this feature at a role level especially in reporting systems where users can for example accidentally start a create table statement overnight and sure enough after their Cartesian product generated 2TB of data they run the system out of disk.

That problem is really hard to manage currently and this feature would really solve that.

Thanks,
Jeremy

Re: Proposal for disk quota feature

From
Pavel Stehule
Date:


pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information. But it looks like redundant to current GUC configuration and limits. Can be messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel


Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Hubert Zhang
Date:
But it looks like redundant to current GUC configuration and limits
what do you mean by current GUC configuration? Is that the general block number limit in your patch? If yes, the difference between GUC and pg_diskquota catalog is that pg_diskquota will store different quota limit for the different role, schema or table instead of a single GUC value.

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information. But it looks like redundant to current GUC configuration and limits. Can be messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel


Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Pavel Stehule
Date:


so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
But it looks like redundant to current GUC configuration and limits
what do you mean by current GUC configuration? Is that the general block number limit in your patch? If yes, the difference between GUC and pg_diskquota catalog is that pg_diskquota will store different quota limit for the different role, schema or table instead of a single GUC value.

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and necessary, but I don't see these like new type of objects, it is much more some property of current objects. Because we have one syntax for this purpose I prefer it. Because is not good to have two syntaxes for similar purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and I don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel



 

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information. But it looks like redundant to current GUC configuration and limits. Can be messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel


Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Hubert Zhang
Date:
The quotas or object limits, resource limits are pretty useful and necessary, but I don't see these like new type of objects, it is much more some property of current objects. Because we have one syntax for this purpose I prefer it. Because is not good to have two syntaxes for similar purpose.
SCHEMA and TABLE are OK for me, But as I mentioned before, ROLE is a special case when using ALTER SET at this moment.
TABLE and SCHEMA are both database level, e.g. pg_class and pg_namespace both residents in one database. But ROLE is cluster-level. They don't belong to a database. ALTER ROLE XXX SET disk_quota = xxx means to set the quota for the user on all the databases in the first glance. But in our first stage design, ROLE's quota is bind to a specific database. E.g. Role Jack could have 10GB quota on database A and 2GB quota on database B. 

SQL syntax is not hard to modify,  I don't think this should block the main design of disk quota feature. Is there any comment on the design and architecture? If no, we'll firstly submit our patch and involve more discussion?

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
But it looks like redundant to current GUC configuration and limits
what do you mean by current GUC configuration? Is that the general block number limit in your patch? If yes, the difference between GUC and pg_diskquota catalog is that pg_diskquota will store different quota limit for the different role, schema or table instead of a single GUC value.

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and necessary, but I don't see these like new type of objects, it is much more some property of current objects. Because we have one syntax for this purpose I prefer it. Because is not good to have two syntaxes for similar purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and I don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel



 

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information. But it looks like redundant to current GUC configuration and limits. Can be messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel


Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang

Re: Proposal for disk quota feature

From
Haozhou Wang
Date:
Hi All,

We prepare a WIP patch which is implemented based on our redesigned proposal. This patch is for demonstration our ideas and currently can support create/delete disk quota for table/schema/user. 

Here is the usage,
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);
CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);
CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);

3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);
INSERT INTO a2 SELECT generate_series(1,3000000);
SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);
ERROR:  schema's disk space quota exceeded

DROP TABLE a2;
SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);
INSERT 0 1000
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This patch works well on small size of databases (tens of databases) as each database in GUC diskquota_databases will have a background worker process for monitoring purpose.

Thanks very much for Heikki's suggestion and comments. 
All comments are very welcome!

On Mon, Sep 24, 2018 at 9:56 PM Hubert Zhang <hzhang@pivotal.io> wrote:
The quotas or object limits, resource limits are pretty useful and necessary, but I don't see these like new type of objects, it is much more some property of current objects. Because we have one syntax for this purpose I prefer it. Because is not good to have two syntaxes for similar purpose.
SCHEMA and TABLE are OK for me, But as I mentioned before, ROLE is a special case when using ALTER SET at this moment.
TABLE and SCHEMA are both database level, e.g. pg_class and pg_namespace both residents in one database. But ROLE is cluster-level. They don't belong to a database. ALTER ROLE XXX SET disk_quota = xxx means to set the quota for the user on all the databases in the first glance. But in our first stage design, ROLE's quota is bind to a specific database. E.g. Role Jack could have 10GB quota on database A and 2GB quota on database B. 

SQL syntax is not hard to modify,  I don't think this should block the main design of disk quota feature. Is there any comment on the design and architecture? If no, we'll firstly submit our patch and involve more discussion?

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
But it looks like redundant to current GUC configuration and limits
what do you mean by current GUC configuration? Is that the general block number limit in your patch? If yes, the difference between GUC and pg_diskquota catalog is that pg_diskquota will store different quota limit for the different role, schema or table instead of a single GUC value.

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and necessary, but I don't see these like new type of objects, it is much more some property of current objects. Because we have one syntax for this purpose I prefer it. Because is not good to have two syntaxes for similar purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and I don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel



 

On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:
just fast reaction - why QUOTA object?
Isn't ALTER SET enough?
Some like
ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..
New DDL commans looks like too hard hammer .

It's an option. Prefer to consider quota setting store together:
CREATE DISK QUOTA way is more nature to store quota setting in a separate pg_diskquota catalog
While ALTER SET way is more close to store quota setting in pg_class, pg_role, pg_namespace. etc in an integrated way.
(Note that here I mean nature/close is not must, ALTER SET could also store in pg_diskquota and vice versa.)

I have not a problem with new special table for storing this information. But it looks like redundant to current GUC configuration and limits. Can be messy do some work with ALTER ROLE, and some work via CREATE QUOTE.

Regards

Pavel


Here are some differences I can think of:
1 pg_role is a global catalog, not per database level. It's harder to tracker the user's disk usage in the whole clusters(considering 1000+ databases).  So the semantic of  CREATE DISK QUOTA ON USER is limited: it only tracks the user's disk usage inside the current database.
2 using separate pg_diskquota could add more field except for quota limit without adding too many fields in pg_class, e.g. red zone to give the user a warning or the current disk usage of the db objects.

On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang@pivotal.io> napsal:

Hi all,

We redesign disk quota feature based on the comments from Pavel Stehule and Chapman Flack. Here are the new design.

Overview

Basically,  disk quota feature is used to support multi-tenancy environment, different level of database objects could be set a quota limit to avoid over use of disk space. A common case could be as follows: DBA could enable disk quota on a specified database list. DBA could set disk quota limit for tables/schemas/roles in these databases. Separate disk quota worker process will monitor the disk usage for these objects and detect the objects which exceed their quota limit. Queries loading data into these “out of disk quota” tables/schemas/roles will be cancelled.

We are currently working at init implementation stage. We would like to propose our idea firstly and get feedbacks from community to do quick iteration.

SQL Syntax (How to use disk quota)

1 Specify the databases with disk quota enabled in GUC “diskquota_databases” in postgresql.conf and restart the database.

2 DBA could set disk quota limit for table/schema/role.

CREATE DISK QUOTA tablea1 ON TABLE a1 with (quota = ‘1MB’);

CREATE DISK QUOTA roleu1 ON USER u1 with (quota = ‘1GB’);

CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota = ‘3MB’);


just fast reaction - why QUOTA object?

Isn't ALTER SET enough?

Some like

ALTER TABLE a1 SET quote = 1MB;
ALTER USER ...
ALTER SCHEMA ..

New DDL commans looks like too hard hammer .



3 Simulate a schema out of quota limit case: suppose table a1 and table a2 are both under schema s1.

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT INTO a2 SELECT generate_series(1,3000000);

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

ERROR:  schema's disk space quota exceeded

DROP TABLE a2;

SELECT pg_sleep(5)

INSERT INTO a1 SELECT generate_series(1,1000);

INSERT 0 1000

Architecture

Disk quota has the following components.

1. Quota Setting Store is where the disk quota setting to be stored and accessed. We plan to use catalog table pg_diskquota to store these information. pg_diskquota is like:

CATALOG(pg_diskquota,6122,DiskQuotaRelationId)

{

NameData quotaname; /* diskquota name */       

int16 quotatype; /* diskquota type name */

Oid quotatargetoid; /* diskquota target db object oid*/

int32 quotalimit; /* diskquota size limit in MB*/

int32 quotaredzone; /* diskquota redzone in MB*/

} FormData_pg_diskquota;

2. Quota Change Detector is the monitor of size change of database objects. We plan to use stat collector to detect the ‘active’ table list at initial stage. But stat collector has some limitation on finding the active table which is in a running transaction. Details see TODO section.

3. Quota Size Checker is where to calculate the size and compare with quota limit for database objects. According to Pavel’s comment, autovacuum launcher and worker process could be a good reference to disk quota. So we plan to use a disk quota launcher daemon process and several disk quota worker process to finish this work. Launcher process is responsible for starting worker process based on a user defined database list from GUC. Worker process will connect to its target database and monitor the disk usage for objects in this database. In init stage of worker process, it will call calculate_total_relation_size() to calculate the size for each user table. After init stage, worker process will refresh the disk model every N seconds. Refreshing will only recalculate the size of tables in ‘active’ table list, which is generated by Quata Change Detector to minimize the cost.

4. Quota Enforcement Operator is where to check for the quota limitation at postgres backend side. We will firstly implement it in ExecCheckRTPerms() as pre-running enforcement. It will check the disk quota of tables being inserted or updated, and report error if table’s or table’s schema’s or table’s owner’s quota limit is exceeded. As a native feature, we plan to add more checkpoint to do running query enforcement. For example, if a disk quota lefts 10MB quota, a query could insert 1GB data. This query could be allowed in pre-running enforcement check, but will be cancelled in running query enforcement check. Therefore, it can improve the accurate of disk quota usage. To achieve this, we plan to add a checkpoint in lower API such as smgr_extened. Hence, the Quota Enforcement Operator will check the disk quota usage when smgr_extened is called. If the quota is over limited, current query will be cancelled.

Highlight

1. Native feature.

Support native Create/Drop Disk Quota SQL statement.

New catalog table pg_diskquota to store disk quota setting.

2. Auto DML/DDL detection.

Table create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,  Schema create/drop and Role create/drop will be detected by disk quota automatically.

3. Low cost disk quota checker.

Worker process of disk quota need to refresh the disk usage model every N seconds. Since recalculate the file size using stat() system call is expensive for a large number of files, we use an ‘active’ table list to reduce the real work at each iteration. A basic experiment on our init stage implementation on database with 20K tables shows that the refresh cost is 1% cpu usage and will be finished within 50ms.


Todo/Limitation

Before we propose our patch, we plan to enhance it with the following ideas:

1. Setting database list with disk quota enabled dynamically without restart database. Since we have the disk quota launcher process, it could detect the new ‘diskquota_databases’ list and start/stop the corresponding disk quota worker process.

2. Enforcement when query is running. Considering the case when there is 10MB quota left, but next query will insert 10GB data. Current enforcement design will allow this query to be executed. This is limited by the ‘active’ table detection is generated by stat collector. Postgres backend will only send table stat information to collector only when the transaction ends. We need a new way to detect the ‘active’ table even when this table is being modified inside a running transaction.

3. Monitor unlimited number of databases. Current we set the max number of disk quota worker process to be 10 to reduce the affection normal workload. But how about  if we want to monitor the disk quota of more than 10 databases? Our solution is to let disk quota launcher to manage a queue of database need to be monitored. And disk quota worker process consuming the queue and refresh the disk usage/quota for this database. After some periods, worker will return the database to the queue, and fetch the top database from queue to process. The period determine the delay of detecting disk quota change. To implement this feature, we need to support a subprocess of postmaster to rebind to another database instead of the database binded in InitPostgres().

4. Support active table detection on vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze are not tracked by stat collector.

Thanks to Heikki, Pavel Stehule,Chapman Flack for the former comments on disk quota feature. Any comments on how to improve disk quota feature are appreciated.


On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Pavel.
Your patch did enforcement on storage level(md.c or we could also use smgr_extend). It's straight forward.
But I prefer to implement disk_quota as a feature with following objectives:
1 set/alter disk quota setting on different database objects, e.g. user, database, schema etc. not only a general GUC, but we could set separate quota limit for a specific objects. 
2 enforcement operator should work at two positions: before query is running and when query is running. The latter one's implementation maybe similar to your patch.

The patch was just example. The resource quotes should be more complex - per partition, table, schema, database, user - so GUC are possible, but not very user friendly.

Our case is specific, but not too much. The servers are used for multidimensional analyses - and some tables can grow too fast (COPY, INSERT SELECT). We need to solve limits immediately. The implementation is simple, so I did it. Same implementation on database level, or schema level needs some more locks, so it will not be too effective. The resource management can be complex very complex, and I expect so it will be hard work.

Regards

Pavel


On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang@pivotal.io>:
Thanks Chapman.
@Pavel,  could you please explain more about your second suggestion "implement some quotas on storage level?"

See attached patch - it is very simple - and good enough for our purposes.

Regards

Pavel

 
We will not keep the long-lived processes attach to all databases(just like you mentioned servers with thousands of databases)  
And you are right, we could share ideas with autovacuum process, fork worker processes in need. 
"autovacuum checks for tables that have had a large number of inserted, updated or deleted tuples. These checks use the statistics collection facility"
diskquota process is similar to autovacuum at caring about insert, but the difference is that it also care about vucuum full, truncate and drop. While update and delete may not be interested since no file change happens. So a separate diskquota process is preferred.

So if we implemented disk quota as a full native feature, and in the first initial version I prefer to implement the following features:
1 Fork diskquota launcher process under Postmaster serverloop, which is long-lived.
2 Diskquota launcher process is responsible for creating diskquota worker process for every database.
3 DIskquota setting is stored in a separate catalog table for each database.
4 Initialization stage, Diskquota launcher process creates diskquota worker process for all the databases(traverse like autovacuum). Worker process calculates disk usage of db objects and their diskquota setting. If any db object exceeds its quota limit, put them into the blacklist in the shared memory, which will later be used by enforcement operator. Worker process exits when works are done.
5 Running stage, Diskquota launcher process creates diskquota worker process for the database with a large number of insert, copy, truncate, drop etc. or create disk quota statement. Worker process updates the file size for db objects containing the result relation, and compare with the diskquota setting. Again, if exceeds quota limit, put them into blacklist, remove from blacklist vice versa. Worker process exits when works are done and a GUC could control the frequency of worker process restart to a specific database. As you know, this GUC also controls the delay when we do enforcement.
6 Enforcement. When postgres backend executes queries, check the blacklist in shared memory to determine whether the query is allowed(before execute) or need rollback(is executing)?

If we implemented disk quota as an extension, we could just use background worker to start diskquota launcher process and use RegisterDynamicBackgroundWorker() to fork child diskquota worker processes by the launcher process as suggested by @Chapman. Diskquota setting could be stored in user table in a separate schema for each database(Schema and table created by create extension statement) just like what Heikki has done in pg_quota project. But in this case, we need to create extension for each database before diskquota worker process can be set up for that database.

Any comments on the above design and which is preferred, native feature or extension as the POC? 


-- Hubert



On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2018-08-30 16:22 GMT+02:00 Chapman Flack <chap@anastigmatix.net>:
On 08/30/2018 09:57 AM, Hubert Zhang wrote:

> 2 Keep one worker process for each database. But using a parent/global
> quota worker process to manage the lifecycle of database level worker
> processes. It could handle the newly created database(avoid restart
> database) and save resource when a database is not used. But this needs to
> change worker process to be hierarchical. Postmaster becomes the grandfather
>  of database level worker processes in this case.

I am using background workers this way in 9.5 at $work.

In my case, one worker lives forever, wakes up on a set period, and
starts a short-lived worker for every database, waiting for each
one before starting the next.

It was straightforward to implement. Looking back over the code,
I see the global worker assigns its own PID to worker.bgw_notify_pid
of each of its children, and also obtains a handle for each child
from RegisterDynamicBackgroundWorker().

I imagine the global quota worker would prefer to start workers
for every database and then just wait for notifications from any
of them, but that seems equally straightforward at first glance.

There are servers with thousands databases. Worker per database is not good idea.

It should to share ideas, code with autovacuum process.

Not sure, how to effective implementation based on bg workers can be. On servers with large set of databases, large set of tables it can identify too big table too late.

Isn't better to implement some quotas on storage level?

Regards

Pavel



-Chap





--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang




--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang


--
Regards,
Haozhou
Attachment