Re: Control your disk usage in PG: Introduction to Disk Quota Extension - Mailing list pgsql-hackers

From Hubert Zhang
Subject Re: Control your disk usage in PG: Introduction to Disk Quota Extension
Date
Msg-id CAB0yre=PbBY-obEFqmkLpSvL=VRx_D3DPsnBn7kAKOah8Z-aVg@mail.gmail.com
In response to Re: Control your disk usage in PG: Introduction to Disk Quota Extension  (Hubert Zhang <hzhang@pivotal.io>)
Responses Re: Control your disk usage in PG: Introduction to Disk Quota Extension  (Hubert Zhang <hzhang@pivotal.io>)
Re: Control your disk usage in PG: Introduction to Disk Quota Extension  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi Michael, Robert
Regarding your question about the hook position, I want to explain more of the background on why we want to introduce these hooks.
We wrote a diskquota extension for PostgreSQL (inspired by Heikki's pg_quota). The diskquota extension controls disk usage in PostgreSQL in a fine-grained way, which means:
1. You can set a disk quota limit at the schema level or the role level.
2. A background worker gathers the current disk usage of each schema/role in real time.
3. A background worker generates the blacklist of schemas/roles whose quota limit is exceeded.
4. New transactions that try to insert data into a schema/role on the blacklist will be cancelled.
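
The bookkeeping behind steps 2 to 4 can be sketched roughly as follows. This is only an illustration and not the extension's actual code: the QuotaEntry struct, its fields, and both function names are hypothetical, and everything lives in ordinary process memory rather than in anything shared between the worker and the backends.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accounting entry for one schema or role. */
typedef struct QuotaEntry
{
    uint32_t owner_id;      /* illustrative schema or role identifier */
    int64_t  limit_bytes;   /* configured quota; negative means "no limit" */
    int64_t  used_bytes;    /* usage gathered by the worker (step 2) */
    bool     blacklisted;   /* set when usage exceeds the limit (step 3) */
} QuotaEntry;

/* Step 3: recompute the blacklist flags; returns how many entries are listed. */
size_t
refresh_blacklist(QuotaEntry *entries, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
    {
        QuotaEntry *e = &entries[i];

        e->blacklisted = (e->limit_bytes >= 0 && e->used_bytes > e->limit_bytes);
        if (e->blacklisted)
            count++;
    }
    return count;
}

/* Step 4: decide whether an insert into owner_id may proceed. */
bool
insert_allowed(const QuotaEntry *entries, size_t n, uint32_t owner_id)
{
    for (size_t i = 0; i < n; i++)
    {
        if (entries[i].owner_id == owner_id)
            return !entries[i].blacklisted;
    }
    return true;                /* no quota configured for this owner */
}
```

In this sketch an owner with no entry is treated as unlimited, which is an assumption made for the example.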

In step 2, gathering the current disk usage of a schema requires summing the disk size of every table in that schema, which is a time-consuming operation. We want to use hooks in SMGR to detect the Active Tables (tables whose files have changed), and recalculate the disk size of only those Active Tables.
For example, the smgrextend hook indicates that a new block has been allocated, so the table needs to be treated as an Active Table.
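
To make the Active Table idea concrete, here is a minimal single-process sketch of the hook pattern. All names in it (extend_hook, record_active_table, storage_extend, drain_active_tables) are made up for illustration and are not the hooks from the patch; storage_extend merely stands in for the extend path of the storage layer. A real extension would also have to keep the set in shared memory, since backends and the worker are separate processes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ACTIVE 1024

/* Hypothetical hook type: receives the id of the relation being extended. */
typedef void (*extend_hook_type) (uint32_t rel_id);

/* NULL by default; an extension would install its callback here. */
extend_hook_type extend_hook = NULL;

static uint32_t active_tables[MAX_ACTIVE];
static size_t   n_active = 0;

/* Hook callback: remember the relation as active, ignoring duplicates. */
void
record_active_table(uint32_t rel_id)
{
    for (size_t i = 0; i < n_active; i++)
    {
        if (active_tables[i] == rel_id)
            return;
    }
    /* Sketch only: entries are dropped when the array is full. */
    if (n_active < MAX_ACTIVE)
        active_tables[n_active++] = rel_id;
}

/* Stand-in for the storage layer's extend path. */
void
storage_extend(uint32_t rel_id)
{
    /* ... the new block would be allocated here ... */
    if (extend_hook != NULL)
        extend_hook(rel_id);
}

/* Worker side: copy up to cap pending ids into out and remove them. */
size_t
drain_active_tables(uint32_t *out, size_t cap)
{
    size_t n = (n_active < cap) ? n_active : cap;

    memcpy(out, active_tables, n * sizeof(uint32_t));
    memmove(active_tables, active_tables + n, (n_active - n) * sizeof(uint32_t));
    n_active -= n;
    return n;
}
```

The worker would then recompute sizes only for the ids returned by drain_active_tables, instead of rescanning every table in every schema.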

Can you recommend better hook positions for the above use case?
Thanks in advance.

Hubert





On Tue, Jan 22, 2019 at 12:08 PM Hubert Zhang <hzhang@pivotal.io> wrote:
> For this particular purpose, I don't immediately see why you need a
> hook in both places.  If ReadBuffer is called with P_NEW, aren't we
> guaranteed to end up in smgrextend()?
Yes, that's a bit awkward.
 
Hi Michael, we revisited the ReadBuffer hook and removed it in the latest patch.
The ReadBuffer hook was originally used for enforcement (e.g. when the diskquota limit is exceeded) while a query is loading data.
We plan to move the enforcement for running queries into a separate diskquota worker process.
The worker process will detect the backends to be cancelled and send SIGINT to them.
So there is no need for the ReadBuffer hook anymore.
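
A rough sketch of what such a worker loop could look like, under stated assumptions: BackendInfo, cancel_over_quota, and dry_run_cancel are hypothetical names, and the way the worker learns which schema/role a backend is writing to is left out entirely. The sender is passed in as a function pointer so the selection logic can be tried without signalling anything; a real sender could wrap the POSIX call kill(pid, SIGINT).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one backend as seen by the worker. */
typedef struct BackendInfo
{
    int      pid;           /* process id of the backend */
    uint32_t target_id;     /* schema/role it is writing to (illustrative) */
    bool     is_loading;    /* true while the backend is loading data */
} BackendInfo;

/* Sender callback; returns 0 on success, like kill(). */
typedef int (*send_cancel_fn) (int pid);

/* Dry-run sender: records the pid instead of signalling it. */
int last_cancelled_pid = -1;

int
dry_run_cancel(int pid)
{
    last_cancelled_pid = pid;
    return 0;
}

/* Cancel every loading backend whose target is blacklisted. */
size_t
cancel_over_quota(const BackendInfo *backends, size_t n_backends,
                  const uint32_t *blacklist, size_t n_blacklist,
                  send_cancel_fn send_cancel)
{
    size_t cancelled = 0;

    for (size_t i = 0; i < n_backends; i++)
    {
        if (!backends[i].is_loading)
            continue;
        for (size_t j = 0; j < n_blacklist; j++)
        {
            if (backends[i].target_id == blacklist[j])
            {
                if (send_cancel(backends[i].pid) == 0)
                    cancelled++;
                break;
            }
        }
    }
    return cancelled;
}
```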

Our patch currently contains only smgr-related hooks, which catch file changes and build the Active Table list for the diskquota extension.

Thanks Hubert.


On Mon, Jan 7, 2019 at 6:56 PM Haozhou Wang <hawang@pivotal.io> wrote:
Thanks very much for your comments.

To the best of my knowledge, smgr is a layer that abstracts storage operations. Therefore, it is a good place to control storage operations, or to collect information about them, without touching the physical storage layer.
Moreover, smgr corresponds to actual disk I/O operations (not considering the OS cache) for postgres, so we do not need to worry about buffer management in postgres.
This keeps the purpose of the hook pure: a hook for actual disk I/O.

Regards,
Haozhou

On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
> +1 for adding some hooks to support this kind of thing, but I think
> the names you've chosen are not very good.  The hook name should
> describe the place from which it is called, not the purpose for which
> one imagines that it will be used, because somebody else might imagine
> another use.  Both BufferExtendCheckPerms_hook_type and
> SmgrStat_hook_type are imagining that they know what the hook does -
> CheckPerms in the first case and Stat in the second case.

I personally don't mind making Postgres more pluggable, but I don't
think that we actually need the extra ones proposed here at the layer
of smgr, as smgr is already a layer designed to call an underlying set
of APIs able to extend, unlink, etc. depending on the storage type.

> For this particular purpose, I don't immediately see why you need a
> hook in both places.  If ReadBuffer is called with P_NEW, aren't we
> guaranteed to end up in smgrextend()?

Yes, that's a bit awkward.
--
Michael


--
Thanks

Hubert Zhang


