Thread: Extensible storage manager API - smgr hooks

Extensible storage manager API - smgr hooks

From
Anastasia Lubennikova
Date:
Hi, hackers!

Many recently discussed features can make use of an extensible storage manager API: namely, storage-level compression and encryption [1], [2], [3], the disk quota feature [4], SLRU storage changes [5], and any other feature that may want to substitute its own implementation for the PostgreSQL storage layer (e.g. lazy_restore [6]).

Attached is a proposal to change the smgr API to make it extensible. The idea is to add a hook that lets plugins get control in smgr and define custom storage managers. The patch replaces the smgrsw[] array and the smgr_sw selector with an smgr() function that loads the f_smgr implementation.

As before, it has only one implementation, smgr_md, which is wrapped in smgr_standard().

To create a custom implementation, a developer needs to implement the smgr API functions:
    static const struct f_smgr smgr_custom =
    {
        .smgr_init = custominit,
        ...
    };

create a hook function:
    const f_smgr *
    smgr_custom_hook(BackendId backend, RelFileNode rnode)
    {
        /* Here we can also add some logic and choose which smgr to use based on rnode and backend */
        return &smgr_custom;
    }

and finally set the hook:
    smgr_hook = smgr_custom_hook;
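
For readers who want to see the dispatch in one place, here is a minimal standalone sketch of the pattern described above. It is not the attached patch: the types are reduced stand-ins for the server's definitions, the signature of smgr() is assumed to match the hook's, and the names smgr_custom_hook and CUSTOM_TABLESPACE_OID are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for PostgreSQL types; the real ones live in the server headers. */
typedef int BackendId;
typedef unsigned int Oid;

typedef struct RelFileNode
{
    Oid spcNode;    /* tablespace */
    Oid dbNode;     /* database */
    Oid relNode;    /* relation */
} RelFileNode;

/* Reduced to a single callback for brevity. */
typedef struct f_smgr
{
    void (*smgr_init) (void);
} f_smgr;

typedef const f_smgr *(*smgr_hook_type) (BackendId backend, RelFileNode rnode);

/* NULL means that no plugin has installed a hook. */
static smgr_hook_type smgr_hook = NULL;

static void mdinit(void) {}
static const f_smgr smgr_md = { .smgr_init = mdinit };

/* Default selector: always the built-in md implementation. */
const f_smgr *
smgr_standard(BackendId backend, RelFileNode rnode)
{
    (void) backend;
    (void) rnode;
    return &smgr_md;
}

/* Entry point: defer to the hook if one is set, otherwise use the default. */
const f_smgr *
smgr(BackendId backend, RelFileNode rnode)
{
    if (smgr_hook != NULL)
        return smgr_hook(backend, rnode);
    return smgr_standard(backend, rnode);
}

/* Hypothetical plugin: custom storage for one tablespace, md for the rest. */
#define CUSTOM_TABLESPACE_OID 99999

static void custominit(void) {}
static const f_smgr smgr_custom = { .smgr_init = custominit };

const f_smgr *
smgr_custom_hook(BackendId backend, RelFileNode rnode)
{
    if (rnode.spcNode == CUSTOM_TABLESPACE_OID)
        return &smgr_custom;
    return smgr_standard(backend, rnode);
}
```

With smgr_hook left NULL, every relation resolves to smgr_md. After `smgr_hook = smgr_custom_hook;` only relations in the hypothetical tablespace resolve to smgr_custom, and everything else still falls through to smgr_standard().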



--
Best regards,
Lubennikova Anastasia

Re: Extensible storage manager API - smgr hooks

From
Yura Sokolov
Date:
Anastasia Lubennikova wrote on 2021-06-30 00:49:
> Hi, hackers!
> 
> Many recently discussed features can make use of an extensible storage
> manager API. Namely, storage level compression and encryption [1],
> [2], [3], disk quota feature [4], SLRU storage changes [5], and any
> other features that may want to substitute PostgreSQL storage layer
> with their implementation (e.g. lazy_restore [6]).
> 
> Attached is a proposal to change smgr API to make it extensible.  The
> idea is to add a hook for plugins to get control in smgr and define
> custom storage managers. The patch replaces smgrsw[] array and smgr_sw
> selector with smgr() function that loads f_smgr implementation.
> 
> As before it has only one implementation - smgr_md, which is wrapped
> into smgr_standard().
> 
> To create a custom implementation, a developer needs to implement the
> smgr API functions:
>     static const struct f_smgr smgr_custom =
>     {
>         .smgr_init = custominit,
>         ...
>     };
> 
> create a hook function:
> 
>     const f_smgr *
>     smgr_custom_hook(BackendId backend, RelFileNode rnode)
>     {
>         /* Here we can also add some logic and choose which smgr to
>          * use based on rnode and backend */
>         return &smgr_custom;
>     }
> 
> and finally set the hook:
>     smgr_hook = smgr_custom_hook;
> 
> [1]
> https://www.postgresql.org/message-id/flat/11996861554042351@iva4-dd95b404a60b.qloud-c.yandex.net
> [2]
> https://www.postgresql.org/message-id/flat/272dd2d9.e52a.17235f2c050.Coremail.chjischj%40163.com
> [3] https://postgrespro.com/docs/enterprise/9.6/cfs
> [4]
> https://www.postgresql.org/message-id/flat/CAB0yre%3DRP_ho6Bq4cV23ELKxRcfhV2Yqrb1zHp0RfUPEWCnBRw%40mail.gmail.com
> [5]
> https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
> [6]
> https://wiki.postgresql.org/wiki/PGCon_2021_Fun_With_WAL#Lazy_Restore
> 
> --
> 
> Best regards,
> Lubennikova Anastasia

Good day, Anastasia.

I also think smgr should be extended with implementations other than md.
But how will the concrete implementation be chosen for a particular
relation? I believe it should be an (immutable!) property of the
tablespace, and should be passed to smgropen. The patch in its current
state doesn't show a clear way to distinguish different implementations
per relation.

I don't think the patch should be that invasive. smgrsw could be a
pointer to an array instead of the static array it is now, and then
reln->smgr_which would keep the same meaning. Yes, it would then need a
way to select a specific implementation, but something like a
`char smgr_name[NAMEDATALEN]` field with a linear search in the
(I believe) small smgrsw array should be enough.

Maybe I'm missing something?
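
To make the suggestion above concrete, here is a rough standalone sketch of a name-keyed smgrsw with a linear search. It only illustrates the idea, assuming a small fixed-capacity backing array. NAMEDATALEN is redefined locally, and MAX_SMGR, NSmgr, smgr_register and smgr_lookup are hypothetical names that exist neither in the patch nor in core.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMEDATALEN 64  /* PostgreSQL's default value, redefined for this sketch */
#define MAX_SMGR 8      /* hypothetical cap on registered storage managers */

/* Reduced to a name and a single callback for brevity. */
typedef struct f_smgr
{
    char smgr_name[NAMEDATALEN];
    void (*smgr_init) (void);
} f_smgr;

static f_smgr smgr_table[MAX_SMGR];
static f_smgr *smgrsw = smgr_table; /* a pointer rather than a fixed static array */
static int NSmgr = 0;

/* Linear search by name; returns the index usable as smgr_which, or -1. */
int
smgr_lookup(const char *name)
{
    for (int i = 0; i < NSmgr; i++)
    {
        if (strcmp(smgrsw[i].smgr_name, name) == 0)
            return i;
    }
    return -1;
}

/* Append an implementation; returns its index, or -1 if it cannot be added. */
int
smgr_register(const char *name, void (*init) (void))
{
    assert(name != NULL);

    if (NSmgr >= MAX_SMGR || strlen(name) >= NAMEDATALEN)
        return -1;
    if (smgr_lookup(name) >= 0)
        return -1;              /* the name is already taken */

    strcpy(smgrsw[NSmgr].smgr_name, name); /* length was checked above */
    smgrsw[NSmgr].smgr_init = init;
    return NSmgr++;
}
```

Registering "md" first gives it index 0, so an existing reln->smgr_which of 0 keeps its meaning, and any later implementation is found by name with smgr_lookup().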

regards,
Sokolov Yura.

Re: Extensible storage manager API - smgr hooks

From
Andres Freund
Date:
Hi,

On 2021-06-30 05:36:11 +0300, Yura Sokolov wrote:
> Good day, Anastasia.
> 
> I also think smgr should be extended with implementations other than md.
> But how will the concrete implementation be chosen for a particular
> relation? I believe it should be an (immutable!) property of the
> tablespace, and should be passed to smgropen. The patch in its current
> state doesn't show a clear way to distinguish different implementations
> per relation.
> 
> I don't think the patch should be that invasive. smgrsw could be a
> pointer to an array instead of the static array it is now, and then
> reln->smgr_which would keep the same meaning. Yes, it would then need a
> way to select a specific implementation, but something like a
> `char smgr_name[NAMEDATALEN]` field with a linear search in the
> (I believe) small smgrsw array should be enough.
> 
> Maybe I'm missing something?

There has been no activity on this thread for > 6 months. Therefore I'm
marking it as returned with feedback. Anastasia, if you want to work on this,
please do, but there's obviously no way it can be merged into 15...

Greetings,

Andres



Re: Extensible storage manager API - smgr hooks

From
Kirill Reshke
Date:
Hello Yura and Anastasia.

I have tried to implement the per-relation SMGR approach and ran into a serious problem with redo.

To implement the per-relation SMGR feature, I tried to do something similar to the custom table AM approach: we define our custom SMGR in an extension (which defines the smgr handler) and then use this SMGR in the relation definition, like this:

```
postgres=# create extension proxy_smgr ;
CREATE EXTENSION
postgres=# select * from pg_smgr ;
  oid  |  smgrname  |    smgrhandler
-------+------------+--------------------
  4646 | md         | smgr_md_handler
 16386 | proxy_smgr | proxy_smgr_handler
(2 rows)

postgres=# create table tt(i int) storage manager proxy_smgr_handler;
ERROR:  storage manager "proxy_smgr_handler" does not exist
postgres=# create table tt(i int) storage manager proxy_smgr;
INFO:  proxy open 1663 5 16391
INFO:  proxy create 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
INFO:  proxy close, 16391
CREATE TABLE
postgres=# select * from tt;
INFO:  proxy open 1663 5 16391
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
 i
---
(0 rows)

postgres=# insert into tt values(1);
INFO:  proxy exists 16391
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
INFO:  proxcy extend 16391
INSERT 0 1
postgres=# select * from tt;
INFO:  proxy nblocks 16391
INFO:  proxy nblocks 16391
 i
---
 1
(1 row)
```

The extension's SQL file looks like this:

```
CREATE FUNCTION proxy_smgr_handler(internal)
RETURNS table_smgr_handler
AS 'MODULE_PATHNAME'
LANGUAGE C;

-- Storage manager
CREATE STORAGE MANAGER proxy_smgr HANDLER proxy_smgr_handler;
```

To do this I have defined a catalog relation, pg_smgr, where I store the smgr handlers, and I use this relation when we need to open other (non-catalog) relations in the smgropen function. The patch almost passes the regression tests (8 of 214 tests failed), but it fails on the first checkpoint or in crash recovery. Also, I have changed the WAL format, adding the SMGR oid to each WAL record that has a RelFileNode structure. Why do we need WAL changes? Well, I tried to solve the following issue.

As I mentioned, there is a problem with redo: we cannot do a syscache search to get a relation's SMGR in order to apply WAL, because the syscache is not initialized during redo (crash recovery). As I understand it, the syscache is not initialized because the system catalogs are not consistent until crash recovery is done.


So, that's it. I decided to write to this thread to get feedback and understand how best to solve the problem with redo.

What do you think?

On Thu, Jun 16, 2022 at 1:38 PM Andres Freund <andres@anarazel.de> wrote:
There has been no activity on this thread for > 6 months. Therefore I'm
marking it as returned with feedback. Anastasia, if you want to work on this,
please do, but there's obviously no way it can be merged into 15...


Re: Extensible storage manager API - smgr hooks

From
Andrey Borodin
Date:

> On 16 Jun 2022, at 13:41, Kirill Reshke <reshke@double.cloud> wrote:
>
> Hello Yura and Anastasia.

FWIW this technology is now a part of Greenplum [0]. We are building a GP extension that automatically offloads cold
data to S3 - a very simplified version of Neon for analytical workloads.
When a segment of a table is not used for a long period of time, the extension will sync its files with backup storage
in the cloud.
When the user touches the data, the extension's smgr will bring the table segments back from the backup or from the
latest synced version.

Our #1 goal is to provide a tool useful for the community. We can easily provide the same extension for Postgres if
this technology (extensible smgr) is in core. Does such an extension seem useful for Postgres? Or does this data access
pattern seem unusual for Postgres? By pattern I mean vast amounts of cold data only ever appended and never touched.


Best regards, Andrey Borodin.

[0] https://github.com/greenplum-db/gpdb/pull/13601