Re: Extensible storage manager API - SMGR hook Redux - Mailing list pgsql-hackers

From Xun Gong
Subject Re: Extensible storage manager API - SMGR hook Redux
Date
Msg-id CAPP=Hha_wV1MV9yR70QZ5pk5dtNP+bOyBiFxPmrMKqnQeKMAwQ@mail.gmail.com
Whole thread Raw
In response to Re: Extensible storage manager API - SMGR hook Redux  (Nitin Jadhav <nitinjadhavpostgres@gmail.com>)
List pgsql-hackers

Thank you for your detailed review and insights. I share your view that a registration-based approach for custom storage managers (smgr) is more versatile. This method allows for the implementation of custom table access methods, facilitating the use of various storage services (such as file services or object storage), different file organization formats (files in one directory  or many sub directories), and flexible deletion logic (direct deletion or mark-and-sweep).

While I acknowledge that the registration-based approach requires more modifications, I believe the benefits in terms of extensibility and functionality are significant. I have seen similar explorations in the implementation of AO tables in Greenplum, where a dedicated smgr was created due to its distinct file organization compared to heap tables.

I look forward to further discussions on this topic


Nitin Jadhav <nitinjadhavpostgres@gmail.com> 于2024年12月10日周二 11:25写道:
Hi,

I reviewed the discussion and took a look at the patch sets. It seems
like many things are combined here. Based on the subject, I initially
thought it aimed to provide the infrastructure to easily extend
storage managers. This would allow anyone to create their own storage
managers using this infrastructure. While it addresses this, it also
includes additional features like fsync_checker, which I believe
should be a separate feature. Even though it might use the same
infrastructure, it appears to be a different functionality. I think we
should focus solely on providing the infrastructure here.

We need to decide on our approach—whether to use a hook-based method
or a registration-based method—and I believe this requires further
discussion.

The hook-based approach is simple and works well for anyone writing
their own storage manager. However, it has its limitations as we can
either use the default storage manager or a custom-built one for all
the work load, but we cannot choose between multiple storage managers.
On the other hand, the registration-based approach allows choosing
between multiple storage managers based on the workload, though it
requires a lot of changes.

Are we planning to support other storage managers in PostgreSQL in the
near future? If not, it is better to go with the hook-based approach.
Otherwise, the registration-based approach is preferable as it offers
more flexibility to users and enhances PostgreSQL’s functionality.

Could you please share your thoughts on this? Also, let me know if
this topic has already been discussed and if any conclusions were
reached.

Best Regards,
Nitin Jadhav
Azure Database for PostgreSQL
Microsoft

On Sat, Jan 13, 2024 at 1:27 AM Tristan Partin <tristan@neon.tech> wrote:
>
> Thought I would show off what is possible with this patchset.
>
> Heikki, a couple of months ago in our internal Slack, said:
>
> > [I would like] a debugging tool that checks that we're not missing any
> > fsyncs. I bumped into a few missing fsync bugs with unlogged tables
> > lately and a tool like that would've been very helpful.
>
> My task was to create such a tool, and off I went. I started with the
> storage manager extension patch that Matthias sent to the list last
> year[0].
>
> Andres, in another thread[1], said:
>
> > I've been thinking that we need a validation layer for fsyncs, it's too hard
> > to get right without testing, and crash testing is not likel enough to catch
> > problems quickly / resource intensive.
> >
> > My thought was that we could keep a shared hash table of all files created /
> > dirtied at the fd layer, with the filename as key and the value the current
> > LSN. We'd delete files from it when they're fsynced. At checkpoints we go
> > through the list and see if there's any files from before the redo that aren't
> > yet fsynced.  All of this in assert builds only, of course.
>
> I took this idea and ran with it. I call it the fsync_checker™️. It is an
> extension that prints relations that haven't been fsynced prior to
> a CHECKPOINT. Note that this idea doesn't work in practice because
> relations might not be fsynced, but they might be WAL-logged, like in
> the case of createdb. See log_smgrcreate(). I can't think of an easy way
> to solve this problem looking at the codebase as it stands.
>
> Here is a description of the patches:
>
> 0001:
>
> This is essentially just the patch that Matthias posted earlier, but
> rebased and adjusted a little bit so storage managers can "inherit" from
> other storage managers.
>
> 0002:
>
> This is an extension of 0001, which allows for extensions to set
> a global storage manager. This is pretty hacky, and if it was going to
> be pulled into mainline, it would need some better protection. For
> instance, only one extension should be able to set the global storage
> manager. We wouldn't want extensions stepping over each other, etc.
>
> 0003:
>
> Adds a hook for extensions to inspect a checkpoint before it actually
> occurs. The purpose for the fsync_checker is so that I can iterate over
> all the relations the extension tracks to find files that haven't been
> synced prior to the completion of the checkpoint.
>
> 0004:
>
> This is the actual fsync_checker extension itself. It must be preloaded.
>
> Hopefully this is a good illustration of how the initial patch could be
> used, even though it isn't perfect.
>
> [0]: https://www.postgresql.org/message-id/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com
> [1]: https://www.postgresql.org/message-id/20220127182838.ba3434dp2pe5vcia%40alap3.anarazel.de
>
> --
> Tristan Partin
> Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: testing framework for MVCC & vacuum (freeze) & heap_page_prune etc.
Next
From: jian he
Date:
Subject: Re: NOT ENFORCED constraint feature