Re: Table AM and DROP TABLE [ Was: Table AM and DDLs] - Mailing list pgsql-hackers

From Mats Kindahl
Subject Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]
Date
Msg-id CA+144275w8hxENZeaSFtG8qQE5YJY1saVyj70mXQWd-EGKiPjw@mail.gmail.com
Whole thread Raw
In response to Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]  (Andres Freund <andres@anarazel.de>)
Responses Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]
List pgsql-hackers
Hello all,

I think the discussion went a little sideways, so let me recap what I'm suggesting:
  1. I mentioned that there is a missing callback when the filenode is unlinked and this is particularly evident when dropping a table.
  2. It was correctly pointed out to me that an implementor need to ensure that dropping a table is transactional.
  3. I argued that the callback is still correct and outlined how this can be handled by a table access method using xact callbacks, if necessary.
I see huge potential in the table access method and would like to do my part in helping it in succeeding. I noted that the API is biased in how the existing heap implementation works and is also very focused on implementations of "local" storage engines. For more novel architectures (for example, various sorts of distributed architectures) and to be easier to work with, I think that the API can be improved in a few places. This is a first step in the direction of making the API both easier to use as well as enabling more novel use-cases.

Writing implementations with ease is more about having the right callbacks in the right places and providing the right information, but in some cases it is just not possible to implement efficient functionality with the current interface. I think it can be useful to separate these two kinds of enhancements when discussing the API, but I think both are important for the table access methods to be practically usable and to leverage the full power of this concept.

On Tue, Aug 2, 2022 at 1:44 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2021-04-05 21:57:12 +0200, Mats Kindahl wrote:
>    2. In the storage layer, the function RelationDropStorage is called,
>    which will record the table to be dropped in the pendingDeletes
>
> When committing (or aborting) the transaction, there are two calls that are
> interesting, in this order:
>
>    1. CallXactCallbacks which calls registered callbacks
>    2. smgrDoPendingDeletes, which calls the storage layer directly to
>    perform the actual deletion, if necessary.
>
> Now, suppose that we want to replace the storage layer with a different
> one. It is straightforward to replace it by implementing the Table AM
> methods above, but we are missing a callback on dropping the table. If we
> have that, we can record the table-to-be-dropped in a similar manner to how
> the heap AM does it and register a transaction callback using
> RegisterXactCallback.

  don't think implementing dropping relation data at-commit/rollback using
xact callbacks can be correct. The dropping needs to be integrated with the
commit / abort records, so it is redone during crash recovery - that's not
possible with xact callbacks.
 
Yes, but this patch is about making the extension aware that a file node is being unlinked.


To me it still seems fundamentally the wrong direction to implement a "drop
relation callback" tableam callback.

This is not really a "drop table" callback, it is just the most obvious case where this is missing. So, just to recap the situation as it looks right now.

Here is (transactional) truncate table:
  1. Allocate a new file node in the same tablespace as the table
  2. Add the file node to the list of pending node to delete
  3. Overwrite the existing file node in the relation with the new one
  4. Call table_relation_set_new_filenode to tell extension that there is a new filenode for the relation
Here is drop table:
  1. Add the existing file node to the list of pending deletes
  2. Remove the table from the catalogs
For an extension writer, the disappearance of the old file node is "invisible" since there is no callback about this, but it is very clear when a new file node is allocated. In addition to being inconsistent, it adds an extra burden on the extension writer. To notice that a file node has been unlinked you can register a transaction handler and investigate the pending list at commit or abort time. Even though possible, there are two problems with this: 1) the table access method is notified "late" in the transaction that the file node is going away, and 2) it is unnecessarily complicated to register a transaction handler only for inspecting this.

Telling the access method that the filenode is unlinked by adding a callback is by far the best solution since it does not affect existing extensions and will give the table access methods opportunities to act on it immediately.

I have attached an updated patch that changes the names of the callbacks since there was a name change. I had also missed the case of unlinking a file node when tables were truncated, so I have added a callback for this as well.

Best wishes,
Mats Kindahl


Greetings,

Andres Freund
Attachment

pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: [PoC] configurable out of disk space elog level
Next
From: Ted Yu
Date:
Subject: Re: closing file in adjust_data_dir