Re: Table AM and DROP TABLE [ Was: Table AM and DDLs] - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]
Date
Msg-id 20221116180211.g7o3cthoucynfa6a@awork3.anarazel.de
Whole thread Raw
In response to Re: Table AM and DROP TABLE [ Was: Table AM and DDLs]  (Mats Kindahl <mats@timescale.com>)
List pgsql-hackers
Hi,

On 2022-11-16 14:49:59 +0100, Mats Kindahl wrote:
> I think the discussion went a little sideways, so let me recap what I'm
> suggesting:
> 
>    1. I mentioned that there is a missing callback when the filenode is
>    unlinked and this is particularly evident when dropping a table.
>    2. It was correctly pointed out to me that an implementor need to ensure
>    that dropping a table is transactional.
>    3. I argued that the callback is still correct and outlined how this can
>    be handled by a table access method using xact callbacks, if necessary.

I still think 3) isn't a solution to 2). The main issue is that xact callbacks
don't address that storage has to be removed during redo as well. For that
dropping storage has to be integrated with commit / abort records.

I don't think custom a custom WAL rmgr addresses this either - it has to be
integrated with the commit record, and the record has to be replayed as a whole.


> I see huge potential in the table access method and would like to do my
> part in helping it in succeeding. I noted that the API is biased in how the
> existing heap implementation works and is also very focused on
> implementations of "local" storage engines. For more novel architectures
> (for example, various sorts of distributed architectures) and to be easier
> to work with, I think that the API can be improved in a few places. This is
> a first step in the direction of making the API both easier to use as well
> as enabling more novel use-cases.

I agree with that - there's lots more work to be done and the evolution from
not having the abstraction clearly shows. Some of the deficiencies are easy to
fix, but others are there because there's no quick solution to them.


> > To me it still seems fundamentally the wrong direction to implement a "drop
> > relation callback" tableam callback.
> >
> 
> This is not really a "drop table" callback, it is just the most obvious
> case where this is missing. So, just to recap the situation as it looks
> right now.
> 
> Here is (transactional) truncate table:
> 
>    1. Allocate a new file node in the same tablespace as the table
>    2. Add the file node to the list of pending node to delete
>    3. Overwrite the existing file node in the relation with the new one
>    4. Call table_relation_set_new_filenode to tell extension that there is
>    a new filenode for the relation
> 
> Here is drop table:
> 
>    1. Add the existing file node to the list of pending deletes
>    2. Remove the table from the catalogs
> 
> For an extension writer, the disappearance of the old file node is
> "invisible" since there is no callback about this, but it is very clear
> when a new file node is allocated. In addition to being inconsistent, it
> adds an extra burden on the extension writer. To notice that a file node
> has been unlinked you can register a transaction handler and investigate
> the pending list at commit or abort time. Even though possible, there are
> two problems with this: 1) the table access method is notified "late" in
> the transaction that the file node is going away, and 2) it is
> unnecessarily complicated to register a transaction handler only for
> inspecting this.

Using a transaction callback solely also doesn't address the redo issue...


I think to make this viable for filenodes that don't look like md.c's, you'd
have to add a way to make commit/abort records extensible by custom
rmgrs. Then the replay of a commit/abort could call the custom rmgr for the
"sub-record" during replay.


> Telling the access method that the filenode is unlinked by adding a
> callback is by far the best solution since it does not affect existing
> extensions and will give the table access methods opportunities to act on
> it immediately.

I'm loathe to add a callback that I don't think can be used correctly without
further changes.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Add sub-transaction overflow status in pg_stat_activity
Next
From: Andres Freund
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)