Re: Modest proposal to extend TableAM API for controlling cluster commands - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Modest proposal to extend TableAM API for controlling cluster commands
Date
Msg-id CAKFQuwY_xuyeRDXJ9pOMGTDurQ=c6BY7tJFw9Tu5weLETgkyBg@mail.gmail.com
Whole thread Raw
In response to Re: Modest proposal to extend TableAM API for controlling cluster commands  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: Modest proposal to extend TableAM API for controlling cluster commands
List pgsql-hackers
On Wed, Jun 15, 2022 at 11:23 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

> On Jun 15, 2022, at 8:50 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de> wrote:
> > If a simple callback like
> > relation_supports_cluster(Relation rel) is too simplistic
>
> Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Ok.

> Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the case where the Table AM says: "Sorry, I cannot do that"?

I submit that's the only sane thing to do if the table AM already guarantees that the table will always be fully compacted.  There is no justification for forcing the table contents to be copied without benefit.

I accept that this is a valid outcome that should be accommodated for.

> If yes, then naming the table explicitly should elicit an error.  Having the table chosen implicitly should provoke a warning.  For ALTER TABLE CLUSTER there should be an error: which makes the implicit CLUSTER command a non-factor.

I'm basically fine with how you would design the TAM, but I'm going to argue again that the core project should not dictate these decisions.  The TAM's relation_supports_compaction() function can return true/false, or raise an error.  If raising an error is the right action, the TAM can do that. 
 
If the core code makes that decision, the TAM can't override, and that paints TAM authors into a corner.

The core code has to decide what the template pattern code looks like, including what things it will provide and what it requires extensions to provide.  To a large extent, providing a consistent end-user experience is the template's, and thus core code's, job.
 
How about a TAM that implements a write-once, read-many logic.  You get one multi-insert, and forever after you can't modify it (other than to drop the table, or perhaps to truncate it).

So now the AM wants to ignore ALTER TABLE, INSERT, and DELETE commands.

  That's a completely made-up-on-the-spot example, but it's not entirely without merit.

In what sense does this made-up TAM conflict with mvcc and wal?  It doesn't have all the features of heap, but that's not the same thing as violating mvcc or breaking wal.


I am nowhere near informed enough to speak to the implementation details here, and my imagination is probably lacking too, but I'll accept that the current system does indeed make assumptions in the template design that are now being seen as incorrect in light of new algorithms.

But you are basically proposing a reworking of the existing system into one that makes pretty much any SQL Command something that a TAM can treat as being an optional request by the user; whereas today the system presumes that the implementations will respond to these commands.  And to make this change without any core code having such a need. Or even a working extension that can be incorporated during development.  And, as per the above, all of this requires coming to some kind of agreement on the desired user experience (I don't immediately accept the "let the AM decide" option).

Anyway, that was mostly my attempt at Devil's Advocate.
I was going to originally post that the template simply inspect whether the new physical relation file, after the copy was requested, had a non-zero size, and if so finish performing the swap the way we do today, otherwise basically abort (or otherwise perform the minimal amount of catalog changes) so the existing relation file continues to be pointed at.  Something to consider with a smaller API footprint than a gatekeeper hook.

I think that all boils down to - it seems preferable to simply continue assuming all these commands are accepted, but figure out whether a "no-op" is a valid outcome and, if so, ensure there is a way to identify that no-op meaningfully.  While hopefully designing the surrounding code so that unnecessary work is not performed in front of a no-op.  This seems preferable to spreading hooks throughout the code that basically ask "do you handle this SQL command?".  The specifics of the existing code may dictate otherwise.

David J.

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Modest proposal to extend TableAM API for controlling cluster commands
Next
From: John Naylor
Date:
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum