Re: Modest proposal to extend TableAM API for controlling cluster commands - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Modest proposal to extend TableAM API for controlling cluster commands
Date
Msg-id 20220616031850.phyyk2qbazhwcusy@alap3.anarazel.de
Whole thread Raw
In response to Re: Modest proposal to extend TableAM API for controlling cluster commands  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: Modest proposal to extend TableAM API for controlling cluster commands
Re: Modest proposal to extend TableAM API for controlling cluster commands
List pgsql-hackers
Hi,

On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:
> > On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
> >> But it's up to the TAM what it wants to do with that boolean, if in fact it
> >> does anything at all based on that.  A TAM could decide to do the exact
> >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
> >> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
> >> behavior here>.
> > 
> > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
> > that's possible with all extension APIs.
> 
> I don't think it's "stupid stuff".  A major motivation, perhaps the only
> useful motivation, for implementing a TAM is to get a non-trivial
> performance boost (relative to heap) for some target workload, almost
> certainly at the expense of worse performance for another workload.  To
> optimize any particular workload sufficiently to make it worth the bother,
> you've pretty much got to do something meaningfully different than what heap
> does.

Sure. I just don't see what that has to do with doing something widely
differing in VACUUM FULL. Which AM can't support that? I guess there's some
where implementing the full visibility semantics isn't feasible, but that's
afaics OK.


> >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
> >> synonyms.  Or am I missing something?
> > 
> > The callback gets passed use_sort. So just implement it use_sort = false and
> > error out if use_sort = true?
> 
> I'm not going to say that your idea is unreasonable for a TAM that you might
> choose to implement, but I don't see why that should be required of all TAMs
> anybody might ever implement.

> The callback that gets use_sort is called from copy_table_data().  By that time, it's too late to avoid the
> 
>     /*
>      * Open the relations we need.
>      */
>     NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
>     OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);
> 
> code that has already happened in cluster.c's copy_table_data() function,
> and unless I raise an error, after returning from my TAM's callback, the
> cluster code will replace the old table with the new one.  I'm left no
> choices but to copy my data over, loose my data, or abort the command.  None
> of those are OK options for me.

I think you need to do a bit more explaining of what you're actually trying to
achieve here. You're just saying "I don't want to", which doesn't really help
me to understand the set of useful options.


> I'm open to different solutions.  If a simple callback like
> relation_supports_cluster(Relation rel) is too simplistic, and more
> parameters with more context information is wanted, then fine, let's do
> that.

FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
relation_copy_for_cluster optional.

I still think it's a terrible idea to silently ignore tables in CLUSTER or
VACUUM FULL.


> But I can't really complete my work with the interface as it stands
> now.

Since you've not described that work to a meaningful degree...

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: warn if GUC set to an invalid shared library
Next
From: Michael Paquier
Date:
Subject: Re: [Proposal] pg_rewind integration into core