Re: Modest proposal to extend TableAM API for controlling cluster commands - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Modest proposal to extend TableAM API for controlling cluster commands
Date
Msg-id FEBF35DC-8EC1-4D3A-9232-BA337FBCF423@enterprisedb.com
Whole thread Raw
In response to Re: Modest proposal to extend TableAM API for controlling cluster commands  (Andres Freund <andres@anarazel.de>)
Responses Re: Modest proposal to extend TableAM API for controlling cluster commands
List pgsql-hackers

> On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
>
>> It's effectively a synonym which determines whether the "bool use_sort"
>> parameter of the table AM's relation_copy_for_cluster will be set.  Heap-AM
>> plays along and sorts or not based on that.
>
> Hardly a synonym if it behaves differently?

I don't think this point is really worth arguing.  We don't have to call it a synonym, and the rest of the discussion
remainsthe same. 

>> But it's up to the TAM what it wants to do with that boolean, if in fact it
>> does anything at all based on that.  A TAM could decide to do the exact
>> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
>> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
>> behavior here>.
>
> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
> that's possible with all extension APIs.

I don't think it's "stupid stuff".  A major motivation, perhaps the only useful motivation, for implementing a TAM is
toget a non-trivial performance boost (relative to heap) for some target workload, almost certainly at the expense of
worseperformance for another workload.  To optimize any particular workload sufficiently to make it worth the bother,
you'vepretty much got to do something meaningfully different than what heap does. 


>> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
>> synonyms.  Or am I missing something?
>
> The callback gets passed use_sort. So just implement it use_sort = false and
> error out if use_sort = true?

I'm not going to say that your idea is unreasonable for a TAM that you might choose to implement, but I don't see why
thatshould be required of all TAMs anybody might ever implement. 

The callback that gets use_sort is called from copy_table_data().  By that time, it's too late to avoid the

    /*
     * Open the relations we need.
     */
    NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
    OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);

code that has already happened in cluster.c's copy_table_data() function, and unless I raise an error, after returning
frommy TAM's callback, the cluster code will replace the old table with the new one.  I'm left no choices but to copy
mydata over, loose my data, or abort the command.  None of those are OK options for me. 

I'm open to different solutions.  If a simple callback like relation_supports_cluster(Relation rel) is too simplistic,
andmore parameters with more context information is wanted, then fine, let's do that.  But I can't really complete my
workwith the interface as it stands now. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Using PQexecQuery in pipeline mode produces unexpected Close messages
Next
From: Michael Paquier
Date:
Subject: Re: warn if GUC set to an invalid shared library