Thread: Modest proposal to extend TableAM API for controlling cluster commands

Modest proposal to extend TableAM API for controlling cluster commands

From
Mark Dilger
Date:
Hackers,

While developing various Table Access Methods, I have wanted a callback for determining if CLUSTER (and VACUUM FULL)
shouldbe run against a table backed by a given TAM.  The current API contains a callback for doing the guts of the
cluster,but by that time, it's a bit too late to cleanly back out.  For single relation cluster commands, raising an
errorfrom that callback is probably not too bad.  For multi-relation cluster commands, that aborts the clustering of
otheryet to be processed relations, which doesn't seem acceptable.  I tried fixing this with a ProcessUtility_hook, but
thatfires before the multi-relation cluster command has compiled the list of relations to cluster, meaning the
ProcessUtility_hookdoesn't have access to the necessary information.  (It can be hacked to compile the list of
relationsitself, but that duplicates both code and effort, with the usual risks that the code will get out of sync.) 

For my personal development, I have declared a new hook, bool (*relation_supports_cluster) (Relation rel).  It gets
calledfrom commands/cluster.c in both the single-relation and multi-relation code paths, with warning or debug log
messagesoutput for relations that decline to be clustered, respectively. 

Before posting a patch, do people think this sounds useful?  Would you like the hook function signature to differ in
someway?  Is a simple "yes this relation may be clustered" vs. "no this relation may not be clustered" interface overly
simplistic?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Modest proposal to extend TableAM API for controlling cluster commands

From
Andres Freund
Date:
Hi,

On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:
> While developing various Table Access Methods, I have wanted a callback for
> determining if CLUSTER (and VACUUM FULL) should be run against a table
> backed by a given TAM.  The current API contains a callback for doing the
> guts of the cluster, but by that time, it's a bit too late to cleanly back
> out.  For single relation cluster commands, raising an error from that
> callback is probably not too bad.  For multi-relation cluster commands, that
> aborts the clustering of other yet to be processed relations, which doesn't
> seem acceptable.

Why not? What else do you want to do in that case? Silently ignoring
non-clusterable tables doesn't seem right either. What's the use-case for
swallowing the error?


> I tried fixing this with a ProcessUtility_hook, but that
> fires before the multi-relation cluster command has compiled the list of
> relations to cluster, meaning the ProcessUtility_hook doesn't have access to
> the necessary information.  (It can be hacked to compile the list of
> relations itself, but that duplicates both code and effort, with the usual
> risks that the code will get out of sync.)
> 
> For my personal development, I have declared a new hook, bool
> (*relation_supports_cluster) (Relation rel).  It gets called from
> commands/cluster.c in both the single-relation and multi-relation code
> paths, with warning or debug log messages output for relations that decline
> to be clustered, respectively.

Do you actually need to dynamically decide whether CLUSTER is supported?
Otherwise we could just make the existing cluster callback optional and error
out if a table is clustered that doesn't have the callback.


> Before posting a patch, do people think this sounds useful?  Would you like
> the hook function signature to differ in some way?  Is a simple "yes this
> relation may be clustered" vs. "no this relation may not be clustered"
> interface overly simplistic?

It seems overly complicated, if anything ;)

Greetings,

Andres Freund




> On Jun 15, 2022, at 6:01 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:
>> While developing various Table Access Methods, I have wanted a callback for
>> determining if CLUSTER (and VACUUM FULL) should be run against a table
>> backed by a given TAM.  The current API contains a callback for doing the
>> guts of the cluster, but by that time, it's a bit too late to cleanly back
>> out.  For single relation cluster commands, raising an error from that
>> callback is probably not too bad.  For multi-relation cluster commands, that
>> aborts the clustering of other yet to be processed relations, which doesn't
>> seem acceptable.
>
> Why not? What else do you want to do in that case? Silently ignoring
> non-clusterable tables doesn't seem right either. What's the use-case for
> swallowing the error?

Imagine you develop a TAM for which the concept of "clustering" doesn't have any defined meaning.  Perhaps you've
arrangedthe data in a way that has no similarity to heap, and now somebody runs a CLUSTER command (with no arguments.)
It'sreasonable that they want all their heap tables to get the usual cluster_rel treatment, and that the existence of a
tableusing your exotic TAM shouldn't interfere with that.  Then what?  You are forced to copy all the data from your
OldHeap(badly named) to the NewHeap (also badly named), or to raise an error.  That doesn't seem ok. 

>> I tried fixing this with a ProcessUtility_hook, but that
>> fires before the multi-relation cluster command has compiled the list of
>> relations to cluster, meaning the ProcessUtility_hook doesn't have access to
>> the necessary information.  (It can be hacked to compile the list of
>> relations itself, but that duplicates both code and effort, with the usual
>> risks that the code will get out of sync.)
>>
>> For my personal development, I have declared a new hook, bool
>> (*relation_supports_cluster) (Relation rel).  It gets called from
>> commands/cluster.c in both the single-relation and multi-relation code
>> paths, with warning or debug log messages output for relations that decline
>> to be clustered, respectively.
>
> Do you actually need to dynamically decide whether CLUSTER is supported?
> Otherwise we could just make the existing cluster callback optional and error
> out if a table is clustered that doesn't have the callback.

Same as above, I don't know why erroring would be the right thing to do.  As a comparison, consider that we don't
attemptto cluster a partitioned table, but rather just silently skip it.  Imagine if, when we introduced the concept of
partitionedtables, we made unqualified CLUSTER commands always fail when they encountered a partitioned table. 

>> Before posting a patch, do people think this sounds useful?  Would you like
>> the hook function signature to differ in some way?  Is a simple "yes this
>> relation may be clustered" vs. "no this relation may not be clustered"
>> interface overly simplistic?
>
> It seems overly complicated, if anything ;)

For the TAMs I've developed thus far, I don't need the (Relation rel) parameter, and could just have easily used
(void). But that seems to fence in what other TAM authors could do in future. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Modest proposal to extend TableAM API for controlling cluster commands

From
Andres Freund
Date:
Hi,

On 2022-06-15 18:24:45 -0700, Mark Dilger wrote:
> > On Jun 15, 2022, at 6:01 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:
> >> While developing various Table Access Methods, I have wanted a callback for
> >> determining if CLUSTER (and VACUUM FULL) should be run against a table
> >> backed by a given TAM.  The current API contains a callback for doing the
> >> guts of the cluster, but by that time, it's a bit too late to cleanly back
> >> out.  For single relation cluster commands, raising an error from that
> >> callback is probably not too bad.  For multi-relation cluster commands, that
> >> aborts the clustering of other yet to be processed relations, which doesn't
> >> seem acceptable.
> > 
> > Why not? What else do you want to do in that case? Silently ignoring
> > non-clusterable tables doesn't seem right either. What's the use-case for
> > swallowing the error?
> 
> Imagine you develop a TAM for which the concept of "clustering" doesn't have
> any defined meaning.  Perhaps you've arranged the data in a way that has no
> similarity to heap, and now somebody runs a CLUSTER command (with no
> arguments.)

I think nothing would happen in this case - only pre-clustered tables get
clustered in an argumentless CLUSTER. What am I missing?

Greetings,

Andres Freund




> On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:
>
> I think nothing would happen in this case - only pre-clustered tables get
> clustered in an argumentless CLUSTER. What am I missing?

The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target is pre-clustered, and both will run against
thetable if the user has run an ALTER TABLE..CLUSTER ON.  Now, we could try to catch that latter command with a utility
hook,but since the VACUUM FULL is still problematic, it seems cleaner to just add the callback I am proposing. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Modest proposal to extend TableAM API for controlling cluster commands

From
Andres Freund
Date:
Hi,

On 2022-06-15 19:07:50 -0700, Mark Dilger wrote:
> > On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:
> > 
> > I think nothing would happen in this case - only pre-clustered tables get
> > clustered in an argumentless CLUSTER. What am I missing?
> 
> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target
> is pre-clustered

VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation
is shared, VACUUM FULL doesn't order the table contents. I see now reason why
an AM shouldn't support VACUUM FULL?


> , and both will run against the table if the user has run an ALTER
> TABLE..CLUSTER ON.

If a user does that for a table that doesn't support clustering, well, I don't
see what's gained by not erroring out.

Greetings,

Andres Freund




> On Jun 15, 2022, at 7:14 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-06-15 19:07:50 -0700, Mark Dilger wrote:
>>> On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:
>>>
>>> I think nothing would happen in this case - only pre-clustered tables get
>>> clustered in an argumentless CLUSTER. What am I missing?
>>
>> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target
>> is pre-clustered
>
> VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation
> is shared, VACUUM FULL doesn't order the table contents. I see now reason why
> an AM shouldn't support VACUUM FULL?

It's effectively a synonym which determines whether the "bool use_sort" parameter of the table AM's
relation_copy_for_clusterwill be set.  Heap-AM plays along and sorts or not based on that.  But it's up to the TAM what
itwants to do with that boolean, if in fact it does anything at all based on that.  A TAM could decide to do the exact
oppositeof what Heap-AM does and instead sort on VACUUM FULL but not sort on CLUSTER, or perhaps perform a randomized
shuffle,or <insert your weird behavior here>.  From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
synonyms. Or am I missing something? 

>> , and both will run against the table if the user has run an ALTER
>> TABLE..CLUSTER ON.
>
> If a user does that for a table that doesn't support clustering, well, I don't
> see what's gained by not erroring out.

Perhaps they want to give the TAM information about which index to use for sorting, on those occasions when the TAM's
logicdictates that sorting is appropriate, but not in response to a cluster command. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







> On Jun 15, 2022, at 7:21 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>
>> If a user does that for a table that doesn't support clustering, well, I don't
>> see what's gained by not erroring out.
>
> Perhaps they want to give the TAM information about which index to use for sorting, on those occasions when the TAM's
logicdictates that sorting is appropriate, but not in response to a cluster command. 

I should admit that this is a bit hack-ish, but TAM authors haven't been left a lot of options here.  Index AMs allow
forcustom storage parameters, but Table AMs don't, so getting information to the TAM about how to behave takes more
thana little slight of hand.  Simon's proposal from a while back (don't have the link just now) to allow TAMs to define
customstorage parameters would go some distance here. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Modest proposal to extend TableAM API for controlling cluster commands

From
Andres Freund
Date:
Hi,

On 2022-06-15 19:21:42 -0700, Mark Dilger wrote:
> > On Jun 15, 2022, at 7:14 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2022-06-15 19:07:50 -0700, Mark Dilger wrote:
> >>> On Jun 15, 2022, at 6:55 PM, Andres Freund <andres@anarazel.de> wrote:
> >>> 
> >>> I think nothing would happen in this case - only pre-clustered tables get
> >>> clustered in an argumentless CLUSTER. What am I missing?
> >> 
> >> The "VACUUM FULL" synonym of "CLUSTER" doesn't depend on whether the target
> >> is pre-clustered
> > 
> > VACUUM FULL isn't a synonym of CLUSTER. While a good bit of the implementation
> > is shared, VACUUM FULL doesn't order the table contents. I see now reason why
> > an AM shouldn't support VACUUM FULL?
> 
> It's effectively a synonym which determines whether the "bool use_sort"
> parameter of the table AM's relation_copy_for_cluster will be set.  Heap-AM
> plays along and sorts or not based on that.

Hardly a synonym if it behaves differently?


> But it's up to the TAM what it wants to do with that boolean, if in fact it
> does anything at all based on that.  A TAM could decide to do the exact
> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
> behavior here>.

That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
that's possible with all extension APIs.



> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
> synonyms.  Or am I missing something?

The callback gets passed use_sort. So just implement it use_sort = false and
error out if use_sort = true?


> >> , and both will run against the table if the user has run an ALTER
> >> TABLE..CLUSTER ON.
> > 
> > If a user does that for a table that doesn't support clustering, well, I don't
> > see what's gained by not erroring out.
> 
> Perhaps they want to give the TAM information about which index to use for
> sorting, on those occasions when the TAM's logic dictates that sorting is
> appropriate, but not in response to a cluster command.

I have little sympathy to randomly misusing catalog contents and then
complaining that those catalog contents have an effect.

Greetings,

Andres Freund




> On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
>
>> It's effectively a synonym which determines whether the "bool use_sort"
>> parameter of the table AM's relation_copy_for_cluster will be set.  Heap-AM
>> plays along and sorts or not based on that.
>
> Hardly a synonym if it behaves differently?

I don't think this point is really worth arguing.  We don't have to call it a synonym, and the rest of the discussion
remainsthe same. 

>> But it's up to the TAM what it wants to do with that boolean, if in fact it
>> does anything at all based on that.  A TAM could decide to do the exact
>> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
>> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
>> behavior here>.
>
> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
> that's possible with all extension APIs.

I don't think it's "stupid stuff".  A major motivation, perhaps the only useful motivation, for implementing a TAM is
toget a non-trivial performance boost (relative to heap) for some target workload, almost certainly at the expense of
worseperformance for another workload.  To optimize any particular workload sufficiently to make it worth the bother,
you'vepretty much got to do something meaningfully different than what heap does. 


>> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
>> synonyms.  Or am I missing something?
>
> The callback gets passed use_sort. So just implement it use_sort = false and
> error out if use_sort = true?

I'm not going to say that your idea is unreasonable for a TAM that you might choose to implement, but I don't see why
thatshould be required of all TAMs anybody might ever implement. 

The callback that gets use_sort is called from copy_table_data().  By that time, it's too late to avoid the

    /*
     * Open the relations we need.
     */
    NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
    OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);

code that has already happened in cluster.c's copy_table_data() function, and unless I raise an error, after returning
frommy TAM's callback, the cluster code will replace the old table with the new one.  I'm left no choices but to copy
mydata over, loose my data, or abort the command.  None of those are OK options for me. 

I'm open to different solutions.  If a simple callback like relation_supports_cluster(Relation rel) is too simplistic,
andmore parameters with more context information is wanted, then fine, let's do that.  But I can't really complete my
workwith the interface as it stands now. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Modest proposal to extend TableAM API for controlling cluster commands

From
Andres Freund
Date:
Hi,

On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:
> > On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
> >> But it's up to the TAM what it wants to do with that boolean, if in fact it
> >> does anything at all based on that.  A TAM could decide to do the exact
> >> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
> >> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
> >> behavior here>.
> > 
> > That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
> > that's possible with all extension APIs.
> 
> I don't think it's "stupid stuff".  A major motivation, perhaps the only
> useful motivation, for implementing a TAM is to get a non-trivial
> performance boost (relative to heap) for some target workload, almost
> certainly at the expense of worse performance for another workload.  To
> optimize any particular workload sufficiently to make it worth the bother,
> you've pretty much got to do something meaningfully different than what heap
> does.

Sure. I just don't see what that has to do with doing something widely
differing in VACUUM FULL. Which AM can't support that? I guess there's some
where implementing the full visibility semantics isn't feasible, but that's
afaics OK.


> >> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
> >> synonyms.  Or am I missing something?
> > 
> > The callback gets passed use_sort. So just implement it use_sort = false and
> > error out if use_sort = true?
> 
> I'm not going to say that your idea is unreasonable for a TAM that you might
> choose to implement, but I don't see why that should be required of all TAMs
> anybody might ever implement.

> The callback that gets use_sort is called from copy_table_data().  By that time, it's too late to avoid the
> 
>     /*
>      * Open the relations we need.
>      */
>     NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
>     OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);
> 
> code that has already happened in cluster.c's copy_table_data() function,
> and unless I raise an error, after returning from my TAM's callback, the
> cluster code will replace the old table with the new one.  I'm left no
> choices but to copy my data over, loose my data, or abort the command.  None
> of those are OK options for me.

I think you need to do a bit more explaining of what you're actually trying to
achieve here. You're just saying "I don't want to", which doesn't really help
me to understand the set of useful options.


> I'm open to different solutions.  If a simple callback like
> relation_supports_cluster(Relation rel) is too simplistic, and more
> parameters with more context information is wanted, then fine, let's do
> that.

FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
relation_copy_for_cluster optional.

I still think it's a terrible idea to silently ignore tables in CLUSTER or
VACUUM FULL.


> But I can't really complete my work with the interface as it stands
> now.

Since you've not described that work to a meaningful degree...

Greetings,

Andres Freund



Re: Modest proposal to extend TableAM API for controlling cluster commands

From
"David G. Johnston"
Date:
On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de> wrote:
> If a simple callback like
> relation_supports_cluster(Relation rel) is too simplistic

Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the case where the Table AM says: "Sorry, I cannot do that"?

If yes, then naming the table explicitly should elicit an error.  Having the table chosen implicitly should provoke a warning.  For ALTER TABLE CLUSTER there should be an error: which makes the implicit CLUSTER command a non-factor.

However, given that should the table structure change it is imperative that the Table AM be capable of producing a new physical relation with the correct data, which will have been compacted as a side-effect, it seems like, explicit or implicit, expecting any Table AM to do that when faced with Vacuum Full is reasonable.  Which leaves deciding how to allow a table with a given TAM to prevent itself from being added to the CLUSTER roster.  And decide whether an opt-out feature for implicit VACUUM FULL is something we should offer as well.

I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture of PostgreSQL could avoid this basic contract between the system and its users.

David J.


> On Jun 15, 2022, at 8:18 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-06-15 20:10:30 -0700, Mark Dilger wrote:
>>> On Jun 15, 2022, at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
>>>> But it's up to the TAM what it wants to do with that boolean, if in fact it
>>>> does anything at all based on that.  A TAM could decide to do the exact
>>>> opposite of what Heap-AM does and instead sort on VACUUM FULL but not sort
>>>> on CLUSTER, or perhaps perform a randomized shuffle, or <insert your weird
>>>> behavior here>.
>>>
>>> That's bogus. Yes, an AM can do stupid stuff in a callback. But so what,
>>> that's possible with all extension APIs.
>>
>> I don't think it's "stupid stuff".  A major motivation, perhaps the only
>> useful motivation, for implementing a TAM is to get a non-trivial
>> performance boost (relative to heap) for some target workload, almost
>> certainly at the expense of worse performance for another workload.  To
>> optimize any particular workload sufficiently to make it worth the bother,
>> you've pretty much got to do something meaningfully different than what heap
>> does.
>
> Sure. I just don't see what that has to do with doing something widely
> differing in VACUUM FULL. Which AM can't support that? I guess there's some
> where implementing the full visibility semantics isn't feasible, but that's
> afaics OK.

The problem isn't that the TAM can't do it.  Any TAM can probably copy its contents verbatim from the OldHeap to the
NewHeap. But it's really stupid to have to do that if you're not going to change anything along the way, and I think if
youdivorce your thinking from how heap-AM works sufficiently, you might start to see how other TAMs might have nothing
usefulto do at this step.  So there's a strong motivation to not be forced into a "move all the data uselessly" step. 

I don't really want to discuss the proprietary details of any TAMs I'm developing, so I'll use some made up examples.
Imagineyou have decided to trade off the need to vacuum against the cost of storing 64bit xids.  That doesn't mean that
compactionisn't maybe still a good thing, but you don't need to vacuum for anti-wraparound purposes anymore.  Now
imagineyou've also decided to trade off insert speed for select speed, and you sort the entire table on every insert,
andto keep indexes happy, you keep a "externally visible TID" to "internal actual location" mapping in a separate fork.
 Let's say you also don't support UPDATE or DELETE at all.  Those immediately draw an error, "not implemented by this
tam".

At this point, you have a fully sorted and completely compacted table at all times.  It's simply an invariant of the
TAM. So, now what exactly is VACUUM FULL or CLUSTER supposed to do?  Seems like the answer is "diddly squat", and yet
youseem to propose requiring the TAM to do it.  I don't like that. 

>>>> From the point-of-view of a TAM implementor, VACUUM FULL and CLUSTER are
>>>> synonyms.  Or am I missing something?
>>>
>>> The callback gets passed use_sort. So just implement it use_sort = false and
>>> error out if use_sort = true?
>>
>> I'm not going to say that your idea is unreasonable for a TAM that you might
>> choose to implement, but I don't see why that should be required of all TAMs
>> anybody might ever implement.
>
>> The callback that gets use_sort is called from copy_table_data().  By that time, it's too late to avoid the
>>
>>    /*
>>     * Open the relations we need.
>>     */
>>    NewHeap = table_open(OIDNewHeap, AccessExclusiveLock);
>>    OldHeap = table_open(OIDOldHeap, AccessExclusiveLock);
>>
>> code that has already happened in cluster.c's copy_table_data() function,
>> and unless I raise an error, after returning from my TAM's callback, the
>> cluster code will replace the old table with the new one.  I'm left no
>> choices but to copy my data over, loose my data, or abort the command.  None
>> of those are OK options for me.
>
> I think you need to do a bit more explaining of what you're actually trying to
> achieve here. You're just saying "I don't want to", which doesn't really help
> me to understand the set of useful options.

I'm trying to opt out of cluster/vacfull.

>> I'm open to different solutions.  If a simple callback like
>> relation_supports_cluster(Relation rel) is too simplistic, and more
>> parameters with more context information is wanted, then fine, let's do
>> that.
>
> FWIW, I want to go *simpler* if anything, not more complicated. I.e. make the
> relation_copy_for_cluster optional.
>
> I still think it's a terrible idea to silently ignore tables in CLUSTER or
> VACUUM FULL.

I'm not entirely against you on that, but it makes me cringe that we impose design decisions like that on any and all
futureTAMs.  It seems better to me to let the TAM author decide to emit an error, warning, notice, or whatever, as they
seefit. 

>> But I can't really complete my work with the interface as it stands
>> now.
>
> Since you've not described that work to a meaningful degree...

I don't think I should have to do so.  It's like saying, "I think I should have freedom of speech", and you say, "well,
I'mnot sure about that; tell me what you want to say, and I'll decide if I'm going to let you say it".  That's not
freedom. I think TAM authors should have broad discretion over anything that the core system doesn't have a compelling
interestin controlling.  You've not yet said why a TAM should be prohibited from opting out of cluster/vacfull. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







> On Jun 15, 2022, at 8:50 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de> wrote:
> > If a simple callback like
> > relation_supports_cluster(Relation rel) is too simplistic
>
> Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Ok.

> Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the
casewhere the Table AM says: "Sorry, I cannot do that"? 

I submit that's the only sane thing to do if the table AM already guarantees that the table will always be fully
compacted. There is no justification for forcing the table contents to be copied without benefit. 

> If yes, then naming the table explicitly should elicit an error.  Having the table chosen implicitly should provoke a
warning. For ALTER TABLE CLUSTER there should be an error: which makes the implicit CLUSTER command a non-factor. 

I'm basically fine with how you would design the TAM, but I'm going to argue again that the core project should not
dictatethese decisions.  The TAM's relation_supports_compaction() function can return true/false, or raise an error.
Ifraising an error is the right action, the TAM can do that.  If the core code makes that decision, the TAM can't
override,and that paints TAM authors into a corner. 

> However, given that should the table structure change it is imperative that the Table AM be capable of producing a
newphysical relation with the correct data, which will have been compacted as a side-effect, it seems like, explicit or
implicit,expecting any Table AM to do that when faced with Vacuum Full is reasonable.  Which leaves deciding how to
allowa table with a given TAM to prevent itself from being added to the CLUSTER roster.  And decide whether an opt-out
featurefor implicit VACUUM FULL is something we should offer as well. 
>
> I'm doubtful that a TAM that is pluggable into the MVCC and WAL architecture of PostgreSQL could avoid this basic
contractbetween the system and its users. 

How about a TAM that implements a write-once, read-many logic.  You get one multi-insert, and forever after you can't
modifyit (other than to drop the table, or perhaps to truncate it).  That's a completely made-up-on-the-spot example,
butit's not entirely without merit.  You could avoid a lot of locking overhead when using such a table, since you'd
knowa priori that nobody else is modifying it.  It could also be implemented with a smaller tuple header, since a lot
ofthe header bytes in heap tuples are dedicated to tracking updates.  You wouldn't need a per-row inserting
transaction-Ideither, since you could just store one per table, knowing that all the rows were inserted in the same
transaction.

In what sense does this made-up TAM conflict with mvcc and wal?  It doesn't have all the features of heap, but that's
notthe same thing as violating mvcc or breaking wal. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: Modest proposal to extend TableAM API for controlling cluster commands

From
Andres Freund
Date:
Hi,

On 2022-06-15 22:23:36 -0700, Mark Dilger wrote:
> I'm not entirely against you on that, but it makes me cringe that we impose
> design decisions like that on any and all future TAMs.  It seems better to
> me to let the TAM author decide to emit an error, warning, notice, or
> whatever, as they see fit.

The tradeoff is that that pushes down complexity and makes the overall system
harder to understand. I'm not saying that there's no possible use for such
callbacks / configurability, I'm just not convinced it's worth the cost.


> >> But I can't really complete my work with the interface as it stands
> >> now.
> > 
> > Since you've not described that work to a meaningful degree...
> 
> I don't think I should have to do so.  It's like saying, "I think I should
> have freedom of speech", and you say, "well, I'm not sure about that; tell
> me what you want to say, and I'll decide if I'm going to let you say it".'
> That's not freedom.  I think TAM authors should have broad discretion over
> anything that the core system doesn't have a compelling interest in
> controlling.

That's insultingly ridiculous. You can say, do whatever you want, but that
doesn't mean I have to be convinced by it (i.e. +1 adding an API) - that'd be
compelled speech, to go with your image...

It's utterly normal to be asked what the use case for a new API is when
proposing one.


> You've not yet said why a TAM should be prohibited from opting
> out of cluster/vacfull.

API / behavioural complexity. If we make ever nook and cranny configurable,
we'll have an ever harder to use / administer system (from a user's POV) and
have difficulty understanding the state of the system when writing patches
(from a core PG developer's POV). It might be the right thing in this case -
hence me asking for what the motivation is.

Greetings,

Andres Freund



Re: Modest proposal to extend TableAM API for controlling cluster commands

From
"David G. Johnston"
Date:
On Wed, Jun 15, 2022 at 11:23 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

> On Jun 15, 2022, at 8:50 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Wed, Jun 15, 2022 at 8:18 PM Andres Freund <andres@anarazel.de> wrote:
> > If a simple callback like
> > relation_supports_cluster(Relation rel) is too simplistic
>
> Seems like it should be called: relation_supports_compaction[_by_removal_of_interspersed_dead_tuples]

Ok.

> Basically, if the user tells the table to make itself smaller on disk by removing dead tuples, should we support the case where the Table AM says: "Sorry, I cannot do that"?

I submit that's the only sane thing to do if the table AM already guarantees that the table will always be fully compacted.  There is no justification for forcing the table contents to be copied without benefit.

I accept that this is a valid outcome that should be accommodated for.

> If yes, then naming the table explicitly should elicit an error.  Having the table chosen implicitly should provoke a warning.  For ALTER TABLE CLUSTER there should be an error: which makes the implicit CLUSTER command a non-factor.

I'm basically fine with how you would design the TAM, but I'm going to argue again that the core project should not dictate these decisions.  The TAM's relation_supports_compaction() function can return true/false, or raise an error.  If raising an error is the right action, the TAM can do that. 
 
If the core code makes that decision, the TAM can't override, and that paints TAM authors into a corner.

The core code has to decide what the template pattern code looks like, including what things it will provide and what it requires extensions to provide.  To a large extent, providing a consistent end-user experience is the template's, and thus core code's, job.
 
How about a TAM that implements a write-once, read-many logic.  You get one multi-insert, and forever after you can't modify it (other than to drop the table, or perhaps to truncate it).

So now the AM wants to ignore ALTER TABLE, INSERT, and DELETE commands.

  That's a completely made-up-on-the-spot example, but it's not entirely without merit.

In what sense does this made-up TAM conflict with mvcc and wal?  It doesn't have all the features of heap, but that's not the same thing as violating mvcc or breaking wal.


I am nowhere near informed enough to speak to the implementation details here, and my imagination is probably lacking too, but I'll accept that the current system does indeed make assumptions in the template design that are now being seen as incorrect in light of new algorithms.

But you are basically proposing a reworking of the existing system into one that makes pretty much any SQL Command something that a TAM can treat as being an optional request by the user; whereas today the system presumes that the implementations will respond to these commands.  And to make this change without any core code having such a need. Or even a working extension that can be incorporated during development.  And, as per the above, all of this requires coming to some kind of agreement on the desired user experience (I don't immediately accept the "let the AM decide" option).

Anyway, that was mostly my attempt at Devil's Advocate.
I was going to originally post that the template simply inspect whether the new physical relation file, after the copy was requested, had a non-zero size, and if so finish performing the swap the way we do today, otherwise basically abort (or otherwise perform the minimal amount of catalog changes) so the existing relation file continues to be pointed at.  Something to consider with a smaller API footprint than a gatekeeper hook.

I think that all boils down to - it seems preferable to simply continue assuming all these commands are accepted, but figure out whether a "no-op" is a valid outcome and, if so, ensure there is a way to identify that no-op meaningfully.  While hopefully designing the surrounding code so that unnecessary work is not performed in front of a no-op.  This seems preferable to spreading hooks throughout the code that basically ask "do you handle this SQL command?".  The specifics of the existing code may dictate otherwise.

David J.


> On Jun 16, 2022, at 12:28 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> But you are basically proposing a reworking of the existing system into one that makes pretty much any SQL Command
somethingthat a TAM can treat as being an optional request by the user; 

Yes, and I think I'm perfectly correct in asking for that.  If the standard you are proposing (albeit as Devil's
Advocate)were applied to filesystems, nobody could ever implement /dev/null, on the argument that users have a
reasonableexpectation that data they write to a file will be there for them to read later.  Yet Michael Paquier wrote a
blackholeTAM, and although I don't find it terribly useful, I do think it's a reasonable thing for somebody to be able
towrite.  

> whereas today the system presumes that the implementations will respond to these commands.

That depends on what you mean by "respond to".  A TAM which implements a tamper resistant audit log responds to update
anddelete commands with a "permission denied" error.  A TAM which implements running aggregates implements insert
commandsby computing and inserting a new running aggregate value and reclaiming space from old running aggregate values
whenno transaction could any longer see them.  You can do this stuff at a higher level with hooks, functions, triggers,
andrules, inserting into a heap, and having to periodically vacuum, by why would you want to?  That's almost guaranteed
tobe slower, maybe even orders of magnitude slower.  

> And to make this change without any core code having such a need.

The core code won't have any such need, because the core code is content with heap, and the API already accommodates
heap. It seems Andres moved the project in the direction of allowing custom TAMs when he created the Table AM
interface,and I'm quite pleased that he did so, but it doesn't allow nearly enough flexibility to do all the
interestingthings a TAM could otherwise do.  Consider for example that the multi_insert hook uses a BulkInsertStateData
parameter,defined as:  

typedef struct BulkInsertStateData
{
    BufferAccessStrategy strategy;  /* our BULKWRITE strategy object */
    Buffer      current_buf;    /* current insertion target page */
} BulkInsertStateData;

which is just the structure heap would want, but what about a TAM that wants to route different tuples to different
pages? The "current_buf" isn't enough information, and there's no void *extra field, so you're just sunk. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







> On Jun 16, 2022, at 12:27 AM, Andres Freund <andres@anarazel.de> wrote:
>
>> I don't think I should have to do so.  It's like saying, "I think I should
>> have freedom of speech", and you say, "well, I'm not sure about that; tell
>> me what you want to say, and I'll decide if I'm going to let you say it".'
>> That's not freedom.  I think TAM authors should have broad discretion over
>> anything that the core system doesn't have a compelling interest in
>> controlling.
>
> That's insultingly ridiculous. You can say, do whatever you want, but that
> doesn't mean I have to be convinced by it (i.e. +1 adding an API) - that'd be
> compelled speech, to go with your image...

Indeed it would be compelled speech, and I'm not trying to compel you, only to convince you.  And my apologies if it
cameacross as insulting.  I have a lot of respect for you, as do others at EDB, per invariably complementary comments
I'veheard others express. 

> It's utterly normal to be asked what the use case for a new API is when
> proposing one.

It seems like we're talking on two different levels.  I've said what the use case is, which is to implement a TAM that
doesn'tbenefit from cluster or vacuum full, without the overhead of needlessly copying itself, and without causing
argumentlessVACUUM FULL commands to fail.  I'm *emphatically* not asking the community to accept the TAM back as a
patch. The freedom I'm talking about is the freedom to design and implement such a third-party TAM without seeking
communityapproval of the TAM's merits. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company