Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key - Mailing list pgsql-hackers
From | Andres Freund
---|---
Subject | Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date |
Msg-id | 20180405014439.fbezvbjrmcw64vjc@alap3.anarazel.de
In response to | Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key (3 replies)
List | pgsql-hackers
Hi,

On 2018-04-02 11:26:38 -0400, Robert Haas wrote:
> On Wed, Mar 28, 2018 at 2:12 PM, Andres Freund <andres@anarazel.de> wrote:
> > How will it break it? They'll see an invalid ctid and conclude that the
> > tuple is dead? Without any changes that's already something that can
> > happen if a later tuple in the chain has been pruned away. Sure, that
> > code won't realize it should error out because the tuple is now in a
> > different partition, but neither would an infomask bit.
> >
> > I think my big problem is that I just don't see what the worst that can
> > happen is. We'd potentially see a broken ctid chain, something that very
> > commonly happens, and consider the tuple to be invisible. That seems
> > pretty sane behaviour for unadapted code, and not any worse than other
> > potential solutions.
>
> This is more or less my feeling as well. I think it's better to
> conserve our limited supply of infomask bits as much as we can, and I
> do think that we should try to reclaim HEAP_MOVED_IN and
> HEAP_MOVED_OFF in the future instead of defining the combination of
> the two of them to mean something now.

Yep. It'd also make locking more complicated or require keeping more
information around in HeapUpdateFailureData. In a number of places we
currently release the buffer pin before switching over heap_lock_tuple
etc. results, or there's currently not even a way to get at the infomask
(e.g. when heap_update fails).

> Modulo implementation quality, I think the risk
> level of this patch is somewhat but not vastly higher than
> 37484ad2aacef5ec794f4dd3d5cf814475180a78, which similarly defined a
> previously-unused bit pattern in the tuple header.

Personally I think that change was vastly riskier, because it affected
freezing and wraparounds. Which is something we've repeatedly gotten
wrong.

> The reason I think this one might be somewhat riskier is because
> AFAICS it's not so easy to make sure we've found all the code, even in
> core, that might care, as it was in that case; and also because
> updates happen more than freezing.

But the consequences of not catching a changed piece of code are fairly
harmless. And I'd say things that happen more often are actually easier
to validate than something that with default settings requires hours of
testing...

I've attached a noticeably editorialized patch:

- I'm uncomfortable with the "moved" information not being crash-safe /
  replicated. Thus I added a new flag to preserve it, and removed the
  masking of the moved bit in the ctid from heap_mask().

- renamed macros to not mention valid / invalid block numbers, but rather
  HeapTupleHeaderSetMovedPartitions / HeapTupleHeaderIndicatesMovedPartitions
  and ItemPointerSetMovedPartitions / ItemPointerIndicatesMovedPartitions.
  I'm not wedded to these names, but I'll be adamant that they not talk
  about invalid block numbers; that makes the code harder to understand
  imo. (A rough sketch follows right after this list.)

- removed the new assertion from heap_get_latest_tid(); it's wrong for the
  case where all row versions are invisible.

- editorialized comments a bit

- added a few more assertions
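For illustration, here's roughly what the renamed ItemPointer / HeapTupleHeader
macros could look like. This is a sketch, not the attached patch verbatim: the
macro names are the ones mentioned above, the block number follows from the
P_NEW == InvalidBlockNumber observation below, but the reserved offset value
and the exact placement are assumptions.

```c
#include "postgres.h"
#include "access/htup_details.h"	/* HeapTupleHeaderData, t_ctid */
#include "storage/itemptr.h"		/* ItemPointerSet and friends */

/*
 * Sketch only: a tuple deleted as part of a cross-partition UPDATE gets its
 * t_ctid set to a marker that can never be a real TID, so chain followers can
 * detect "moved to another partition" instead of chasing a dangling link.
 * The concrete marker values here are assumptions.
 */
#define MovedPartitionsBlockNumber	InvalidBlockNumber
#define MovedPartitionsOffsetNumber	((OffsetNumber) 0xfffd)

#define ItemPointerSetMovedPartitions(pointer) \
	ItemPointerSet((pointer), MovedPartitionsBlockNumber, MovedPartitionsOffsetNumber)

#define ItemPointerIndicatesMovedPartitions(pointer) \
	(ItemPointerGetOffsetNumberNoCheck(pointer) == MovedPartitionsOffsetNumber && \
	 ItemPointerGetBlockNumberNoCheck(pointer) == MovedPartitionsBlockNumber)

/* the HeapTupleHeader variants would just forward to t_ctid */
#define HeapTupleHeaderSetMovedPartitions(tup) \
	ItemPointerSetMovedPartitions(&(tup)->t_ctid)

#define HeapTupleHeaderIndicatesMovedPartitions(tup) \
	ItemPointerIndicatesMovedPartitions(&(tup)->t_ctid)
```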
I went through the existing code a) to make sure no checks were missed,
b) to evaluate what the consequences would be when chasing chains, and
c) to evaluate what the consequences would be when we miss erroring out.

WRT b), it's usually just superfluous extra work if the new checks
weren't there. I went through all callers accessing xmax (via GetRawXmax
and GetUpdateXid):

b)
- heap rewrites will keep a tuple in the hashtable till end of run, then
  reset the ctid to self. No real corruption, but we'd not detect further
  errors when attempting to follow the chain.
- EvalPlanQualFetch would fail to abort the loop and attempt to fetch the
  tuple. This'll extend the relation by a single page, because
  P_NEW == InvalidBlockNumber.
- heap_prune_chain - no changes needed (delete isn't recursed through)
- heap_get_root_tuples - same
- heap_hot_search_buffer - only continues over hot updates
- heap_lock_tuple (and subsidiary routines) - same as EvalPlanQualFetch,
  would then return HeapTupleUpdated.

c)
- GetTupleForTrigger - the proper error wouldn't be raised; instead a NULL
  tuple would be passed to the trigger
- EvalPlanQualFetch - a NULL tuple would be returned after the consequences
  above
- RelationFindReplTupleBy* - wrong error message
- ExecLockRows - no error would be raised, continue normally
- ExecDelete() - tuple ignored without error
- ExecUpdate() - same
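To contrast with the c) list, here's roughly what the caught case is supposed
to do. This is a sketch rather than the attached patch verbatim: the helper
name is made up, and it reuses the error message and the serialization-failure
errcode discussed in the questions below.

```c
#include "postgres.h"
#include "access/heapam.h"		/* HeapUpdateFailureData */

/*
 * Sketch of the caller-side handling (think the HeapTupleUpdated branches of
 * ExecUpdate / ExecDelete / ExecLockRows): if the update chain ends in the
 * "moved to another partition" marker, raise an error instead of silently
 * treating the tuple as gone, which is what the c) cases above would do.
 */
static void
report_moved_partition(HeapUpdateFailureData *hufd)
{
	if (ItemPointerIndicatesMovedPartitions(&hufd->ctid))
		ereport(ERROR,
				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
				 errmsg("tuple to be locked was already moved to another partition due to concurrent update")));
}
```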
Questions:
- I'm not perfectly happy with "tuple to be locked was already moved to
  another partition due to concurrent update" as the error message. If
  somebody has a better suggestion, please speak up.
- should heap_get_latest_tid() error out when the chain ends in a moved
  tuple? I personally think it doesn't matter much; the functionality is so
  bonkers and underspecified that it doesn't matter anyway ;)
- I'm not that happy with the number of added spec test files with number
  postfixes. Can't we combine them into a single file?
- as remarked elsewhere on this thread, I think the used errcode should be
  a serialization failure

Greetings,

Andres Freund

Attachment