Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Lowering the ever-growing heap->pd_lower |
Date | |
Msg-id | CAH2-WznKrvA0-ZUXCsTCb+2OTg-tOyK8z__vwq0fU7mmcrSc8A@mail.gmail.com Whole thread Raw |
Responses |
Re: Lowering the ever-growing heap->pd_lower
|
List | pgsql-hackers |
On Wed, Mar 10, 2021 at 6:01 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > The case I was concerned about back when is that there are various bits of > > code that may visit a page with a predetermined TID in mind to look at. > > An index lookup is an obvious example, and another one is chasing an > > update chain's t_ctid link. You might argue that if the tuple was dead > > enough to be removed, there should be no such in-flight references to > > worry about, but I think such an assumption is unsafe. There is not, for > > example, any interlock that ensures that a process that has obtained a TID > > from an index will have completed its heap visit before a VACUUM that > > subsequently removed that index entry also removes the heap tuple. > > I am aware of this problem. I will admit that I did not detected all > potentially problematic accesses, so I'll show you my work. Attached is a trivial rebase of your v3, which I've called v4. I am interested in getting this patch into Postgres 14. > In my search for problematic accesses, I make the following assumptions: > * PageRepairFragmentation as found in bufpage is only applicable to > heap pages; this function is not applied to other pages in core > postgres. So, any problems that occur are with due to access with an > OffsetNumber > PageGetMaxOffsetNumber. > * Items [on heap pages] are only extracted after using PageGetItemId > for that item on the page, whilst holding a lock. > The 3 problem cases were classified based on the origin of the > potentially invalid pointer. > > Problem case 1: table scan; heapam.c lines 678 and 811, in heapgettup I think that it boils down to this: 100% of the cases where this could be a problem all either involve old TIDs, or old line pointer -- in principle these could be invalidated in some way, like your backwards scan example. But that's it. Bear in mind that we always call PageRepairFragmentation() with a super-exclusive lock. > This is in the replay of transaction logs, in heap_xlog_freeze_page. > As I am unaware whether or not pages to which these transaction logs > are applied can contain changes from the xlog-generating instance, I > flagged this as a potential problem. The application of the xlogs is > probably safe (it assumes the existence of a HeapTupleHeader for that > ItemId), but my limited knowledge put this on the 'potential problem' > list. > > Please advise on this; I cannot find the right documentation Are you aware of wal_consistency_checking? I think that this should be fine. There are differences that are possible between a replica and primary, but they're very minor and don't seem relevant. > Problem case 3: invalid redirect pointers; pruneheap.c line 949, in > heap_get_root_tuples > > The heap pruning mechanism currently assumes that all redirects are > valid. Currently, if a HOT root points to !ItemIdIsNormal, it breaks > out of the loop, but doesn't actually fail. This code is already > potentially problematic because it has no bounds or sanity checking > for the nextoffnum, but with shrinking pd_linp it would also add the > failure mode of HOT tuples pointing into what is now arbitrary data. heap_prune_chain() is less trusting than heap_get_root_tuples(), though -- it doesn't trust redirects (because there is a generic offnum sanity check at the start of its loop). I think that the inconsistency between these two functions probably isn't hugely significant. Ideally it would be 100% clear which of the defenses in code like this is merely extra hardening. The assumptions should be formalized. There is nothing wrong with hardening, but we should know it when we see it. > This failure mode is now accompanied by an Assert, which fails when > the redirect is to an invalid OffsetNumber. This is not enough to not > exhibit arbitrary behaviour when accessing corrupted data with > non-assert builds, but I think that that is fine; we already do not > have a guaranteed behaviour for this failure mode. What about my "set would-be LP_UNUSED space to all-0x7F bytes in CLOBBER_FREED_MEMORY builds" idea? Did you think about that? -- Peter Geoghegan
Attachment
pgsql-hackers by date: