Home > mailing lists

Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Lowering the ever-growing heap->pd_lower
Date	March 31, 2021 03:35:33
Msg-id	CAH2-WznKrvA0-ZUXCsTCb+2OTg-tOyK8z__vwq0fU7mmcrSc8A@mail.gmail.com Whole thread
Responses	Re: Lowering the ever-growing heap->pd_lower
List	pgsql-hackers

Tree view

On Wed, Mar 10, 2021 at 6:01 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > The case I was concerned about back when is that there are various bits of
> > code that may visit a page with a predetermined TID in mind to look at.
> > An index lookup is an obvious example, and another one is chasing an
> > update chain's t_ctid link.  You might argue that if the tuple was dead
> > enough to be removed, there should be no such in-flight references to
> > worry about, but I think such an assumption is unsafe.  There is not, for
> > example, any interlock that ensures that a process that has obtained a TID
> > from an index will have completed its heap visit before a VACUUM that
> > subsequently removed that index entry also removes the heap tuple.
>
> I am aware of this problem. I will admit that I did not detected all
> potentially problematic accesses, so I'll show you my work.

Attached is a trivial rebase of your v3, which I've called v4. I am
interested in getting this patch into Postgres 14.

> In my search for problematic accesses, I make the following assumptions:
> * PageRepairFragmentation as found in bufpage is only applicable to
> heap pages; this function is not applied to other pages in core
> postgres. So, any problems that occur are with due to access with an
> OffsetNumber > PageGetMaxOffsetNumber.
> * Items [on heap pages] are only extracted after using PageGetItemId
> for that item on the page, whilst holding a lock.

> The 3 problem cases were classified based on the origin of the
> potentially invalid pointer.
>
> Problem case 1: table scan; heapam.c lines 678 and 811, in heapgettup

I think that it boils down to this: 100% of the cases where this could
be a problem all either involve old TIDs, or old line pointer -- in
principle these could be invalidated in some way, like your backwards
scan example. But that's it. Bear in mind that we always call
PageRepairFragmentation() with a super-exclusive lock.

> This is in the replay of transaction logs, in heap_xlog_freeze_page.
> As I am unaware whether or not pages to which these transaction logs
> are applied can contain changes from the xlog-generating instance, I
> flagged this as a potential problem. The application of the xlogs is
> probably safe (it assumes the existence of a HeapTupleHeader for that
> ItemId), but my limited knowledge put this on the 'potential problem'
> list.
>
> Please advise on this; I cannot find the right documentation

Are you aware of wal_consistency_checking?

I think that this should be fine. There are differences that are
possible between a replica and primary, but they're very minor and
don't seem relevant.

> Problem case 3: invalid redirect pointers; pruneheap.c line 949, in
> heap_get_root_tuples
>
> The heap pruning mechanism currently assumes that all redirects are
> valid. Currently, if a HOT root points to !ItemIdIsNormal, it breaks
> out of the loop, but doesn't actually fail. This code is already
> potentially problematic because it has no bounds or sanity checking
> for the nextoffnum, but with shrinking pd_linp it would also add the
> failure mode of HOT tuples pointing into what is now arbitrary data.

heap_prune_chain() is less trusting than heap_get_root_tuples(),
though -- it doesn't trust redirects (because there is a generic
offnum sanity check at the start of its loop). I think that the
inconsistency between these two functions probably isn't hugely
significant.

Ideally it would be 100% clear which of the defenses in code like this
is merely extra hardening. The assumptions should be formalized. There
is nothing wrong with hardening, but we should know it when we see it.

> This failure mode is now accompanied by an Assert, which fails when
> the redirect is to an invalid OffsetNumber. This is not enough to not
> exhibit arbitrary behaviour when accessing corrupted data with
> non-assert builds, but I think that that is fine; we already do not
> have a guaranteed behaviour for this failure mode.

What about my "set would-be LP_UNUSED space to all-0x7F bytes in
CLOBBER_FREED_MEMORY builds" idea? Did you think about that?

--
Peter Geoghegan

Attachment

v4-0001-Truncate-a-pages-line-pointer-array-when-it-has-t.patch

pgsql-hackers by date:

From: Greg Rychlewski
Date: 31 March 2021, 03:29:17
Subject: Re: DROP INDEX docs - explicit lock naming

From: vignesh C
Date: 31 March 2021, 03:38:25
Subject: Re: extra semicolon in postgres_fdw test cases

Re: Lowering the ever-growing heap->pd_lower - Mailing list pgsql-hackers

Attachment

Previous

Next