Re: pgsql: Compute XID horizon for page level index vacuum on primary. - Mailing list pgsql-committers

From Simon Riggs
Subject Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Date
Msg-id CANP8+jLEy2vMFcSG8P=vFT4F5Loiw4OkD-uZ65cwmDa_C+fK7w@mail.gmail.com
Whole thread Raw
In response to Re: pgsql: Compute XID horizon for page level index vacuum onprimary.  (Andres Freund <andres@anarazel.de>)
List pgsql-committers
On Fri, 29 Mar 2019 at 16:32, Andres Freund <andres@anarazel.de> wrote:
On 2019-03-29 16:20:54 +0000, Simon Riggs wrote:
> On Fri, 29 Mar 2019 at 16:12, Andres Freund <andres@anarazel.de> wrote:
>
>
> > On 2019-03-29 15:58:14 +0000, Simon Riggs wrote:
> > > On Fri, 29 Mar 2019 at 15:29, Andres Freund <andres@anarazel.de> wrote:
> > > > That's far from a trivial feature imo. It seems quite possible that
> > we'd
> > > > end up with increased overhead, because the current logic can get away
> > > > with only doing hint bit style writes - but would that be true if we
> > > > started actually replacing the item pointers? Because I don't see any
> > > > guarantee they couldn't cross a page boundary etc? So I think we'd need
> > > > to do WAL logging during index searches, which seems prohibitively
> > > > expensive.
> > > >
> > >
> > > Don't see that.
> > >
> > > I was talking about reusing the first 4 bytes of an index tuple's
> > > ItemPointerData,
> > > which is the first field of an index tuple. Index tuples are MAXALIGNed,
> > so
> > > I can't see how that would ever cross a page boundary.
> >
> > They're 8 bytes, and MAXALIGN often is 4 bytes:
> >
>
> xids are 4 bytes, so we're good.

I literally went on to explain why that's not sufficient? You can't
*just* replace the block number with an xid. You *also* need to set a
flag denoting that it's an xid and dead now. Which can't fit in the same
4 bytes.  You either have to set it in the IndexTuple's t_info, or or in
the ItemIdData's lp_flags. And both can be on a different sectors.  If
the flag persists, and the xid doesn't you're going to interpret a block
number as an xid - not great; but even worse, if the xid survives, but
the flag doesn't, you're going to access the xid as a block - definitely
not ok, because you're going to return wrong results.

Yes, I agree, I was thinking the same thing after my last comment, but was afk. The idea requires the atomic update of at least 4 bytes plus at least 1 bit and so would require at least 8byte MAXALIGN to be useful. Your other points suggesting that multiple things all need to be set accurately for this to work aren't correct. The idea was that we would write a hint that would avoid later I/O, so the overall idea is still viable.

Anyway, thinking some more, I think the whole idea of generating lastRemovedXid is moot and there are better ways in the future of doing this that would avoid a performance issue altogether. Clearly not PG12.

The main issue relates to the potential overhead of moving this to the master. I agree its the right thing to do, but we should have some way of checking it is not a performance issue in practice.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-committers by date:

Previous
From: Michael Paquier
Date:
Subject: pgsql: Fix thinko in allocation call during MVC list deserialization
Next
From: Peter Eisentraut
Date:
Subject: pgsql: Catch syntax error in generated column definition