Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas
Date
Msg-id CA+TgmobQtf7hMNLtbPa9KUcsQ3oWquCL2OMzCcKPymTw8i1odg@mail.gmail.com
In response to Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas  (Peter Geoghegan <pg@bowt.ie>)
Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Thu, Apr 7, 2022 at 2:43 PM Peter Geoghegan <pg@bowt.ie> wrote:
> But if we were in a green-field situation we'd probably not want to
> use up several bytes for a nonce anyway. You said so yourself.

I don't know what statement of mine you're talking about here, and
while I don't love using up space for a nonce, it seems to be the way
this encryption stuff works. I don't see that there's a reasonable
alternative, green field or no.

> > I do understand that there are significant challenges and performance
> > concerns around having these kinds of initdb-controlled page layout
> > changes, so the future of that patch is unclear.
>
> Why does it need to be at initdb time?
>
> Though I cannot prove it, I suspect that the original intent of the
> special area was to support an additional (though typically small)
> variable length array, that works a little like the current line
> pointer array. This array would have to grow backwards (newer items
> get appended at earlier physical offsets), unlike our line pointer
> array (which gets appended to at the end, in the simple and obvious
> way). Growing backwards like this happens in DB systems that store
> their line pointer array at the end of the page (the traditional
> approach from the System R days, I believe).
>
> Supporting a variable-length special area array like this would mean
> that any time you add a new item to the variable-sized array in the
> special area, the page's entire tuple space has to be memmove()'d
> backwards by a couple of bytes to create the required space. And so
> the relevant bufpage.c routine would have to adjust the whole line
> pointer array such that each lp_off received a compensating
> adjustment. The array might only be for some kind of page-level
> transaction metadata, something like that -- shifting it around is
> pretty expensive (reusing existing slots isn't too expensive, though).
>
> Why can't it work like that? You don't really need to build the full
> set of bufpage.c facilities (though it might not be a bad idea to
> fully support these variable-length arrays, which seem like they might
> come in handy). That seems perfectly compatible with what Matthias
> wants to do, provided we're willing to deem the special area struct
> (e.g. BTOpaque) as always coming "first" (which is essentially the
> same as his current proposal anyway). You can even do the same thing
> yourself for the nonce (use a fixed, known offset), with relatively
> modest effort. You'd need to have AM-specific knowledge (it would
> stack right on top of Matthias's technique), but that doesn't seem all
> that hard. There are plenty of remaining status bits in BTOpaque, and
> probably all other index AM special areas.
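
For readers following along, the variable-length special area described in the quoted text can be sketched roughly as below. This is only an illustration under stated assumptions: GrowPage, grow_special, and the GROW_* constants are hypothetical names, the layout is a toy rather than the real bufpage.c page format, and alignment is ignored. It shows the two costs mentioned: the tuple space is moved toward lower offsets with memmove(), and every lp_off gets a compensating adjustment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GROW_BLCKSZ 8192        /* assumed page size */
#define GROW_MAX_ITEMS 64       /* hypothetical cap on line pointers */

/* Toy page: offsets index into data[]; not the real PostgreSQL layout. */
typedef struct GrowPage
{
    uint16_t pd_lower;                  /* end of line pointer array */
    uint16_t pd_upper;                  /* start of tuple space */
    uint16_t pd_special;                /* start of special space */
    uint16_t nitems;                    /* number of used line pointers */
    uint16_t lp_off[GROW_MAX_ITEMS];    /* tuple offsets into data[] */
    char     data[GROW_BLCKSZ];
} GrowPage;

/*
 * Grow the special space by nbytes at its low end.
 * Returns 0 on success, -1 if the free space is too small.
 */
static int
grow_special(GrowPage *p, uint16_t nbytes)
{
    uint16_t i;

    if (p->pd_upper - p->pd_lower < nbytes)
        return -1;

    /* Shift the whole tuple space toward lower offsets. */
    memmove(p->data + p->pd_upper - nbytes,
            p->data + p->pd_upper,
            (size_t) (p->pd_special - p->pd_upper));

    /* Every line pointer needs a compensating adjustment. */
    for (i = 0; i < p->nitems; i++)
        p->lp_off[i] = (uint16_t) (p->lp_off[i] - nbytes);

    p->pd_upper = (uint16_t) (p->pd_upper - nbytes);
    p->pd_special = (uint16_t) (p->pd_special - nbytes);
    return 0;
}
```

The bytes between the new and the old pd_special are then available for the new array entry. Reusing an existing slot would skip all of this, which matches the remark that reuse is cheap while growth is not.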

I'm not really following any of this. You seem to be arguing about
whether it's possible to change the length of the special space
*later* than initdb time. I agree that might have some use for some
purpose, but for encryption it's not necessarily all that helpful
because you have to be able to find the nonce on the page before
you've decrypted it. If you don't know whether there's a nonce or
where it's located, you can't do that. What Matthias and I were
discussing is whether you have to make a decision about appending
stuff to the special space *earlier* than initdb time, i.e., at compile
time.

My position is that if we need some space in every page to put a
nonce, the best place to put it is at the very end of the page, within
the special space and after anything else that is stored in the
special space. Code that only manipulates the line pointer array and
tuple data won't care, because pd_special will just be a bit smaller
than it would otherwise have been, and none of that code looks at any
byte offset >= pd_special. Code that looks at the special space won't
care either, as long as it uses PageGetSpecialPointer to find the
data, and doesn't examine how large the special space actually is.
That corresponds pretty well to how existing users of the special
space work, so it seems pretty good.
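
To make that concrete, here is a minimal sketch, not the actual patch. The names (SketchPageHeader, SketchOpaque, sketch_page_init, and so on), the 16-byte nonce length, and the 8192-byte page size are all assumptions for illustration, and the MAXALIGN rounding that real special space gets is left out. The point it shows is that the AM's opaque data is found through pd_special in exactly the same way whether or not a nonce has been appended behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumed page size */
#define SKETCH_NONCE_LEN 16     /* hypothetical nonce length */

/* Toy page header holding only the offsets discussed here. */
typedef struct SketchPageHeader
{
    uint16_t pd_lower;          /* end of line pointer array */
    uint16_t pd_upper;          /* start of tuple data */
    uint16_t pd_special;        /* start of special space */
} SketchPageHeader;

/* Stand-in for an index AM's opaque struct. */
typedef struct SketchOpaque
{
    uint32_t prev;
    uint32_t next;
    uint16_t flags;
} SketchOpaque;

/* Set up an empty page; the nonce, if any, goes after the AM's data. */
static void
sketch_page_init(char *page, size_t am_special_size, int with_nonce)
{
    SketchPageHeader hdr;
    size_t special = am_special_size + (with_nonce ? SKETCH_NONCE_LEN : 0);

    assert(special < SKETCH_BLCKSZ - sizeof(hdr));
    memset(page, 0, SKETCH_BLCKSZ);
    hdr.pd_lower = (uint16_t) sizeof(hdr);
    hdr.pd_upper = (uint16_t) (SKETCH_BLCKSZ - special);
    hdr.pd_special = (uint16_t) (SKETCH_BLCKSZ - special);
    memcpy(page, &hdr, sizeof(hdr));
}

/* Analogue of PageGetSpecialPointer: never looks at the special size. */
static char *
sketch_special_pointer(char *page)
{
    SketchPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return page + hdr.pd_special;
}

/* The nonce sits at a fixed offset derived from the page size alone. */
static char *
sketch_nonce_pointer(char *page)
{
    return page + SKETCH_BLCKSZ - SKETCH_NONCE_LEN;
}
```

Code that only walks the line pointer array and tuple data never reaches past pd_special, so in this sketch it would see nothing more than slightly less free space.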

If we *didn't* put the nonce at the end of the page, where else would
we put it? It has to be at a fixed offset, because otherwise you can't
find it without decrypting the page first, which would be circular.
You could put it at the beginning of the page, or after the page
header and before the line pointer array, but either of those
seems likely to affect a lot more code, because there's a lot more
stuff that accesses the line pointer array than the special space.
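
The circularity argument can be illustrated with a toy, which is emphatically not real cryptography and not anything proposed in this thread. It assumes, purely for illustration, that every byte of the page except the nonce is encrypted; toy_keystream, toy_crypt_page, and the TOY_* constants are hypothetical names. The decrypting side can only work because the nonce offset is computed from the page size, without reading any field that is itself encrypted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192                             /* assumed page size */
#define TOY_NONCE_LEN 16                            /* hypothetical nonce length */
#define TOY_NONCE_OFF (TOY_BLCKSZ - TOY_NONCE_LEN)  /* fixed, known offset */

/* Toy keystream byte derived from nonce, key, and position. NOT secure. */
static uint8_t
toy_keystream(const uint8_t *nonce, uint8_t key, size_t i)
{
    return (uint8_t) (nonce[i % TOY_NONCE_LEN] ^ key ^ (uint8_t) (i * 31u));
}

/*
 * XOR every byte before the nonce with the keystream. The nonce stays in
 * cleartext, so the same call both encrypts and decrypts the page.
 */
static void
toy_crypt_page(uint8_t *page, uint8_t key)
{
    const uint8_t *nonce = page + TOY_NONCE_OFF;
    size_t i;

    for (i = 0; i < TOY_NONCE_OFF; i++)
        page[i] ^= toy_keystream(nonce, key, i);
}
```

If the nonce offset instead depended on something stored in the encrypted part of the page, toy_crypt_page would have no way to locate its own input.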

-- 
Robert Haas
EDB: http://www.enterprisedb.com


