Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: MaxOffsetNumber for Table AMs
Msg-id CAH2-WznmdB4AUx-KhiSGFbtyr0X5gEoB1AzzAQV2t69o1Krm_w@mail.gmail.com
In response to Re: MaxOffsetNumber for Table AMs  (Andres Freund <andres@anarazel.de>)
Responses Re: MaxOffsetNumber for Table AMs
List pgsql-hackers
On Mon, May 3, 2021 at 10:01 PM Andres Freund <andres@anarazel.de> wrote:
> > For example, the TIDs should always work like unsigned integers -- the
> > table AM must be willing to work with that restriction.
>
> Isn't that more a question of the encoding than the concrete representation?

I don't think so, no. How does B-Tree deduplication work without
something like that? The fact of the matter is that things are very
tightly coupled in all kinds of ways. I'm all for decoupling them to
the extent required to facilitate a new and useful table AM. But I am
unlikely to commit to months of work based on abstract arguments and
future work. I think that you'll find that I'm not the only one that
sees it that way.
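A minimal sketch of what "TIDs work like unsigned integers" can mean for a 6-byte heap TID. The struct, make_tid(), tid_to_uint() and tid_compare() are hypothetical illustrations, not PostgreSQL's API. PostgreSQL's ItemPointerData does split the 32-bit block number into two uint16 halves and add a 16-bit offset number, and ItemPointerCompare() orders by block and then by offset. The sketch shows that this ordering is the same as ordering the TID packed into a 48-bit unsigned integer. That equivalence is the property deduplication leans on when it keeps a posting list's TIDs sorted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 6-byte heap TID (illustration only). */
typedef struct SketchTid
{
    uint16_t    bi_hi;      /* high 16 bits of the block number */
    uint16_t    bi_lo;      /* low 16 bits of the block number */
    uint16_t    posid;      /* offset number within the block (1-based) */
} SketchTid;

static SketchTid
make_tid(uint32_t block, uint16_t offset)
{
    SketchTid   tid;

    assert(offset != 0);    /* offset number 0 is invalid in PostgreSQL */
    tid.bi_hi = (uint16_t) (block >> 16);
    tid.bi_lo = (uint16_t) (block & 0xFFFF);
    tid.posid = offset;
    return tid;
}

static uint32_t
tid_block(SketchTid tid)
{
    return ((uint32_t) tid.bi_hi << 16) | tid.bi_lo;
}

/* Pack into the low 48 bits of a uint64: block number high, offset low. */
static uint64_t
tid_to_uint(SketchTid tid)
{
    return ((uint64_t) tid_block(tid) << 16) | tid.posid;
}

/* Field-wise comparison, block first, then offset: returns <0, 0 or >0. */
static int
tid_compare(SketchTid a, SketchTid b)
{
    uint32_t    ablk = tid_block(a);
    uint32_t    bblk = tid_block(b);

    if (ablk != bblk)
        return ablk < bblk ? -1 : 1;
    if (a.posid != b.posid)
        return a.posid < b.posid ? -1 : 1;
    return 0;
}
```

For example, block 7 offset 3 sorts before block 65536 offset 1 under both tid_compare() and a plain comparison of the tid_to_uint() values.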

> > You'd then have posting list tuples in nbtree whose TIDs were all
> > either 6 bytes or 8 bytes wide, with a mix of each possible (though
> > not particularly likely) on the same leaf page. Say when you have a
> > table that exceeds the current MaxBlockNumber restrictions. It would
> > be relatively straightforward for nbtree deduplication to simply
> > refuse to mix 6-byte and 8-byte datums together to avoid complexity in
> > boundary cases. The deduplication pass logic has the flexibility that
> > this requires already.
>
> Which nbtree cases do you think would have an easier time supporting
> switching between 6 or 8 byte tids than supporting fully variable width
> tids?  Given that IndexTupleData already is variable-width, it's not
> clear to me why supporting two distinct sizes would be harder than a
> fully variable size?  I assume it's things like BTDedupState->htids?

Stuff like that, yeah. The space utilization stuff inside
nbtsplitloc.c and nbtdedup.c pretty much rests on the assumption that
TIDs are fixed width. Obviously there are some ways in which that
could be revised if there was a really good reason to do so -- like an
actual concrete reason with some clear basis in reality. You have no
obligation to make me happy, but FYI I find arguments like "but why
wouldn't you just allow arbitrary-width TIDs?" to be deeply
unconvincing. Do you really expect me to do a huge amount of work and
risk a lot of new bugs, just to facilitate something that may or may
not ever happen? Would you do that if you were in my position?
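A minimal sketch of why fixed-width TIDs keep the space accounting simple. All names and both constants (SKETCH_HEADER_SIZE, SKETCH_MAX_TUPLE) are hypothetical and are not taken from nbtdedup.c or nbtsplitloc.c. Once every TID in a posting list shares one width, the tuple size and the maximum number of TIDs that fit are plain arithmetic on a count. Refusing to mix 6-byte and 8-byte TIDs in one list, as the quoted proposal suggests, preserves that property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_HEADER_SIZE  16      /* hypothetical tuple header size */
#define SKETCH_MAX_TUPLE    2000    /* hypothetical size limit */

typedef struct PostingSketch
{
    size_t      tid_width;  /* 6 or 8; meaningless while ntids == 0 */
    size_t      ntids;
} PostingSketch;

/* With one fixed width, size is a linear function of the TID count. */
static size_t
posting_size(size_t tid_width, size_t ntids)
{
    return SKETCH_HEADER_SIZE + tid_width * ntids;
}

/* The capacity is closed-form, too; variable widths would rule this out. */
static size_t
posting_max_tids(size_t tid_width)
{
    assert(tid_width == 6 || tid_width == 8);
    return (SKETCH_MAX_TUPLE - SKETCH_HEADER_SIZE) / tid_width;
}

/* May one more TID of this width be added? Never mix widths in one list. */
static bool
posting_can_add(const PostingSketch *p, size_t tid_width)
{
    assert(tid_width == 6 || tid_width == 8);
    if (p->ntids > 0 && p->tid_width != tid_width)
        return false;
    return posting_size(tid_width, p->ntids + 1) <= SKETCH_MAX_TUPLE;
}

static void
posting_add(PostingSketch *p, size_t tid_width)
{
    assert(posting_can_add(p, tid_width));
    p->tid_width = tid_width;
    p->ntids++;
}
```

Under these made-up limits, a list of 6-byte TIDs holds at most 330 entries and a list of 8-byte TIDs at most 248. Once a 6-byte TID is in a list, an 8-byte one is rejected and must start a new posting list.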

> I don't think anybody is arguing that AMs cannot accept any restrictions? I do
> think it's pretty clear that it's not entirely obvious what the concrete set
> of proper restrictions would be, where we won't end up needing to re-evaluate
> limits in a few years.

I'm absolutely fine with the fact that the table AM has these issues
-- I would expect it. I would like to help! I just find these wildly
abstract discussions to be close to a total waste of time. The idea
that we should let a thousand table AM flowers bloom and then review
what to do seems divorced from reality. Even if the table AM becomes
wildly successful, there will still only have been maybe two to four table
AMs that ever really had a chance. Supposing that we have no idea what
they could possibly look like just yet is just navel-gazing.

> If you add to that the fact that variable-width tids will often end up
> considerably smaller than our current tids, it's not obvious why we should use
> bitspace somewhere to indicate an 8 byte tid instead of a variable-width
> tid?

It's not really the space overhead. It's the considerable complexity
that it would add.
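One illustration of the kind of complexity at stake, sketched under stated assumptions. The byte layouts and function names are hypothetical: a big-endian 6-byte encoding and a LEB128-style varint, neither of which is PostgreSQL's on-disk format. Fixed-width TIDs allow the i-th entry of a posting list to be located by arithmetic, which is what makes binary search over the list cheap. Variable-width TIDs can be smaller, but locating an entry means decoding every entry before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed width: the i-th 6-byte TID starts at byte i * 6. */
static uint64_t
fixed_tid_at(const uint8_t *buf, size_t i)
{
    const uint8_t *p = buf + i * 6;
    uint64_t    v = 0;

    /* Stored big-endian here, so memcmp order matches numeric order. */
    for (int k = 0; k < 6; k++)
        v = (v << 8) | p[k];
    return v;
}

/* Variable width: append v as a LEB128-style varint, return bytes written. */
static size_t
varint_put(uint8_t *buf, uint64_t v)
{
    size_t      n = 0;

    do
    {
        uint8_t     byte = v & 0x7F;

        v >>= 7;
        if (v != 0)
            byte |= 0x80;       /* high bit set: more bytes follow */
        buf[n++] = byte;
    } while (v != 0);
    return n;
}

/* Locating the i-th varint TID requires decoding all TIDs before it. */
static uint64_t
varint_tid_at(const uint8_t *buf, size_t i)
{
    size_t      pos = 0;

    for (;;)
    {
        uint64_t    v = 0;
        int         shift = 0;
        uint8_t     byte;

        do
        {
            byte = buf[pos++];
            v |= (uint64_t) (byte & 0x7F) << shift;
            shift += 7;
        } while (byte & 0x80);

        if (i-- == 0)
            return v;
    }
}
```

For the TIDs (7,3) and (65536,1), the varint encoding takes 8 bytes against 12 for the fixed one, which bears out the point that variable-width TIDs are often smaller. In exchange, every lookup becomes a sequential scan, and size estimates can no longer be computed from a count alone.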

-- 
Peter Geoghegan


