Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers

From Andres Freund
Subject Re: MaxOffsetNumber for Table AMs
Date
Msg-id 20210504050142.bhpoff7rsdpacnrq@alap3.anarazel.de
In response to Re: MaxOffsetNumber for Table AMs  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: MaxOffsetNumber for Table AMs  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2021-04-30 11:51:07 -0700, Peter Geoghegan wrote:
> I think that it's reasonable to impose some cost on index AMs here,
> but that needs to be bounded sensibly and unambiguously. For example,
> it would probably be okay if you had either 6 byte or 8 byte TIDs, but
> no other variations. You could require index AMs (the subset of index
> AMs that are ever able to store 8 byte TIDs) to directly encode which
> width they're dealing with at the level of each IndexTuple. That would
> create some problems for nbtree deduplication, especially in boundary
> cases, but ISTM that you can manage the complexity by sensibly
> restricting how the TIDs work across the board.

> For example, the TIDs should always work like unsigned integers -- the
> table AM must be willing to work with that restriction.

Isn't that more a question of the encoding than the concrete representation?
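
To illustrate what I mean, here is a minimal sketch, not anything that exists
in the tree. It assumes a table AM that hands out 64-bit tuple identifiers and
a hypothetical helper (tid_encode_be and friends are made-up names) that stores
them big-endian. With that encoding a plain bytewise comparison sorts the
stored bytes exactly like unsigned integers, whatever the AM's in-memory
representation looks like:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 64-bit tuple identifier most significant
 * byte first, so that byte order matches unsigned integer order.
 */
static void
tid_encode_be(uint64_t tid, unsigned char out[8])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char) (tid >> (56 - 8 * i));
}

/* Inverse of tid_encode_be(). */
static uint64_t
tid_decode_be(const unsigned char in[8])
{
    uint64_t    tid = 0;

    assert(in != NULL);
    for (int i = 0; i < 8; i++)
        tid = (tid << 8) | in[i];
    return tid;
}

/* Compare two encoded identifiers bytewise: <0, 0 or >0. */
static int
tid_encoded_cmp(const unsigned char a[8], const unsigned char b[8])
{
    return memcmp(a, b, 8);
}
```

The index AM would only ever see the encoded bytes and the comparison; the
"works like an unsigned integer" property comes from the encoding.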


> You'd then have posting lists tuples in nbtree whose TIDs were all
> either 6 bytes or 8 bytes wide, with a mix of each possible (though
> not particularly likely) on the same leaf page. Say when you have a
> table that exceeds the current MaxBlockNumber restrictions. It would
> be relatively straightforward for nbtree deduplication to simply
> refuse to mix 6 byte and 8 byte datums together to avoid complexity in
> boundary cases. The deduplication pass logic has the flexibility that
> this requires already.

Which nbtree cases do you think would have an easier time supporting
switching between 6 or 8 byte tids than supporting fully variable width
tids?  Given that IndexTupleData already is variable-width, it's not
clear to me why supporting two distinct sizes would be harder than a
fully variable size?  I assume it's things like BTDedupState->htids?



> > What's wrong with varlena headers? It would end up being a 1-byte
> > header in practically every case, and no variable-width representation
> > can do without a length word of some sort. I'm not saying varlena is
> > as efficient as some new design could hypothetically be, but it
> > doesn't seem like it'd be a big enough problem to stress about. If you
> > used a variable-width representation for integers, you might actually
> > save bytes in a lot of cases. An awful lot of the TIDs people store in
> > practice probably contain several zero bytes, and if we make them
> > wider, that's going to be even more true.
> 
> Maybe all of this is true, and maybe it works out to be the best path
> forward in the long term, all things considered. But whether or not
> that's true is crucially dependent on what real practical table AMs
> (of which there will only ever be a tiny number) actually need to do.
> Why should we assume that the table AM cannot accept some
> restrictions? What good does it do to legalistically define the
> problem as a problem for index AMs to solve?

I don't think anybody is arguing that AMs cannot accept any restrictions. I do
think it's pretty clear that it's not entirely obvious what the concrete set
of proper restrictions would be, such that we won't end up needing to
re-evaluate the limits in a few years.

If you add to that the fact that variable-width tids will often end up
considerably smaller than our current tids, it's not obvious why we should use
bitspace somewhere to indicate an 8 byte tid instead of a variable-width
tid?
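
As a rough illustration of the size argument, here is a sketch using a generic
LEB128-style varint (7 payload bits per byte plus a continuation bit). This is
only one possible variable-width encoding, picked for brevity; it is not a
proposal for the on-disk format, and the function names are made up. Note that
this particular encoding does not preserve memcmp() ordering, and that a value
using all 48 bits of today's tid would need 7 bytes rather than 6:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TID_VARINT_MAX_LEN 10   /* ceil(64 / 7) bytes for a 64-bit value */

/*
 * Hypothetical encoder: write tid to out, low 7 bits first, and return the
 * number of bytes used.  out must have room for TID_VARINT_MAX_LEN bytes.
 */
static size_t
tid_varint_encode(uint64_t tid, unsigned char *out)
{
    size_t      n = 0;

    assert(out != NULL);
    while (tid >= 0x80)
    {
        /* more bytes follow: set the continuation bit */
        out[n++] = (unsigned char) ((tid & 0x7F) | 0x80);
        tid >>= 7;
    }
    out[n++] = (unsigned char) tid;
    return n;
}

/* Inverse of tid_varint_encode(); returns the number of bytes consumed. */
static size_t
tid_varint_decode(const unsigned char *in, uint64_t *tid)
{
    uint64_t    result = 0;
    size_t      n = 0;
    int         shift = 0;

    assert(in != NULL && tid != NULL);
    for (;;)
    {
        unsigned char b = in[n++];

        result |= (uint64_t) (b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            break;
        shift += 7;
        assert(shift < 64);     /* reject over-long input */
    }
    *tid = result;
    return n;
}
```

With block number and offset packed as (block << 16) | offset, block 10 /
offset 5 encodes in 3 bytes here instead of the fixed 6.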

Greetings,

Andres Freund


