Re: MaxOffsetNumber for Table AMs - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: MaxOffsetNumber for Table AMs
Date
Msg-id CAEze2Wit1EtHHBHJ+CYvBPthrWUzu2Vqc-BmzU3ApK3iotHriw@mail.gmail.com
Whole thread Raw
In response to MaxOffsetNumber for Table AMs  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: MaxOffsetNumber for Table AMs
List pgsql-hackers
On Fri, 30 Apr 2021, 09:46 Jeff Davis, <pgsql@j-davis.com> wrote:

The notion of TID is based on pages and line pointers, which makes
sense for heapam, but that's not likely to make sense for a custom
table AM.

The obvious answer is to make a simple mapping between a TID and
whatever makes sense to the AM (for the sake of discussion, let's say a
plain row number).

The most natural thing would be to say that we have 48 bits, so it can
just be a 48-bit number. Of course, there are some restrictions on
valid values that complicate this:

  * InvalidBlockNumber of 0xFFFFFFFF. Not a problem.
  * InvalidOffsetNumber of 0. Not a problem.
  * MaxOffsetNumber of 2048. Does this limit really apply to table AMs?

MaxOffsetNumber is not per se 2048. It is BLCKSZ / sizeof(ItemIdData), which is only 2048 for a 8kiB BLCKSZ. As we support BLCKSZ up to 32kiB, MaxOffsetNumber can be as large as 8196.
 
Other than that, I believe you've also missed the special offset numbers specified in itemptr.h (SpecTokenOffsetNumber and MovedPartitionsOffsetNumber). I am not well enough aware of the usage of these OffsetNumber values, but those might also be limiting the values any tableAM can use for their TIDs.

It just seems like it's used when scanning heap or index pages for
stack-allocated arrays. For a table AM it appears to waste 5 bits.

MaxOffsetNumber is used for postgres' Page layout, of which the MaxOffsetNumber is defined as how many item pointers could exist on a page, and AFAIK should be used for postgres' Page layout only. No thing can or should change that. If any code asserts limitations to the ip_posid of table tuples that could also not be tableam tuples, then I believe that is probably a mistake in postgres, and that should be fixed.

  * ginpostinglist.c:itemptr_to_uint64() seems to think 2047 is the max
offset number. Is this a bug? 

No. The documentation for that function explicitly mentions that these item pointers are optimized for storage when using the heap tableam, and that that code will be changed once there exist tableAMs with different TID / ip_posid constraints (see the comment on lines 32 and 33 of that file).

Note that the limiting number that itemptr_to_uint64 should mention for bit calculations is actually MaxHeaptuplesPerPage, which is about one seventh of MaxOffsetNumber. The resulting number of bits reserved is not a miscalculation though, because MaxHeaptuplesPerPage (for 32kiB BLCKSZ) requires the mentioned 11 bits, and adapting bit swizzling for multiple page sizes was apparently not considered worth the effort.

As a table AM author, I'd like to know what the real limits are so that
I can use whatever bits are available to map from TID to row number and
back, without worrying that something will break in the future. A few
possibilities:

  1. Keep MaxOffsetNumber as 2048 and fix itemptr_to_uint64().

I believe that this is the right way when there exist tableAMs that use those upper 5 bits.
 
  2. Change MaxOffsetNumber to 2047. This seems likely to break
extensions that rely on it.

If you're going to change MaxOffsetNumber, I believe that it's better to change it to ((BLCKSZ - sizeof(PageHeaderData)) / sizeof(ItemIdData)), as that is the maximum amount of ItemIds you could put on a Page that has no page opaque.

  3. Define MaxOffsetNumber as 65536 and introduce a new
MaxItemsPerPage as 2048 for the stack-allocated arrays. We'd still need
to fix itemptr_to_uint64().

I believe that that breaks more things than otherwise required. ip_posid is already limited to uint16, so I see no reason to add a constant that would assert that the value of any uint16 is less its max value plus one.


With regards,

Matthias van de Meent

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Next
From: Patrik Novotny
Date:
Subject: Help needed with a reproducer for CVE-2020-25695 not based on REFRESH MATERIALIZED VIEW