Thread: 64 bit TID?
All,
I'm considering a new design for a specialized table am. It would simplify the design if TIDs grew forever and I didn't have to implement TID reuse logic.
The current 48 bit TID is big, but I can see extreme situations where it might not be quite big enough. If every row that gets updated needs a TID, and something like an IoT app is updating huge numbers of rows per second using multiple connections in parallel, there might be a problem. This is especially true if each connection requests a batch of TIDs and then doesn't use all of them.
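To put rough numbers on that concern, here is a minimal back-of-the-envelope sketch. It assumes TIDs are handed out at a steady rate and never reused; the function name and the rates are made up for illustration, not measurements of any real workload.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Whole seconds until a TID space of `bits` bits runs out, assuming TIDs
 * are consumed at a steady `tids_per_second` and never reused.
 * Hypothetical helper; the rate is an assumed workload parameter.
 */
static uint64_t
seconds_until_exhausted(unsigned bits, uint64_t tids_per_second)
{
    uint64_t space;

    assert(bits >= 1 && bits <= 64);
    assert(tids_per_second > 0);

    /* 2^64 does not fit in a uint64_t, so use 2^64 - 1 for the 64-bit case. */
    space = (bits == 64) ? UINT64_MAX : ((uint64_t) 1 << bits);

    return space / tids_per_second;
}
```

At an assumed one million TIDs per second, seconds_until_exhausted(48, 1000000) gives 281,474,976 seconds, or a little under nine years. The same rate against 64 bits gives roughly 18.4 trillion seconds, which is hundreds of thousands of years.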
Are there any plans in the works to widen the TID?
I saw some notes on this in the Zedstore project, but there hasn't been much activity in that project for almost a year.
Chris
--
Chris Cleveland
312-339-2677 mobile
On Mon, 13 Sept 2021 at 17:50, Chris Cleveland <ccleveland@dieselpoint.com> wrote:
>
> All,
>
> I'm considering a new design for a specialized table am. It would simplify the design if TIDs grew forever and I didn't have to implement TID reuse logic.

TID reuse logic also helps clean up index tuples for deleted table tuples. I would suggest implementing TID reuse logic, if only to prevent indexes from growing indefinitely (or TID limits from being reached, whichever comes first).

> The current 48 bit TID is big, but I can see extreme situations where it might not be quite big enough. If every row that gets updated needs a TID, and something like an IoT app is updating huge numbers of rows per second using multiple connections in parallel, there might be a problem.

If your table contains such large amounts of (versions of) tuples, you might want to partition your table(s), as that allows the system to move some bits of tuple identification to the relation identifier.

> This is especially true if each connection requests a batch of TIDs and then doesn't use all of them.

For the HeapAM this is never the case; TIDs cannot be allocated without use (albeit some may be used for rolled-back and thus dead tuples).

> Are there any plans in the works to widen the TID?

This was recently discussed here [0] as well, but to the best of my knowledge no material proposal to update the APIs has been suggested as of yet.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/flat/0bbeb784050503036344e1f08513f13b2083244b.camel%40j-davis.com
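For reference, the 48 bits under discussion are, in the heap AM, a 32-bit block number plus a 16-bit offset within that block (PostgreSQL's ItemPointerData stores the block number as two 16-bit halves). The following is a minimal sketch of that split using a simplified stand-in struct rather than the real type. SketchTid and both function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a heap TID: block number plus offset in block. */
typedef struct SketchTid
{
    uint32_t block;   /* which block of the relation */
    uint16_t offset;  /* which line pointer within that block */
} SketchTid;

/* Pack the two fields into the low 48 bits of a 64-bit integer. */
static uint64_t
sketch_tid_pack(SketchTid tid)
{
    return ((uint64_t) tid.block << 16) | tid.offset;
}

/* Inverse of sketch_tid_pack; the input must fit in 48 bits. */
static SketchTid
sketch_tid_unpack(uint64_t packed)
{
    SketchTid tid;

    assert(packed < ((uint64_t) 1 << 48));
    tid.block = (uint32_t) (packed >> 16);
    tid.offset = (uint16_t) (packed & 0xFFFF);
    return tid;
}
```

Because block numbers are counted per relation, every partition gets its own 48-bit space. That is the sense in which partitioning moves some identification bits into the relation identifier.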
> > Are there any plans in the works to widen the TID?
>
> This was recently discussed here [0] as well, but to the best of my
> knowledge no material proposal to update the APIs has been suggested
> as of yet.
>
> [0] https://www.postgresql.org/message-id/flat/0bbeb784050503036344e1f08513f13b2083244b.camel%40j-davis.com

Wow, thank you, that is some thread. It discusses the issues thoroughly.

As I see it, there are three options:

1. Make it possible to use the unused 5 bits in the existing TID scheme. The advantages: we get the full 48 bits, and it may not take a lot of work, and it makes Jeff Davis' work with Columnar easier.

2. Go to a flat 64-bit logical TID. The advantages: certain types of table AMs work better, including Columnar and LSM tree-based AMs (which I'm currently working on).

3. Go to a variable-length TID. The advantages: you can stuff any kind of payload into the TID, which would make clustered tables and certain fancy indexes easier, but would be far more work.

I would contribute patches myself, but I'm not *yet* skilled enough in the ways of Postgres to do so.

Questions:

Would widening the existing ItemPointer to 64 bits now preclude a variable-length TID in the future? Or make it more difficult?

How much work would it take?

Since the thread ended in May, has the group reached any kind of consensus on the issue?

--
Chris Cleveland
312-339-2677 mobile
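To make option 2 concrete alongside the batch-reservation pattern mentioned in the first message, here is a minimal sketch of a flat 64-bit TID counter from which each connection reserves a contiguous range. Everything in it is hypothetical: the TidAllocator and TidBatch types, the function names, and the use of C11 atomics. A real table AM would need the counter in shared memory and durable across restarts, and this sketch addresses neither.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global source of ever-growing 64-bit TIDs. */
typedef struct TidAllocator
{
    _Atomic uint64_t next;  /* first TID not yet handed out */
} TidAllocator;

/* A range [next, end) reserved by one connection. */
typedef struct TidBatch
{
    uint64_t next;
    uint64_t end;
} TidBatch;

/* Reserve `n` consecutive TIDs with a single atomic add. */
static TidBatch
tid_batch_reserve(TidAllocator *alloc, uint64_t n)
{
    TidBatch batch;

    assert(n > 0);
    batch.next = atomic_fetch_add(&alloc->next, n);
    batch.end = batch.next + n;
    return batch;
}

/* Take one TID from the batch; returns 0 once the batch is used up. */
static int
tid_batch_take(TidBatch *batch, uint64_t *tid)
{
    if (batch->next == batch->end)
        return 0;
    *tid = batch->next++;
    return 1;
}
```

Any TIDs left in a batch when its connection goes away are simply lost, since the counter never moves backwards. That is the waste described in the first message, and it is why the total width of the TID matters more under this scheme than under one that reuses TIDs.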
On Mon, Sep 13, 2021 at 3:30 PM Chris Cleveland <ccleveland@dieselpoint.com> wrote:
> Wow, thank you, that is some thread. It discusses the issues
> thoroughly.

If somebody wants to make TIDs (or some generalized TID-like thing that tableam knows about) into logical identifiers, then they must also answer the question: identifiers of what?

TIDs from Postgres heapam identify a physical version, or perhaps a HOT chain -- which is not how TIDs work in other DB systems that use a heap structure. This is the only reason why we can mostly think of indexes as data structures that don't need to be involved in concurrency control. Postgres index access methods don't usually need to know anything about locks that protect the logical structure of the database.

The option of just creating a new distinct TID (for the same logical row) buys us the ability to keep index access methods rather separate from everything else -- which helps with extensibility. No logical locks are required in Postgres. Complicated designs that bleed into other parts of the system (designs like ARIES/KVL and ARIES/IM) are unnecessary.

> Questions:
>
> Would widening the existing ItemPointer to 64 bits now preclude a
> variable-length TID in the future? Or make it more difficult?
>
> How much work would it take?

If it was just a matter of changing the data structure then I think it would be far easier.

--
Peter Geoghegan
On Mon, Sep 13, 2021 at 5:36 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If somebody wants to make TIDs (or some generalized TID-like thing
> that tableam knows about) into logical identifiers, then they must
> also answer the question: identifiers of what?
>
> TIDs from Postgres heapam identify a physical version, or perhaps a
> HOT chain -- which is not how TIDs work in other DB systems that use a
> heap structure. This is the only reason why we can mostly think of
> indexes as data structures that don't need to be involved in
> concurrency control. Postgres index access methods don't usually need
> to know anything about locks that protect the logical structure of the
> database.

The 1993 paper "Options in Physical Database Design" gives a useful overview of the challenges here, especially for an extensible system like Postgres relative to a system with a traditional design implementing classic ARIES. I think that you need an ACM membership to get a copy. The relevant section starts out like this:

"""
Item Representation
-------------------

Physical representation types for abstract data types is only slowly gaining research attention for object-oriented database systems but will likely become a very important tuning option. Examples include sets represented as bit maps, arrays, or lists and matrices represented densely or sparsely, by row or by column or as tiles, e.g. [MaV93]. The goal is to bring physical data independence to object-oriented and scientific databases and their applications.

Physical pointers, references, or object identifiers to represent relationships support "navigation" through a database, which is very good for single-instance retrievals and often improves set matching, but also creates a new type of updates, structural updates, which may increase the complexity of concurrency control and recovery [CSL90, ChK84, RoR85, ShC90].
"""

This seems to be a fundamental trade-off that is tied inextricably to the design of many other things.

That doesn't stop anybody from creating a column store using the tableam. But it does mean that they will need to be very careful about defining what exact "logical vs physical vs physiological" tradeoff they've chosen. It's rather subtle stuff.

--
Peter Geoghegan