Thread: What is an item pointer, anyway?
itemid.h introduces the struct ItemIdData as follows: /* * An item pointer (also called line pointer) on a buffer page Meanwhile, itemptr.h introduces the struct ItemPointerData as follows: /* * ItemPointer: * * This is a pointer to an item within a disk page of a known file * (for example, a cross-link from an index to its parent table). It doesn't seem reasonable to assume that you should know the difference based on context. The two concepts are closely related. An ItemPointerData points to a block, as well as the, uh, item pointer within that block. This ambiguity is avoidable, and should be avoided. ISTM that the least confusing way of removing the ambiguity would be to no longer refer to ItemIds as item pointers, without changing anything else. -- Peter Geoghegan
On Fri, Apr 26, 2019 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
itemid.h introduces the struct ItemIdData as follows:
/*
* An item pointer (also called line pointer) on a buffer page
Meanwhile, itemptr.h introduces the struct ItemPointerData as follows:
/*
* ItemPointer:
*
* This is a pointer to an item within a disk page of a known file
* (for example, a cross-link from an index to its parent table).
It doesn't seem reasonable to assume that you should know the
difference based on context. The two concepts are closely related. An
ItemPointerData points to a block, as well as the, uh, item pointer
within that block.
This ambiguity is avoidable, and should be avoided.
Agree.
ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.
How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier instead and leave ItemPointer or Item confined to AM term, where item can be tuple, datum or anything else ?
On Fri, Apr 26, 2019 at 4:23 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote: > How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier instead and leave ItemPointer or Item confinedto AM term, where item can be tuple, datum or anything else ? I'm not a fan of that idea, because the reality is that an ItemPointerData is quite explicitly supposed to be a physiological identifier (TID) used by heapam, or a similar heap-like access method such as zheap. This is baked into a number of things. The limitation that pluggable storage engines have to work in terms of item pointers is certainly a problem, especially for things like the Zedstore column store project you're working on. However, I suspect that that problem is best solved by accommodating other types of identifiers that don't work like TIDs. I understand why you've adopted ItemPointerData as a fully-logical identifier in your prototype, but it's not a great long term solution. -- Peter Geoghegan
Ashwin Agrawal <aagrawal@pivotal.io> writes: > On Fri, Apr 26, 2019 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote: >> ISTM that the >> least confusing way of removing the ambiguity would be to no longer >> refer to ItemIds as item pointers, without changing anything else. How many places would we be changing to clean that up? > How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier > instead and leave ItemPointer or Item confined to AM term, where item can > be tuple, datum or anything else ? There's half a thousand references to ItemPointer[Data] in our sources, and probably tons more in external modules. I'm *not* in favor of renaming it. ItemId[Data] is somewhat less widely referenced, but I'm still not much in favor of renaming that type. I think fixing comments to uniformly call it an item ID would be more reasonable. (We should leave the "line pointer" terminology in place, too; if memory serves, an awful lot of variables of the type are named "lp" or variants. Renaming all of those is to nobody's benefit.) regards, tom lane
On Fri, Apr 26, 2019 at 4:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > ItemId[Data] is somewhat less widely referenced, but I'm still not > much in favor of renaming that type. I think fixing comments to > uniformly call it an item ID would be more reasonable. (We should > leave the "line pointer" terminology in place, too; if memory serves, > an awful lot of variables of the type are named "lp" or variants. > Renaming all of those is to nobody's benefit.) I was proposing that we not rename any struct at all, and continue to call ItemId[Data]s "line pointers" only. This would involve removing the comment in itemid.h that confusingly refers to line pointers as "item pointers" (plus any other comments that fail to make a clear distinction). I think that the total number of comments that would be affected by this approach is quite low. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > I was proposing that we not rename any struct at all, and continue to > call ItemId[Data]s "line pointers" only. Yeah, I'd be fine with that, although the disconnect between the type name and the comment terminology might confuse some people. regards, tom lane
On Fri, Apr 26, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, I'd be fine with that, although the disconnect between the type > name and the comment terminology might confuse some people. Maybe, but the fact that the ItemIdData struct consists of bit fields that are all named "lp_*" offers a hint. Plus you have the LP_* constants that get stored in ItemIdData.lp_flags. I wouldn't call the struct ItemIdData if I was in a green field situation, but it doesn't seem too bad under the present circumstances. I'd rather not change the struct's name, because that would probably cause problems without any real benefit. OTOH, calling two closely related but distinct things by the same name is atrocious. -- Peter Geoghegan
On Fri, Apr 26, 2019 at 5:13 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Fri, Apr 26, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Yeah, I'd be fine with that, although the disconnect between the type > > name and the comment terminology might confuse some people. > > Maybe, but the fact that the ItemIdData struct consists of bit fields > that are all named "lp_*" offers a hint. Plus you have the LP_* > constants that get stored in ItemIdData.lp_flags. Attached draft patch adjusts code comments and error messages where a line pointer is referred to as an item pointer. It turns out that this practice isn't all that prevalent. Here are some specific concerns that I had to think about when writing the patch, though: * I ended up removing a big indexam.c "old comments" comment paragraph from the Berkeley days, because it used the term item pointer in what seemed like the wrong way, but also because AFAICT it's totally obsolete. * Someone should confirm that I have preserved the original intent of the changes within README.HOT, and the heapam changes that relate to pruning. It's possible that I changed "item pointer" to "line pointer" in one or two places where I should have changed "item pointer" to "tuple" instead. * I changed a few long standing "can't happen" error messages that concern corruption, most of which also relate to pruning. Maybe that's a cost that needs to be considered. * I changed a lazy_scan_heap() log message of long-standing. Another downside that needs to be considered. * I expanded a little on the advantages of using line pointers within bufpage.h. That seemed in scope to me, because it drives home the distinction between item pointers and line pointers. -- Peter Geoghegan
Attachment
On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan <pg@bowt.ie> wrote: > Attached draft patch adjusts code comments and error messages where a > line pointer is referred to as an item pointer. It turns out that this > practice isn't all that prevalent. Here are some specific concerns > that I had to think about when writing the patch, though: Ping? Any objections to pushing ahead with this? -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan <pg@bowt.ie> wrote: >> Attached draft patch adjusts code comments and error messages where a >> line pointer is referred to as an item pointer. It turns out that this >> practice isn't all that prevalent. Here are some specific concerns >> that I had to think about when writing the patch, though: > Ping? Any objections to pushing ahead with this? Patch looks fine to me. One minor quibble: in pruneheap.c you have /* - * Prune specified item pointer or a HOT chain originating at that item. + * Prune specified line pointer or a HOT chain originating at that item. * * If the item is an index-referenced tuple (i.e. not a heap-only tuple), Should "that item" also be re-worded, for consistency? regards, tom lane
On Mon, May 13, 2019 at 12:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > /* > - * Prune specified item pointer or a HOT chain originating at that item. > + * Prune specified line pointer or a HOT chain originating at that item. > * > * If the item is an index-referenced tuple (i.e. not a heap-only tuple), > > Should "that item" also be re-worded, for consistency? Yes, it should be -- I'll fix it. I'm going to backpatch the storage.sgml change on its own, while pushing everything else in a separate HEAD-only commit. Thanks -- Peter Geoghegan