Thread: What is an item pointer, anyway?

What is an item pointer, anyway?

From
Peter Geoghegan
Date:
itemid.h introduces the struct ItemIdData as follows:

/*
 * An item pointer (also called line pointer) on a buffer page

Meanwhile, itemptr.h introduces the struct ItemPointerData as follows:

/*
 * ItemPointer:
 *
 * This is a pointer to an item within a disk page of a known file
 * (for example, a cross-link from an index to its parent table).

It doesn't seem reasonable to assume that you should know the
difference based on context. The two concepts are closely related. An
ItemPointerData points to a block, as well as the, uh, item pointer
within that block.

This ambiguity is avoidable, and should be avoided. ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.

-- 
Peter Geoghegan



Re: What is an item pointer, anyway?

From
Ashwin Agrawal
Date:

On Fri, Apr 26, 2019 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
itemid.h introduces the struct ItemIdData as follows:

/*
 * An item pointer (also called line pointer) on a buffer page

Meanwhile, itemptr.h introduces the struct ItemPointerData as follows:

/*
 * ItemPointer:
 *
 * This is a pointer to an item within a disk page of a known file
 * (for example, a cross-link from an index to its parent table).

It doesn't seem reasonable to assume that you should know the
difference based on context. The two concepts are closely related. An
ItemPointerData points to a block, as well as the, uh, item pointer
within that block.

This ambiguity is avoidable, and should be avoided.

Agree.
 
ISTM that the
least confusing way of removing the ambiguity would be to no longer
refer to ItemIds as item pointers, without changing anything else.

How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier instead and leave ItemPointer or Item confined to AM term, where item can be tuple, datum or anything else ?

Re: What is an item pointer, anyway?

From
Peter Geoghegan
Date:
On Fri, Apr 26, 2019 at 4:23 PM Ashwin Agrawal <aagrawal@pivotal.io> wrote:
> How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier instead and leave ItemPointer or Item
confinedto AM term, where item can be tuple, datum or anything else ?
 

I'm not a fan of that idea, because the reality is that an
ItemPointerData is quite explicitly supposed to be a physiological
identifier (TID) used by heapam, or a similar heap-like access method
such as zheap. This is baked into a number of things.

The limitation that pluggable storage engines have to work in terms of
item pointers is certainly a problem, especially for things like the
Zedstore column store project you're working on. However, I suspect
that that problem is best solved by accommodating other types of
identifiers that don't work like TIDs.

I understand why you've adopted ItemPointerData as a fully-logical
identifier in your prototype, but it's not a great long term solution.

-- 
Peter Geoghegan



Re: What is an item pointer, anyway?

From
Tom Lane
Date:
Ashwin Agrawal <aagrawal@pivotal.io> writes:
> On Fri, Apr 26, 2019 at 2:19 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> ISTM that the
>> least confusing way of removing the ambiguity would be to no longer
>> refer to ItemIds as item pointers, without changing anything else.

How many places would we be changing to clean that up?

> How about we rename ItemPointerData to TupleIdentifier or ItemIdentifier
> instead and leave ItemPointer or Item confined to AM term, where item can
> be tuple, datum or anything else ?

There's half a thousand references to ItemPointer[Data] in our
sources, and probably tons more in external modules.  I'm *not*
in favor of renaming it.

ItemId[Data] is somewhat less widely referenced, but I'm still not
much in favor of renaming that type.  I think fixing comments to
uniformly call it an item ID would be more reasonable.  (We should
leave the "line pointer" terminology in place, too; if memory serves,
an awful lot of variables of the type are named "lp" or variants.
Renaming all of those is to nobody's benefit.)

            regards, tom lane



Re: What is an item pointer, anyway?

From
Peter Geoghegan
Date:
On Fri, Apr 26, 2019 at 4:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ItemId[Data] is somewhat less widely referenced, but I'm still not
> much in favor of renaming that type.  I think fixing comments to
> uniformly call it an item ID would be more reasonable.  (We should
> leave the "line pointer" terminology in place, too; if memory serves,
> an awful lot of variables of the type are named "lp" or variants.
> Renaming all of those is to nobody's benefit.)

I was proposing that we not rename any struct at all, and continue to
call ItemId[Data]s "line pointers" only.  This would involve removing
the comment in itemid.h that confusingly refers to line pointers as
"item pointers" (plus any other comments that fail to make a clear
distinction).

I think that the total number of comments that would be affected by
this approach is quite low.

-- 
Peter Geoghegan



Re: What is an item pointer, anyway?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> I was proposing that we not rename any struct at all, and continue to
> call ItemId[Data]s "line pointers" only.

Yeah, I'd be fine with that, although the disconnect between the type
name and the comment terminology might confuse some people.

            regards, tom lane



Re: What is an item pointer, anyway?

From
Peter Geoghegan
Date:
On Fri, Apr 26, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I'd be fine with that, although the disconnect between the type
> name and the comment terminology might confuse some people.

Maybe, but the fact that the ItemIdData struct consists of bit fields
that are all named "lp_*" offers a hint. Plus you have the LP_*
constants that get stored in ItemIdData.lp_flags.

I wouldn't call the struct ItemIdData if I was in a green field
situation, but it doesn't seem too bad under the present
circumstances. I'd rather not change the struct's name, because that
would probably cause problems without any real benefit. OTOH, calling
two closely related but distinct things by the same name is atrocious.

-- 
Peter Geoghegan



Re: What is an item pointer, anyway?

From
Peter Geoghegan
Date:
On Fri, Apr 26, 2019 at 5:13 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Apr 26, 2019 at 5:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah, I'd be fine with that, although the disconnect between the type
> > name and the comment terminology might confuse some people.
>
> Maybe, but the fact that the ItemIdData struct consists of bit fields
> that are all named "lp_*" offers a hint. Plus you have the LP_*
> constants that get stored in ItemIdData.lp_flags.

Attached draft patch adjusts code comments and error messages where a
line pointer is referred to as an item pointer. It turns out that this
practice isn't all that prevalent. Here are some specific concerns
that I had to think about when writing the patch, though:

* I ended up removing a big indexam.c "old comments" comment paragraph
from the Berkeley days, because it used the term item pointer in what
seemed like the wrong way, but also because AFAICT it's totally
obsolete.

* Someone should confirm that I have preserved the original intent of
the changes within README.HOT, and the heapam changes that relate to
pruning. It's possible that I changed "item pointer" to "line pointer"
in one or two places where I should have changed "item pointer" to
"tuple" instead.

* I changed a few long standing "can't happen" error messages that
concern corruption, most of which also relate to pruning. Maybe that's
a cost that needs to be considered.

* I changed a lazy_scan_heap() log message of long-standing. Another
downside that needs to be considered.

* I expanded a little on the advantages of using line pointers within
bufpage.h. That seemed in scope to me, because it drives home the
distinction between item pointers and line pointers.

-- 
Peter Geoghegan

Attachment

Re: What is an item pointer, anyway?

From
Peter Geoghegan
Date:
On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached draft patch adjusts code comments and error messages where a
> line pointer is referred to as an item pointer. It turns out that this
> practice isn't all that prevalent. Here are some specific concerns
> that I had to think about when writing the patch, though:

Ping? Any objections to pushing ahead with this?

-- 
Peter Geoghegan



Re: What is an item pointer, anyway?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sun, May 5, 2019 at 1:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> Attached draft patch adjusts code comments and error messages where a
>> line pointer is referred to as an item pointer. It turns out that this
>> practice isn't all that prevalent. Here are some specific concerns
>> that I had to think about when writing the patch, though:

> Ping? Any objections to pushing ahead with this?

Patch looks fine to me.  One minor quibble: in pruneheap.c you have

 /*
- * Prune specified item pointer or a HOT chain originating at that item.
+ * Prune specified line pointer or a HOT chain originating at that item.
  *
  * If the item is an index-referenced tuple (i.e. not a heap-only tuple),

Should "that item" also be re-worded, for consistency?

            regards, tom lane



Re: What is an item pointer, anyway?

From
Peter Geoghegan
Date:
On Mon, May 13, 2019 at 12:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  /*
> - * Prune specified item pointer or a HOT chain originating at that item.
> + * Prune specified line pointer or a HOT chain originating at that item.
>   *
>   * If the item is an index-referenced tuple (i.e. not a heap-only tuple),
>
> Should "that item" also be re-worded, for consistency?

Yes, it should be -- I'll fix it.

I'm going to backpatch the storage.sgml change on its own, while
pushing everything else in a separate HEAD-only commit.

Thanks
-- 
Peter Geoghegan