Thread: Btree internal node data?

Btree internal node data?

From: Tatsuo Ishii
While looking into a btree internal page using pg_filedump against an
int4 index generated by pgbench, I noticed that only item 2 has length 8,
which indicates that the index tuple has only a tuple header and no
index data. In my understanding this indicates that the item is used
to represent a downlink to a page. The question is why it is item 2,
not item 1. I thought the index tuple representing such a downlink was
always item 1. Is this a sign that something is going wrong?

Block    3 ********************************************************
<Header> -----
 Block Offset: 0x00006000         Offsets: Lower    1164 (0x048c)
 Block: Size 8192  Version    4            Upper    3624 (0x0e28)
 LSN:  logid      2 recoff 0x1550a608      Special  8176 (0x1ff0)
 Items:  285                      Free Space: 2460
 Checksum: 0x0000  Prune XID: 0x00000000  Flags: 0x0000 ()
 Length (including item array): 1164

<Data> ------
 Item   1 -- Length:   16  Offset: 3624 (0x0e28)  Flags: NORMAL
 Item   2 -- Length:    8  Offset: 8168 (0x1fe8)  Flags: NORMAL
 Item   3 -- Length:   16  Offset: 8152 (0x1fd8)  Flags: NORMAL
 Item   4 -- Length:   16  Offset: 8136 (0x1fc8)  Flags: NORMAL
 Item   5 -- Length:   16  Offset: 8120 (0x1fb8)  Flags: NORMAL
 [snip]
 Item 281 -- Length:   16  Offset: 3704 (0x0e78)  Flags: NORMAL
 Item 282 -- Length:   16  Offset: 3688 (0x0e68)  Flags: NORMAL
 Item 283 -- Length:   16  Offset: 3672 (0x0e58)  Flags: NORMAL
 Item 284 -- Length:   16  Offset: 3656 (0x0e48)  Flags: NORMAL
 Item 285 -- Length:   16  Offset: 3640 (0x0e38)  Flags: NORMAL

<Special Section> -----
 BTree Index Section:
  Flags: 0x0000 ()
  Blocks: Previous (0)  Next (289)  Level (1)  CycleId (0)

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: Btree internal node data?

From: Peter Geoghegan
On Wed, Aug 27, 2014 at 7:08 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> While looking into a btree internal page using pg_filedump against an
> int4 index generated by pgbench, I noticed that only item 2 has length 8,
> which indicates that the index tuple has only a tuple header and no
> index data. In my understanding this indicates that the item is used
> to represent a downlink to a page. The question is why it is item 2,
> not item 1. I thought the index tuple representing such a downlink was
> always item 1. Is this a sign that something is going wrong?


No. On a non-rightmost page, the "high key" item is physically first
(which is a bit odd, because it serves as a high-bound invariant on
the items that the page stores, but it's convenient to do it that way
for other reasons). On an internal page (that is also non-rightmost),
the second item (which is the first "real" item, i.e. the item at the
offset that P_FIRSTDATAKEY() returns) is just placeholder garbage. The reason for
that is noted above _bt_compare():
 * CRUCIAL NOTE: on a non-leaf page, the first data key is assumed to be
 * "minus infinity": this routine will always claim it is less than the
 * scankey.  The actual key value stored (if any, which there probably isn't)
 * does not matter.  This convention allows us to implement the Lehman and
 * Yao convention that the first down-link pointer is before the first key.
 * See backend/access/nbtree/README for details.
 

-- 
Peter Geoghegan