Correct docs about GiST leaf page structure - Mailing list pgsql-docs

From Paul A Jungwirth
Subject Correct docs about GiST leaf page structure
Date
Msg-id CA+renyWs5Np+FLSYfL+eu20S4U671A3fQGb-+7e22HLrD1NbYw@mail.gmail.com
Whole thread Raw
List pgsql-docs
Our docs for GiST indexes say the compress function is only used for
internal pages, not leaf pages, but actually it is used everywhere.
Here are two patches to clean things up.

You can see that we store compressed values with the pageinspect
extension. For instance, multiranges are compressed to ranges. Here
they are in leaf pages:

[v19devel:5432][314069] postgres=# create table mr (id int4multirange);
CREATE TABLE
[v19devel:5432][314069] postgres=# create index idx_mr on mr using gist (id);
CREATE INDEX
[v19devel:5432][314069] postgres=# insert into mr values ('{[1,2)}'),
('{[2,3)}');
INSERT 0 2
[v19devel:5432][314069] postgres=# select * from
gist_page_opaque_info(get_raw_page('idx_mr', 0));
    lsn     |    nsn     | rightlink  | flags
------------+------------+------------+--------
 0/01917320 | 0/00000000 | 4294967295 | {leaf}
(1 row)

[v19devel:5432][314069] postgres=# select * from
gist_page_items(get_raw_page('idx_mr', 0), 'idx_mr');
 itemoffset | ctid  | itemlen | dead |      keys
------------+-------+---------+------+----------------
          1 | (0,1) |      24 | f    | (id)=("[1,2)")
          2 | (0,2) |      24 | f    | (id)=("[2,3)")

Similarly, btree_gist stores gbtreekey entries in leaf pages:

[v19devel:5432][314069] postgres=# create table i (id int);
CREATE TABLE
[v19devel:5432][314069] postgres=# create index idx_i on i using gist (id);
CREATE INDEX
[v19devel:5432][314069] postgres=# insert into i values (1), (2), (3);
INSERT 0 3
[v19devel:5432][314069] postgres=# select * from
gist_page_items(get_raw_page('idx_i', 0), 'idx_i');
ERROR:  cannot display a value of type gbtreekey?
[v19devel:5432][314069] postgres=# select * from
gist_page_items_bytea(get_raw_page('idx_i', 0));
 itemoffset | ctid  | itemlen | dead |              key_data
------------+-------+---------+------+------------------------------------
          1 | (0,1) |      16 | f    | \x00000000010010000100000001000000
          2 | (0,2) |      16 | f    | \x00000000020010000200000002000000
          3 | (0,3) |      16 | f    | \x00000000030010000300000003000000

I think this error goes back to the second GiST paper referenced in
access/gist/README, titled "Concurrency and Recovery in Generalized
Search Trees" (from 1997). On page 2 it says that internal pages store
the predicate and leaf pages store the key. (The original 1995 paper
doesn't differentiate like that though.) Since our README has a list
of ways that our implementation diverges from the research, I added a
note there as well.

I've also supplied a patch to clarify that there are two papers. The
old wording is a bit confusing:

> GiST stands for Generalized Search Tree. It was introduced in the seminal paper
> "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein,
> Jeffrey F. Naughton, Avi Pfeffer:
>
>     http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
>     https://dsf.berkeley.edu/papers/sigmod97-gist.pdf

Clarifying the two papers helps me call out the leaf page difference
in the main patch.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Attachment

pgsql-docs by date:

Previous
From: PG Doc comments form
Date:
Subject: 3.5. Window Functions, example table
Next
From: Nick Canale
Date:
Subject: Large elephant