Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns - Mailing list pgsql-bugs

From Michael Paquier
Subject Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns
Date
Msg-id ZGWEZCN0PpfFona0@paquier.xyz
Whole thread Raw
In response to Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
On Wed, May 17, 2023 at 05:00:00PM +0300, Alexander Lakhin wrote:
> While thinking about naming (what exactly "attroid" means there), I've
> reread the comment above BuildIndexValueDescription():
>  * The passed-in values/nulls arrays are the "raw" input to the index AM,
>  * e.g. results of FormIndexDatum --- this is not necessarily what is stored
>  * in the index, but it's what the user perceives to be stored.

Yeah, I was wondering about that a bit as well yesterday, see below.
But in short I'd agree that showing all the data from a single item as
stored is more helpful than just showing what would be perceived by
the user, through originally what's an API to generate error messages
on constraint failures.

> and recognized that I've made a mistake when relied on the existing coding.
> A simple example:
> create extension pageinspect;
> create extension pg_trgm;
>
> create table test_trgm(t text);
> create index trgm_idx on test_trgm using gist (t gist_trgm_ops);
> insert into test_trgm select to_char(g, 'textFM000')  from generate_series(1, 1000) g;
> select gist_page_items(get_raw_page('trgm_idx', 0), 'trgm_idx'::regclass);
> select gist_page_items(get_raw_page('trgm_idx', 1), 'trgm_idx'::regclass);
>
> (I chose gist_trgm_ops because that opclass defined as follows:
> CREATE OPERATOR CLASS gist_trgm_ops FOR TYPE text USING gist ... STORAGE gtrgm)
> shows garbage values:
>                gist_page_items
> ---------------------------------------------
>  (1,"(1,65535)",32,f,"(t)=(\x02\x7F\x7F_ߟ)")
>  (2,"(3,65535)",32,f,"(t)=(\x02')")
>  (3,"(2,65535)",32,f,"(t)=(\x02߽)")
>  (4,"(5,65535)",32,f,"(t)=(\x02\x7F)")
>  (5,"(4,65535)",32,f,"(t)=(\x02w?\x7F)")
>  (6,"(6,65535)",32,f,"(t)=(\x02/)")
> (6 rows)

Fun.

> It means that pageinspect tries to print stored gtrgm values as text, and I
> believe that's the third problem here.

And it is not possible to display them as its output function says.
People could just use the bytea flavor in this case to show such items

> Moreover, for the point_ops class opcintype is point, but opckeytype is box,
> thus indexes test_gist_idx and test_gist_idx_inc used in the test in fact
> contain boxes, not points. So the fixed gist_page_items() produces different
> output even for existing test cases. (Please look at the patch attached.)
> I'm somewhat confused with the output:

Hmm.  I was actually looking at something pretty close to that
yesterday, but discarded the thought when I sent v3.  Included
attributes don't support expressions, but key attributes do, and the
generated output was a bit confusing when not applying parenthesis for
each attribute printed.  For example, take this case:
CREATE EXTENSION pageinspect;
CREATE TABLE test_gist AS SELECT point(i,i) p, i i FROM
  generate_series(1, 1000) i;
CREATE INDEX test_gist_idx_inc ON test_gist USING
  gist (p, point(i,i), (point(i, i)));
SELECT keys FROM gist_page_items(get_raw_page('test_gist_idx_inc', 1), 'test_gist_idx_inc')
  WHERE itemoffset = 1;

On HEAD or with v3, I get that, which is parseable:
                                                            keys
     

-----------------------------------------------------------------------------------------------------------------------------
 (p, point(i::double precision, i::double precision), point(i::double precision, i::double precision))=((1,1), (1,1),
(1,1))
(1 row)

v4 would give that once you include the full contents of the page items:
                                                                     keys
                       

-----------------------------------------------------------------------------------------------------------------------------------------------
 (p, point(i::double precision, i::double precision), point(i::double precision, i::double precision))=((1,1),(1,1),
(1,1),(1,1),(1,1),(1,1)) 
(1 row)

So, as you say, we should really add more parenthesis in this case.
Even with that, we cannot add too many of them, or the output could be
confusing.  For example, appending always parenthesis for included
attributes could be seen as an expression, but they can never be that.
So my take would be something a little bit more tricky, as of:
- Show NULL as it is.
- If dealing with a key attribute, always add parenthesis to it to
clarify that we are interested in the opckeytype.
- If dealing with an included attribute, don't add parenthesis at
all.

I have expanded the tests to show how this applies for more data
types, like point or integers.  Any thoughts about the v5 attached?
--
Michael

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Need Support to Upgrade from 13.6 to 15.3
Next
From: Michael Paquier
Date:
Subject: Re: BUG #17884: gist_page_items() crashes for a non-leaf page of an index with non-key columns