Thread: pgsql: Fix confusion in SP-GiST between attribute type and leaf storage

Fix confusion in SP-GiST between attribute type and leaf storage type.

According to the documentation, the attType passed to the opclass
config function (and also relied on by the core code) is the type
of the heap column or expression being indexed.  But what was
actually being passed was the type stored for the index column.
This made no difference for user-defined SP-GiST opclasses,
because we weren't allowing the STORAGE clause of CREATE OPCLASS
to be used, so the two types would be the same.  But it's silly
not to allow that, seeing that the built-in poly_ops opclass
has a different value for opckeytype than opcintype, and that if you
want to do lossy storage then the types must really be different.
(Thus, user-defined opclasses doing lossy storage had to lie about
what type is in the index.)  Hence, remove the restriction, and make
sure that we use the input column type not opckeytype where relevant.

For reasons of backwards compatibility with existing user-defined
opclasses, we can't quite insist that the specified leafType match
the STORAGE clause; instead just add an amvalidate() warning if
they don't match.

Also fix some bugs that would only manifest when trying to return
index entries when attType is different from attLeafType.  It's not
too surprising that these have not been reported, because the only
usual reason for such a difference is to store the leaf value
lossily, rendering index-only scans impossible.

Add a src/test/modules module to exercise cases where attType is
different from attLeafType and yet index-only scan is supported.

Discussion: https://postgr.es/m/3728741.1617381471@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ac9099fc1dd460bffaafec19272159dd7bc86f5b

Modified Files
--------------
doc/src/sgml/ref/create_opclass.sgml               |   2 +-
doc/src/sgml/spgist.sgml                           |  74 +--
src/backend/access/spgist/spgscan.c                |  31 +-
src/backend/access/spgist/spgutils.c               |  84 +++-
src/backend/access/spgist/spgvalidate.c            |  27 +-
src/include/access/spgist_private.h                |   6 +-
src/test/modules/Makefile                          |   1 +
src/test/modules/spgist_name_ops/.gitignore        |   4 +
src/test/modules/spgist_name_ops/Makefile          |  23 +
src/test/modules/spgist_name_ops/README            |   8 +
.../spgist_name_ops/expected/spgist_name_ops.out   |  95 ++++
.../spgist_name_ops/spgist_name_ops--1.0.sql       |  54 +++
src/test/modules/spgist_name_ops/spgist_name_ops.c | 501 +++++++++++++++++++++
.../spgist_name_ops/spgist_name_ops.control        |   4 +
.../spgist_name_ops/sql/spgist_name_ops.sql        |  38 ++
src/test/regress/expected/rangetypes.out           |  19 +
src/test/regress/sql/rangetypes.sql                |  10 +
17 files changed, 937 insertions(+), 44 deletions(-)


Re: pgsql: Fix confusion in SP-GiST between attribute type and leaf storage

From: Michael Paquier
On Sun, Apr 04, 2021 at 06:29:09PM +0000, Tom Lane wrote:
> Fix confusion in SP-GiST between attribute type and leaf storage type.

anole, woodstar and some other animals have been failing after this
commit:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-04%2021%3A36%3A54

 SELECT rank() OVER (ORDER BY p <-> point '123,456') n, p <-> point '123,456' dist, id
 FROM quad_poly_tbl WHERE p <@ polygon '((300,300),(400,600),(600,500),(700,200))';
+ERROR:  out of memory
+DETAIL:  Failed on request of size 7688192 in memory context "SP-GiST traversal-value context".

Tom, would dfc843d fix this issue?  It seems the buildfarm has not yet
reported any results with dfc843d.
--
Michael

Michael Paquier <michael@paquier.xyz> writes:
> On Sun, Apr 04, 2021 at 06:29:09PM +0000, Tom Lane wrote:
>> Fix confusion in SP-GiST between attribute type and leaf storage type.

> anole, woodstar and some other animals have been failing after this
> commit:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-04%2021%3A36%3A54

Yeah, I believe that's fixed.  All the big-endian animals were unhappy,
because they noticed a datatype mismatch in a way that little-endians
did not.  (Basically, interpreting a fixed-width box value as varlena
yields a data value length that accidentally works okay on little-endian,
but on big-endian it looks like a huge width and you soon go OOM.)
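
A rough illustration of that endianness effect, as a minimal sketch rather
than anything taken from the PostgreSQL sources: it packs one double (standing
in for the first coordinate of a box) in each byte order and reads its first
four bytes as a varlena-style length word.  The helper name, the sample value
300.0, and the two length masks are assumptions made for illustration; the
real header macros handle more cases than this.

```python
import struct

# Simplified stand-in for 4-byte varlena length extraction. The masks are an
# assumption about the header layout, not code copied from PostgreSQL.
def apparent_varlena_length(first_double: float, byteorder: str) -> int:
    """Pack a double in the given byte order, then read its first four
    bytes as if they were a 4-byte varlena length word."""
    prefix = "<" if byteorder == "little" else ">"
    raw = struct.pack(prefix + "d", first_double)
    (word,) = struct.unpack(prefix + "I", raw[:4])
    if byteorder == "little":
        # Assumed layout: flag bits in the low two bits, length above them.
        return (word >> 2) & 0x3FFFFFFF
    # Assumed layout: flag bits in the high two bits, length below them.
    return word & 0x3FFFFFFF

coordinate = 300.0  # arbitrary "round" coordinate, chosen for illustration
little_len = apparent_varlena_length(coordinate, "little")
big_len = apparent_varlena_length(coordinate, "big")
print(f"little-endian apparent length: {little_len}")
print(f"big-endian apparent length:    {big_len}")
```

With a little-endian layout the first four bytes are the low-order mantissa
bits, which are all zero for a round value like this, so the bogus length
looks harmless.  With a big-endian layout the same four bytes hold the sign,
exponent and top mantissa bits, so the bogus length runs to millions of bytes.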

I was able to reproduce it locally on a spare PPC Mac, so I'm
pretty confident that dfc843d fixes it.  Takes a while for these
old dinosaurs to report in, though.

            regards, tom lane