Thread: Can get GiST RECHECK clause to work
Hi everyone, I'm trying to mark a GiST index as lossy using the RECHECK operator as part of some work on PostGIS, but what happens is that the original operator function is never reapplied to the results of the index scan. The operator class and operator definitions looks like this: CREATE OPERATOR && ( LEFTARG = GEOMETRY, RIGHTARG = GEOMETRY, PROCEDURE = geometry_overlap, COMMUTATOR = '&&', RESTRICT = postgis_gist_sel, JOIN = positionjoinsel ); and: CREATE OPERATOR CLASS gist_geometry_opsDEFAULT FOR TYPE geometry USING gist ASOPERATOR 1 << RECHECK,OPERATOR 2 &< RECHECK,OPERATOR 3 && RECHECK,OPERATOR 4 &> RECHECK,OPERATOR 5 >> RECHECK,OPERATOR 6 ~= RECHECK,OPERATOR 7 ~ RECHECK,OPERATOR 8 @ RECHECK,FUNCTION 1 ggeometry_consistent(internal, geometry, int4),FUNCTION 2 gbox_union (bytea, internal),FUNCTION 3 ggeometry_compress (internal),FUNCTION 4 rtree_decompress (internal),FUNCTION 5 gbox_penalty (internal, internal, internal),FUNCTION 6 gbox_picksplit (internal, internal),FUNCTION 7 gbox_same (box, box, internal); What I'm expecting is that since RECHECK is specified, PostgreSQL will identify the index entries using the && operator and then call geometry_overlap() function with the full tuples from the heap. However, it seems geometry_overlap() is never called from an index scan made using the && operator :( Can anyone point out where I'm going wrong? Many thanks, Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > I'm trying to mark a GiST index as lossy using the RECHECK operator as > part of some work on PostGIS, but what happens is that the original > operator function is never reapplied to the results of the index scan. You sure? I'm pretty sure that a number of the contrib gist index opclasses would fail their regression tests if this were broken. As of 7.5 you cannot see the reapplication in the generated plan's filter condition; perhaps that got you confused? 2004-01-05 23:31 tgl * src/: backend/executor/nodeIndexscan.c,backend/nodes/copyfuncs.c, backend/nodes/outfuncs.c,backend/optimizer/path/costsize.c,backend/optimizer/plan/createplan.c,backend/optimizer/plan/setrefs.c, include/nodes/execnodes.h,include/nodes/plannodes.h:Instead of rechecking lossy indexoperators by putting them into the regularqpqual ('filtercondition'), add special-purpose code to nodeIndexscan.c to recheckthem. This ends being almost nonet addition of code, because theremoval of planner code balances out the extra executor code, butit is significantly moreefficient when a lossy operator isinvolved in an OR indexscan. The old implementation had to recheckthe entire indexqualin such cases. regards, tom lane
Hi Tom, As far as I can tell this is the case. What I've done to test this is to put an elog(NOTICE, ".....") in geometry_overlap() so I can tell when it's being called and this is the result I get: shapefile=# select * from tgr1 where the_geom && GeometryFromText('BOX3D(1000 10 0, 2000 2000)'::box3d, -2); NOTICE: postgis_gist_sel called NOTICE: search_box does not overlaps histogram, returning 0 NOTICE: returning computed value: 0.000000 NOTICE: IN GEOMETRY OVERLAP!!!! ERROR: Operation on two GEOMETRIES with different SRIDs shapefile=# create index tgr1_idx on tgr1 using gist (the_geom gist_geometry_ops ); CREATE INDEX shapefile=# select * from tgr1 where the_geom && GeometryFromText('BOX3D(1000 10 0, 2000 2000)'::box3d, -2); NOTICE: postgis_gist_sel called NOTICE: search_box does not overlaps histogram, returning 0 NOTICE: returning computed value: 0.000000gid | tlid | fnode | tnode | length | fedirp | fename | fetype | fedirs | cfcc | fraddl | toaddl | fraddr | toaddr | zipl | zipr | census1 | census2 | cfcc1 | cfcc2 | source | the_geom -----+------+-------+-------+--------+--------+--------+--------+------- -+------ +--------+--------+--------+--------+------+------+---------+---------+- ------+- ------+--------+---------- (0 rows) shapefile=# explain analyze select * from tgr1 where the_geom && GeometryFromTex t('BOX3D(1000 100, 2000 2000)'::box3d, -2); NOTICE: postgis_gist_sel called NOTICE: search_box does not overlaps histogram, returning 0 NOTICE: returning computed value: 0.000000 QUERY PLAN ------------------------------------------------------------------------ -------- -----------------------------------Index Scan using tgr1_idx on tgr1 (cost=0.00..6.01 rows=1 width=327) (actual t ime=30.000..30.000 rows=0 loops=1) Index Cond: (the_geom && 'SRID=-2;BOX3D(1000 100 0,2000 2000 0)'::geometry)Total runtime: 30.000 ms (3 rows) shapefile=# select * from pg_amop where amopclaid=(SELECT oid FROM pg_opclass W ERE opcname = 'gist_geometry_ops');amopclaid | amopsubtype | amopstrategy | amopreqcheck | amopopr -----------+-------------+--------------+--------------+--------- 17456 | 0 | 1 | t | 17410 17456 | 0 | 2 | t | 17412 17456 | 0 | 3 | t | 17413 17456 | 0 | 4 | t | 17411 17456 | 0 | 5 | t | 17409 17456 | 0 | 6 | t | 17414 17456 | 0 | 7 |t | 17415 17456 | 0 | 8 | t | 17416 (8 rows) So before the index is created, the geometry_overlap() function is called, but whenever an index scan is used then it's never called? I've had a look at contrib/rtree_gist but it doesn't make much sense; from what I can see the RECHECK clause is specified for a couple of operators in the operator class but there is no operator defined in the SQL file - so I'm guessing that in this case the RECHECK won't do anything anyway? I'm wondering if I'm missing some sort of mapping between && used for CREATE OPERATOR and the && listed in CREATE OPERATOR CLASS? This is happening with current CVS as of earlier today, however it looks as if it doesn't work in 7.4 either (see http://postgis.refractions.net/pipermail/postgis-users/2004-June/004973. html where I was trying to get the person in question to alter the catalogues manually to enforce the RECHECK which didn't solve the problem for him either). Many thanks, Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 13 June 2004 23:09 > To: Mark Cave-Ayland > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Can get GiST RECHECK clause to work > > > "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > > I'm trying to mark a GiST index as lossy using the RECHECK > operator as > > part of some work on PostGIS, but what happens is that the original > > operator function is never reapplied to the results of the > index scan. > > You sure? I'm pretty sure that a number of the contrib gist > index opclasses would fail their regression tests if this were broken. > > As of 7.5 you cannot see the reapplication in the generated > plan's filter condition; perhaps that got you confused? > > 2004-01-05 23:31 tgl > > * src/: backend/executor/nodeIndexscan.c, > backend/nodes/copyfuncs.c, backend/nodes/outfuncs.c, > backend/optimizer/path/costsize.c, > backend/optimizer/plan/createplan.c, > backend/optimizer/plan/setrefs.c, include/nodes/execnodes.h, > include/nodes/plannodes.h: Instead of rechecking lossy index > operators by putting them into the regular qpqual ('filter > condition'), add special-purpose code to > nodeIndexscan.c to recheck > them. This ends being almost no net addition of code, > because the > removal of planner code balances out the extra executor > code, but > it is significantly more efficient when a lossy operator is > involved in an OR indexscan. The old implementation > had to recheck > the entire indexqual in such cases. > > > regards, tom lane >
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > As far as I can tell this is the case. What I've done to test this is to > put an elog(NOTICE, ".....") in geometry_overlap() Well, I can easily prove that CVS tip does call the operator function and honor its result. regression=# create table foo (f1 float8 unique); NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_f1_key" for table "foo" CREATE TABLE regression=# insert into foo values(1); INSERT 480998 1 regression=# insert into foo values(2); INSERT 480999 1 regression=# select * from foo where f1 = 1;f1 ---- 1 (1 row) With gdb, I set a breakpoint at float8eq, and determine that it is called exactly once (during _bt_checkkeys' scan setup) in this query. Next, after some fooling about to determine which row in pg_amop describes float8eq: regression=# update pg_amop set amopreqcheck = true regression-# where amopclaid = 1972 and amopsubtype = 0 and amopstrategy = 3; UPDATE 1 Now the select calls float8eq twice, once from _bt_checkkeys and once from IndexNext. Moreover I can force a zero result from float8eq in the second call, and if I do then no rows are returned. My guess is that your problem occurs because the index is not returning the row in the first place, and thus there is nothing to recheck. This would point to a bug somewhere in your GIST support functions. regards, tom lane