Re: Incorrect behaviour when using a GiST index on points - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Incorrect behaviour when using a GiST index on points
Date
Msg-id 20121018191828.GB10844@tornado.leadboat.com
Whole thread Raw
In response to Re: Incorrect behaviour when using a GiST index on points  (Noah Misch <noah@leadboat.com>)
Responses Re: Incorrect behaviour when using a GiST index on points  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote:
> On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:
> > > > On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >> There's also the big-picture question of whether we should just get rid
> > > >> of fuzzy comparisons in the geometric types instead of trying to hack
> > > >> indexes to work around them.
> 
> > In any event, I think we should entertain a patch to make the GiST operator
> > class methods bug-compatible with corresponding operators.  Even if we decide
> > to change operator behavior in HEAD, the back branches could use it.
> 
> We have broad agreement that the specific implementation of fuzz in geometric
> comparison operators is shoddy, but nobody has voiced interest in designing a
> concrete improvement.  I propose adding a TODO item "Remove or improve
> rounding in geometric comparison operators", endorsing Alexander's design, and
> reviewing his patch.  Objections?

TODO added, and here's a review:

The patch adds no regression tests; it should add tests illustrating the
problems it fixes.

I audited the other indexable geometric operators for similar problems.  This
passage in gist_point_consistent_internal(), which handles (point,point)
operators, caught my suspicion:
    case RTSameStrategyNumber:        if (isLeaf)        {            result = FPeq(key->low.x, query->x)
&& FPeq(key->low.y, query->y);        }        else        {            result = (query->x <= key->high.x && query->x
>=key->low.x &&                      query->y <= key->high.y && query->y >= key->low.y);        }        break;
 

A leaf entry reachable from an internal entry may fall exactly on the
internal-entry bounding box.  Since we would accept a fuzzy match at the leaf
level, I think we must also accept a fuzzy match at the internal level.

> *** a/src/backend/access/gist/gistproc.c
> --- b/src/backend/access/gist/gistproc.c

> ***************
> *** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
> --- 1327,1333 ----
>       bool       *recheck = (bool *) PG_GETARG_POINTER(4);
>       bool        result;
>       StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
> +     BOX           *query, *key;

This function now has "query" variables within subsidiary blocks redundant
with and masking this one.  Avoid doing that.

>   
>       switch (strategyGroup)
>       {
> ***************
> *** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
>               *recheck = false;
>               break;
>           case BoxStrategyNumberGroup:
> !             result = DatumGetBool(DirectFunctionCall5(
> !                                                       gist_box_consistent,
> !                                                       PointerGetDatum(entry),
> !                                                       PG_GETARG_DATUM(1),
> !                                       Int16GetDatum(RTOverlapStrategyNumber),
> !                                                0, PointerGetDatum(recheck)));
>               break;
>           case PolygonStrategyNumberGroup:
>               {
> --- 1339,1356 ----
>               *recheck = false;
>               break;
>           case BoxStrategyNumberGroup:
> !             /* 
> !              * This code repeats logic of on_ob which uses simple comparison
> !              * rather than FP* functions.
> !              */
> !             query = PG_GETARG_BOX_P(1);
> !             key = DatumGetBoxP(entry->key);
> !             
> !             *recheck = false;
> !             result = key->high.x >= query->low.x && 
> !                      key->low.x <= query->high.x &&
> !                      key->high.y >= query->low.y && 
> !                      key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb().  For internal
entries, it must, but does not, implement box_overlap().  (The fuzzy
box_overlap() would be fine.)  I recommend making gist_point_consistent()'s
treatment of boxes resemble its treatment of circles and polygons; that eases
verifying their correctness.  Call gist_box_consistent.  Then, for leaf
entries, call box_contain_pt().


GiST "consistent" functions often validate the strategy number, but the
circle, polygon and box branches of gist_point_consistent silently assume
strategy % GeoStrategyNumberOffset == RTContainedByStrategyNumber.  Should
they verify that assumption?  I raise this as an incidental question; it is
not new with your patch.

Thanks,
nm



pgsql-hackers by date:

Previous
From: Christopher Browne
Date:
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Next
From: Gezeala M. Bacuño II
Date:
Subject: Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump