Re: Incorrect behaviour when using a GiST index on points - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Incorrect behaviour when using a GiST index on points
Date
Msg-id CAPpHfduzPjwO-bUkJdniQ-K-c3x-uh9ZhtFf1PLr_6hKstn7fQ@mail.gmail.com
Whole thread Raw
In response to Re: Incorrect behaviour when using a GiST index on points  (Noah Misch <noah@leadboat.com>)
Responses Re: Incorrect behaviour when using a GiST index on points  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah@leadboat.com> wrote:
On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote:
> On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:
> > > > On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >> There's also the big-picture question of whether we should just get rid
> > > >> of fuzzy comparisons in the geometric types instead of trying to hack
> > > >> indexes to work around them.
>
> > In any event, I think we should entertain a patch to make the GiST operator
> > class methods bug-compatible with corresponding operators.  Even if we decide
> > to change operator behavior in HEAD, the back branches could use it.
>
> We have broad agreement that the specific implementation of fuzz in geometric
> comparison operators is shoddy, but nobody has voiced interest in designing a
> concrete improvement.  I propose adding a TODO item "Remove or improve
> rounding in geometric comparison operators", endorsing Alexander's design, and
> reviewing his patch.  Objections?

TODO added, and here's a review:

The patch adds no regression tests; it should add tests illustrating the
problems it fixes.

I've added some tests to points.sql.
 
I audited the other indexable geometric operators for similar problems.  This
passage in gist_point_consistent_internal(), which handles (point,point)
operators, caught my suspicion:

                case RTSameStrategyNumber:
                        if (isLeaf)
                        {
                                result = FPeq(key->low.x, query->x)
                                        && FPeq(key->low.y, query->y);
                        }
                        else
                        {
                                result = (query->x <= key->high.x && query->x >= key->low.x &&
                                                  query->y <= key->high.y && query->y >= key->low.y);
                        }
                        break;

A leaf entry reachable from an internal entry may fall exactly on the
internal-entry bounding box.  Since we would accept a fuzzy match at the leaf
level, I think we must also accept a fuzzy match at the internal level.

Good catch, fixed.
 
> *** a/src/backend/access/gist/gistproc.c
> --- b/src/backend/access/gist/gistproc.c

> ***************
> *** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
> --- 1327,1333 ----
>       bool       *recheck = (bool *) PG_GETARG_POINTER(4);
>       bool            result;
>       StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
> +     BOX                *query, *key;

This function now has "query" variables within subsidiary blocks redundant
with and masking this one.  Avoid doing that.
 
Fixed.

>
>       switch (strategyGroup)
>       {
> ***************
> *** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
>                       *recheck = false;
>                       break;
>               case BoxStrategyNumberGroup:
> !                     result = DatumGetBool(DirectFunctionCall5(
> !                                                                                                       gist_box_consistent,
> !                                                                                                       PointerGetDatum(entry),
> !                                                                                                       PG_GETARG_DATUM(1),
> !                                                                       Int16GetDatum(RTOverlapStrategyNumber),
> !                                                                                        0, PointerGetDatum(recheck)));
>                       break;
>               case PolygonStrategyNumberGroup:
>                       {
> --- 1339,1356 ----
>                       *recheck = false;
>                       break;
>               case BoxStrategyNumberGroup:
> !                     /*
> !                      * This code repeats logic of on_ob which uses simple comparison
> !                      * rather than FP* functions.
> !                      */
> !                     query = PG_GETARG_BOX_P(1);
> !                     key = DatumGetBoxP(entry->key);
> !
> !                     *recheck = false;
> !                     result = key->high.x >= query->low.x &&
> !                                      key->low.x <= query->high.x &&
> !                                      key->high.y >= query->low.y &&
> !                                      key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb().  For internal
entries, it must, but does not, implement box_overlap().  (The fuzzy
box_overlap() would be fine.)  I recommend making gist_point_consistent()'s
treatment of boxes resemble its treatment of circles and polygons; that eases
verifying their correctness.  Call gist_box_consistent.  Then, for leaf
entries, call box_contain_pt().
 
I have two objections on doing that:
1) It's not evident for me that fuzzy comparison in internal pages is fine. Obviously, it depends on data distribution. It's easy to provide an example when fuzzy comparison will lead to significant performance degradation.
2) With PolygonStrategyNumberGroup CircleStrategyNumberGroup it's faster to do simple box comparison than doing calculation for exact circle and especially polygon check. In this case previous filtering in leaf pages looks reasonable. With BoxStrategyNumberGroup exact calculation is simpler than gist_box_consistent.

------
With best regards,
Alexander Korotkov.
Attachment

pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update
Next
From: Noah Misch
Date:
Subject: Re: Incorrect behaviour when using a GiST index on points