Thread: SP-GiST for ranges based on 2d-mapping and quad-tree

SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

13 June 2012, 19:57:29

Hackers,

attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.

Index creation is slightly slower. Probably, it need some investigation. Search queries on SP-GiST use much more pages. However this comparison can be not really correct, because SP-GiST can pin same buffer several times during one scan. In CPU search queries on SP-GiST seems to be slightly faster. Dramatical difference in "column <@ const" query is thanks to 2d-mapping.

test=# create index period_idx on period_test using gist (period);

CREATE INDEX

Time: 49141,148 ms

test=# create index period_idx2 on period_test2 using spgist (period);

CREATE INDEX

Time: 59503,215 ms

test=# explain (analyze, buffers) select * from period_test where period && daterange('2011-12-10', '2011-12-11');

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on period_test (cost=471.63..13716.45 rows=11866 width=32) (actual time=107.258..147.139 rows=365451 loops=1)

Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared read=7636

-> Bitmap Index Scan on period_idx (cost=0.00..468.67 rows=11866 width=0) (actual time=106.697..106.697 rows=365451 loops=1)

Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared read=3694

Total runtime: 160.670 ms

(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period && daterange('2011-12-10', '2011-12-11');

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on period_test2 (cost=414.52..13659.34 rows=11866 width=32) (actual time=88.793..129.608 rows=365451 loops=1)

Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared hit=7623 read=7687

-> Bitmap Index Scan on period_idx2 (cost=0.00..411.55 rows=11866 width=0) (actual time=88.285..88.285 rows=365451 loops=1)

Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared hit=7623 read=3745

Total runtime: 142.971 ms

(7 rows)

test=# explain (analyze, buffers) select * from period_test where period <@ daterange('2011-12-10', '2011-12-11');

QUERY PLAN

---------------------------------------------------------------------------------------------------

-----------------------

Bitmap Heap Scan on period_test (cost=102.06..6140.66 rows=2373 width=32) (actual time=85.437..85.437 rows=0 loops=1)

Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared read=3694

-> Bitmap Index Scan on period_idx (cost=0.00..101.47 rows=2373 width=0) (actual time=85.431..85.431 rows=0 loops=1)

Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared read=3694

Total runtime: 85.493 ms

(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period <@ daterange('2011-12-10', '2011-12-11');

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on period_test2 (cost=88.95..6127.54 rows=2373 width=32) (actual time=18.666..18.666 rows=0 loops=1)

Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared hit=1404

-> Bitmap Index Scan on period_idx2 (cost=0.00..88.35 rows=2373 width=0) (actual time=18.660..18.660 rows=0 loops=1)

Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)

Buffers: shared hit=1404

Total runtime: 18.717 ms

(7 rows)

test=# explain (analyze, buffers) select * from period_test where period @> daterange('2011-08-10', '2011-12-31');

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on period_test (cost=102.06..6140.66 rows=2373 width=32) (actual time=56.114..73.785 rows=118097 loops=1)

Recheck Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)

Buffers: shared read=4125

-> Bitmap Index Scan on period_idx (cost=0.00..101.47 rows=2373 width=0) (actual time=55.740..55.740 rows=118097 loops=1)

Index Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)

Buffers: shared read=1391

Total runtime: 78.469 ms

(7 rows)

test=# explain (analyze, buffers) select * from period_test2 where period @> daterange('2011-08-10', '2011-12-31');

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on period_test2 (cost=88.95..6127.54 rows=2373 width=32) (actual time=31.653..49

.751 rows=118097 loops=1)

Recheck Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)

Buffers: shared hit=3093 read=3768

-> Bitmap Index Scan on period_idx2 (cost=0.00..88.35 rows=2373 width=0) (actual time=31.282..31.282 rows=118097 loops=1)

Index Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)

Buffers: shared hit=3093 read=1034

Total runtime: 54.288 ms

(7 rows)

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_quad-0.2.patch.gz

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

21 June 2012, 04:13:12

On 14.06.2012 01:56, Alexander Korotkov wrote:
> Hackers,
>
> attached patch implements quad-tree on ranges. Some performance results in
> comparison with current GiST indexing.

> @@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, R
> angeType * r2)
>   * part of the relcache entry for the index, typically) this essentially
>   * eliminates lookup overhead during operations on a GiST range index.
>   */
> -static Datum
> +Datum
>  TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
>  {
>         FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that 
name, anyway. Perhaps we should refactor the functions called this way, 
range_adjacent, range_overlaps etc., to have internal counterparts that 
can be called without FunctionCall(). Like:

> ***************
> *** 692,697 ****
> --- 692,708 ----
>   {
>       RangeType  *r1 = PG_GETARG_RANGE(0);
>       RangeType  *r2 = PG_GETARG_RANGE(1);
> +
> +     typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
> +
> +     PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache);
> + }
> +
> + bool
> + range_adjacent_internal(RangeType r1, RangeType r2, TypeCacheEntry *typcache)
> + {
> +     RangeType  *r1 = PG_GETARG_RANGE(0);
> +     RangeType  *r2 = PG_GETARG_RANGE(1);
>       TypeCacheEntry *typcache;
>       RangeBound    lower1,
>                   lower2;

The gist and SP-gist consistent functions could call the internal 
function directly.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

21 June 2012, 18:30:04

On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 14.06.2012 01:56, Alexander Korotkov wrote:
Hackers,

attached patch implements quad-tree on ranges. Some performance results in
comparison with current GiST indexing.

@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, R
angeType * r2)
* part of the relcache entry for the index, typically) this essentially
* eliminates lookup overhead during operations on a GiST range index.
*/
-static Datum
+Datum
TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
{
FunctionCallInfoData fcinfo;

I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like:

***************
*** 692,697 ****
--- 692,708 ----
{
RangeType *r1 = PG_GETARG_RANGE(0);
RangeType *r2 = PG_GETARG_RANGE(1);
+
+ typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+ PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache);
+ }
+
+ bool
+ range_adjacent_internal(RangeType r1, RangeType r2, TypeCacheEntry *typcache)
+ {
+ RangeType *r1 = PG_GETARG_RANGE(0);
+ RangeType *r2 = PG_GETARG_RANGE(1);
TypeCacheEntry *typcache;
RangeBound lower1,
lower2;

The gist and SP-gist consistent functions could call the internal function directly.

I like idea of replacing TrickFunctionCall2 with internal function. Do you think I should post a separate patch for existing GiST code?

------
With best regards,
Alexander Korotkov.

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

21 June 2012, 18:34:06

On Thu, Jun 14, 2012 at 2:56 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some investigation. Search queries on SP-GiST use much more pages. However this comparison can be not really correct, because SP-GiST can pin same buffer several times during one scan. In CPU search queries on SP-GiST seems to be slightly faster. Dramatical difference in "column <@ const" query is thanks to 2d-mapping.

Patch with another SP-GiST implementation for ranges is attached. It uses k-d tree instead of quad-tree. I would like to leave only one implementation of SP-GiST for ranges. I'm going to do as comprehensive testing as I can for it.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_kd-0.1.patch

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Tom Lane

Date:

21 June 2012, 18:38:06

Alexander Korotkov <aekorotkov@gmail.com> writes:
> On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas <
> heikki.linnakangas@enterprisedb.com> wrote:
>> I don't think we want to expose TrickFunctionCall2(). Not with that name,
>> anyway. Perhaps we should refactor the functions called this way,
>> range_adjacent, range_overlaps etc., to have internal counterparts that can
>> be called without FunctionCall(). Like:

> I like idea of replacing TrickFunctionCall2 with internal function. Do you
> think I should post a separate patch for existing GiST code?

+1 ... that was a pretty grotty hack, so let's get rid of it if we can.
It's still going to require some documentation though I think.
        regards, tom lane

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

22 June 2012, 05:53:14

On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:
> Hackers,
> 
> 
> attached patch implements quad-tree on ranges. Some performance
> results in comparison with current GiST indexing.
> Index creation is slightly slower. Probably, it need some
> investigation. Search queries on SP-GiST use much more pages. However
> this comparison can be not really correct, because SP-GiST can pin
> same buffer several times during one scan. In CPU search queries on
> SP-GiST seems to be slightly faster. Dramatical difference in "column
> <@ const" query is thanks to 2d-mapping.
> 
Looking at this patch now. I see that it fails the opr_sanity test (on
master), can you look into that?

It looks very promising from a performance standpoint. I think the "col
<@ const" query will be a common query; and I also think that pattern
will be useful to restrict a large table down to something more
manageable.

In the bounds_connected function, it might make more sense to use the
word "adjacent" which I already used for ordinary ranges, rather than
using the new word "connected".

Also, I'm getting a little confused switching between thinking in terms
of "X and Y" and "lower and upper" (particularly since lower and upper
can be confused with > or <). I don't have a suggestion yet how to
clarify that, but it might be good to use the spatial terminology in
more places and avoid lower/upper except where needed.

Please excuse the slow review, I'm catching up on the SP-GiST API.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

03 July 2012, 03:47:59

On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:
> Hackers,
> 
> 
> attached patch implements quad-tree on ranges. Some performance
> results in comparison with current GiST indexing.
> Index creation is slightly slower. Probably, it need some
> investigation. Search queries on SP-GiST use much more pages. However
> this comparison can be not really correct, because SP-GiST can pin
> same buffer several times during one scan. In CPU search queries on
> SP-GiST seems to be slightly faster. Dramatical difference in "column
> <@ const" query is thanks to 2d-mapping.
> 

More comments:

* Minor rebase is required (simple int2 -> int16).

* Perhaps I'm mistaken, but the following code in getQuadrant() looks
wrong to me, shouldn't the 1 and 2 be reversed?
       if (range_cmp_bounds(typcache, &upper, ¢roidUpper) >= 0)           return 1;       else           return 2;

* in the "choose" method, why does in->allTheSame unconditionally match?
Why not split? Similarly, why does inner_consistent always match when
the nodes are allTheSame?

* It's a little confusing having empty prefixes mean that empty range go
to node0, and non-empty ranges meaning that empty ranges go to node4
(quadrant 5). Why can't there just always be 5 nodes, and iff all the
ranges are empty, then the prefix is NULL?

And for that matter, let's let the quadrant equal the node number, and
have the empty ranges in node0. I don't see much point in always
subtracting 1 from the quadrant number.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

03 July 2012, 03:51:06

On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:
> On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:
> > Hackers,
> > 
> > 
> > attached patch implements quad-tree on ranges. Some performance
> > results in comparison with current GiST indexing.
> > Index creation is slightly slower. Probably, it need some
> > investigation. Search queries on SP-GiST use much more pages. However
> > this comparison can be not really correct, because SP-GiST can pin
> > same buffer several times during one scan. In CPU search queries on
> > SP-GiST seems to be slightly faster. Dramatical difference in "column
> > <@ const" query is thanks to 2d-mapping.
> > 

Also, it would be helpful to add a couple tests to rangetypes.sql.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

03 July 2012, 11:26:08

On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:
> * Perhaps I'm mistaken, but the following code in getQuadrant() looks
> wrong to me, shouldn't the 1 and 2 be reversed?
> 
>         if (range_cmp_bounds(typcache, &upper, ¢roidUpper) >= 0)
>             return 1;
>         else
>             return 2;

Oops, looks like I was mistaken. The code looks fine to me now.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

03 July 2012, 19:09:38

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:

On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:
> On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:
> > Hackers,
> >
> >
> > attached patch implements quad-tree on ranges. Some performance
> > results in comparison with current GiST indexing.
> > Index creation is slightly slower. Probably, it need some
> > investigation. Search queries on SP-GiST use much more pages. However
> > this comparison can be not really correct, because SP-GiST can pin
> > same buffer several times during one scan. In CPU search queries on
> > SP-GiST seems to be slightly faster. Dramatical difference in "column
> > <@ const" query is thanks to 2d-mapping.
> >

Also, it would be helpful to add a couple tests to rangetypes.sql.

Thank you for review! Now I'm working on detailed performance benchmarks for different opclasses. I hope to finish it in this week. Then we would see which opclasses we're really need and nail down issues you've pointed.

------
With best regards,

Alexander Korotkov.

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

11 July 2012, 20:04:11

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:

Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

1) Idea of having different node numbers is that nodes takes some space (check spgFormInnerTuple). I've rethink this idea a little because adding of node without label just didn't work :). Only root inner index tuple have 5 nodes, others have 4. Thereby all empty ranges are branched already at root inner index tuple.

2) Empty prefix datum replaced with absence of prefix datum.

3) int2 replaced with int16.

4) I've added some tests which duplicates tests for GiST.

5) "connected" replaced with "adjacent"

6) allTheSame nodes is node created by SP-GiST core when it decide than result of picksplit method is not good enough. It divides tuples arbitrarily. So we have to visit all the nodes in this case during scan. See: http://www.postgresql.org/docs/devel/static/spgist-implementation.html#SPGIST-ALL-THE-SAME . Currently in-core SP-GiST opclasses behaves similarly.

I didn't decide how to rethink terms in comments yet :(

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_quad-0.3.patch

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

11 July 2012, 20:11:58

On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:
Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

Oops, forgot to include one comment fix into patch.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_quad-0.4.patch

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

12 July 2012, 03:30:08

On 12.07.2012 02:11, Alexander Korotkov wrote:
> On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:
>
>> On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis<pgsql@j-davis.com>  wrote:
>>
>>> Also, it would be helpful to add a couple tests to rangetypes.sql.
>>>
>>
>> New version of patch is attached.
>>
>
> Oops, forgot to include one comment fix into patch.

Thanks. Can you do something about TrickFunctionCall2, please? 
(http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com) 
A separate patch to refactor that in the existing gist opclass would 
probably be best.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

12 July 2012, 20:00:45

On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 12.07.2012 02:11, Alexander Korotkov wrote:
On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:

On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis<pgsql@j-davis.com> wrote:

Also, it would be helpful to add a couple tests to rangetypes.sql.

New version of patch is attached.

Oops, forgot to include one comment fix into patch.

Thanks. Can you do something about TrickFunctionCall2, please? (http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com) A separate patch to refactor that in the existing gist opclass would probably be best.

Done. There are separate patch for get rid of TrickFunctionCall2 and version of SP-GiST for ranges based on that patch.

------
With best regards,
Alexander Korotkov.

Attachment

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

18 July 2012, 17:23:01

On 13.07.2012 02:00, Alexander Korotkov wrote:
> On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> Thanks. Can you do something about TrickFunctionCall2, please? (
>> http://archives.postgresql.**org/message-id/4FE2C968.**
>> 2010503@enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com>)
>> A separate patch to refactor that in the existing gist opclass would
>> probably be best.
>
> Done. There are separate patch for get rid of TrickFunctionCall2 and
> version of SP-GiST for ranges based on that patch.

Committed the get-rid-of-TrickFunctionCall2 patch. I also changed one 
call in range_gist_same(), which was not using TrickFunctionCall2 but 
was nevertheless doing the same thing in spirit.

I'll try to take a look at the other patch in the next few days.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

18 July 2012, 17:53:25

On Thu, Jul 19, 2012 at 12:22 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 13.07.2012 02:00, Alexander Korotkov wrote:
On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas<
heikki.linnakangas@enterprisedb.com> wrote:

Thanks. Can you do something about TrickFunctionCall2, please? (
http://archives.postgresql.**org/message-id/4FE2C968.**
2010503@enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com>)

A separate patch to refactor that in the existing gist opclass would
probably be best.

Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

Committed the get-rid-of-TrickFunctionCall2 patch. I also changed one call in range_gist_same(), which was not using TrickFunctionCall2 but was nevertheless doing the same thing in spirit.

Good, thanks!

------
With best regards,
Alexander Korotkov.

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

20 July 2012, 08:49:27

On 13.07.2012 02:00, Alexander Korotkov wrote:
> Done. There are separate patch for get rid of TrickFunctionCall2 and
> version of SP-GiST for ranges based on that patch.

Looking at the SP-GiST patch now..

It would be nice to have an introduction, perhaps as a file comment at 
the top of rangetypes_spgist.c, explaining how the quad tree works. I 
have a general idea of what a quad tree is, but it's not immediately 
obvious how it maps to SP-GiST. What is stored on a leaf node and an 
internal node? What is the 'prefix' (seems to be the centroid)? How are 
ranges mapped to 2D points? (the function comment of getQuadrant() is a 
good start for that last one)

In spg_range_quad_inner_consistent(), if in->hasPrefix == true, ISTM 
that in all cases where 'empty' is true, 'which' is set to 0, meaning 
that there can be no matches in any of the quadrants. In most of the 
case-branches, you explicitly check for 'empty', but even in the ones 
where you don't, I think you end up setting which=0 if empty==true. I'm 
not 100% sure about the RANGESTRAT_ADJACENT case, though. Am I missing 
something?

It would be nice to avoid the code duplication between the new 
bounds_adjacent() function, and the range_adjacent_internal(). Perhaps 
move bounds_adjacent() to rangetypes.c and use it in 
range_adjacent_internal() too?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

23 July 2012, 04:37:59

On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

On 13.07.2012 02:00, Alexander Korotkov wrote:
Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.

Looking at the SP-GiST patch now..

It would be nice to have an introduction, perhaps as a file comment at the top of rangetypes_spgist.c, explaining how the quad tree works. I have a general idea of what a quad tree is, but it's not immediately obvious how it maps to SP-GiST. What is stored on a leaf node and an internal node? What is the 'prefix' (seems to be the centroid)? How are ranges mapped to 2D points? (the function comment of getQuadrant() is a good start for that last one)

I've added some comments at the top of rangetypes_spgist.c.

In spg_range_quad_inner_consistent(), if in->hasPrefix == true, ISTM that in all cases where 'empty' is true, 'which' is set to 0, meaning that there can be no matches in any of the quadrants. In most of the case-branches, you explicitly check for 'empty', but even in the ones where you don't, I think you end up setting which=0 if empty==true. I'm not 100% sure about the RANGESTRAT_ADJACENT case, though. Am I missing something?

Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true, while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.

It would be nice to avoid the code duplication between the new bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal() too?

Done.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_quad-0.6.patch

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

28 July 2012, 18:33:41

On 23.07.2012 10:37, Alexander Korotkov wrote:
> On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> It would be nice to have an introduction, perhaps as a file comment at the
>> top of rangetypes_spgist.c, explaining how the quad tree works. I have a
>> general idea of what a quad tree is, but it's not immediately obvious how
>> it maps to SP-GiST. What is stored on a leaf node and an internal node?
>> What is the 'prefix' (seems to be the centroid)? How are ranges mapped to
>> 2D points? (the function comment of getQuadrant() is a good start for that
>> last one)
>
> I've added some comments at the top of rangetypes_spgist.c.

Thanks, much better.

I think the handling of empty ranges needs some further explanation. If
I understand the code correctly, the root node can contain a centroid
like usual, and empty ranges are placed in the magic 5th quadrant.
Alternatively, the root node has no centroid, and contains only two
subnodes: all empty ranges are placed under one subnode, and non-empty
ranges under the other.

It seems it would be simpler if we always stored empty nodes the latter
way, with a no-centroid root node, and nodes with a centroid would
always only have 4 subnodes. When the first empty range is added to an
index that already contains non-empty values, the choose-function could
return spgSplitTuple to create a new root node that divides the space
into empty and non-empty values. Then again, I don't think it would
actually simplify the code much. Handling the 5th quadrant doesn't
require much code in spg_range_quad_inner_consistent() as it is. So
maybe it's better to leave it the way it is.

Or perhaps we should stipulate that a root node with no centroid can
only contain empty ranges. When you add the first non-empty range, have
choose function return spgSplitTuple, and create a new root node with a
centroid, and the empty ranges in the 5th quadrant.

>> In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that
>> in all cases where 'empty' is true, 'which' is set to 0, meaning that there
>> can be no matches in any of the quadrants. In most of the case-branches,
>> you explicitly check for 'empty', but even in the ones where you don't, I
>> think you end up setting which=0 if empty==true. I'm not 100% sure about
>> the RANGESTRAT_ADJACENT case, though. Am I missing something?
>
> Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true,
> while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.

Ok. I did some copy-editing, rewording some comments, and fixing
whitespace. Patch attached, hope I didn't break anything.

I think the most difficult part of the patch is the
spg_range_quad_inner_consistent() function. It's hard to understand how
the various strategies are implemented. I started to expand the comments
in for each strategy, explaining how each strategy maps to a bounding
box in the 2D space, but I'm not sure how much that actually helps.
Perhaps it would help to also restructure the code so that you first
have a switch-case statement that maps the scan key to bounding box,
without worrying about the centroid yet, and then check which quadrants
you need to descend to to find points within the box. The
adjacent-strategy is more complicated than a simple bounding-box search,
though.

I'm having trouble understanding how exactly the RANGESTRAT_ADJACENT
works. The geometric interpretation of 'adjacent' is that the scan key
defines two lines, one horizontal and one vertical. Any points that lie
on either of the lines match the query. Looking at the implementation,
it's not clear how the code implements the search for along those two lines.

Also, I wonder if we really need to reconstruct the "previous" value in
a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the
two lines we are chasing. For example, if you descend to quadrant 2
because there might be a point there that lies on the horizontal line,
but we already know that there can't be any points there lie on the
vertical line, you only need to remember that, not the whole centroid
from the previous level. Does the SP-GiST API require the
"reconstructed" values stored by inner_consistent to be of the correct
datatype, or can it store any Datums in the array?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

range_spgist_quad-0.6-heikki.patch

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Tom Lane

Date:

28 July 2012, 18:51:09

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Also, I wonder if we really need to reconstruct the "previous" value in 
> a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the 
> two lines we are chasing. For example, if you descend to quadrant 2 
> because there might be a point there that lies on the horizontal line, 
> but we already know that there can't be any points there lie on the 
> vertical line, you only need to remember that, not the whole centroid 
> from the previous level. Does the SP-GiST API require the 
> "reconstructed" values stored by inner_consistent to be of the correct 
> datatype, or can it store any Datums in the array?

They have to match the attribute type, at least as to storage details
(typbyval/typlen), because the core uses datumCopy to copy them around.

We could possibly extend the API to allow a different type to be used
for this, but then it wouldn't be "reconstructed data" in any sense of
the word; so I think it'd be abuse of the concept --- which would come
back to bite us if we ever try to support index-only scans with SPGiST.
ISTM what this points up is that the opclass might want some private
state kept around during a tree descent.  If we want to support that,
we should support it as a separate concept from reconstructed data.
        regards, tom lane

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

28 July 2012, 19:11:17

On 29.07.2012 00:50, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> Also, I wonder if we really need to reconstruct the "previous" value in
>> a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the
>> two lines we are chasing. For example, if you descend to quadrant 2
>> because there might be a point there that lies on the horizontal line,
>> but we already know that there can't be any points there lie on the
>> vertical line, you only need to remember that, not the whole centroid
>> from the previous level. Does the SP-GiST API require the
>> "reconstructed" values stored by inner_consistent to be of the correct
>> datatype, or can it store any Datums in the array?
>
> They have to match the attribute type, at least as to storage details
> (typbyval/typlen), because the core uses datumCopy to copy them around.
>
> We could possibly extend the API to allow a different type to be used
> for this, but then it wouldn't be "reconstructed data" in any sense of
> the word; so I think it'd be abuse of the concept --- which would come
> back to bite us if we ever try to support index-only scans with SPGiST.

I can see that for leaf nodes, but does that also hold for inner nodes?

> ISTM what this points up is that the opclass might want some private
> state kept around during a tree descent.  If we want to support that,
> we should support it as a separate concept from reconstructed data.

Yeah, that seems better. The representation of an inner node is 
datatype-specific, there should be no need to expose "reconstructed" 
inner node values outside a datatype's SP-GiST implementation.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Tom Lane

Date:

28 July 2012, 19:38:05

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 29.07.2012 00:50, Tom Lane wrote:
>> We could possibly extend the API to allow a different type to be used
>> for this, but then it wouldn't be "reconstructed data" in any sense of
>> the word; so I think it'd be abuse of the concept --- which would come
>> back to bite us if we ever try to support index-only scans with SPGiST.

> I can see that for leaf nodes, but does that also hold for inner nodes?

I didn't explain myself terribly well, probably.  Consider an opclass
that wants some private state like this and *also* needs to reconstruct
column data.

In principle I suppose we could do away with the reconstructed-data
support altogether, and consider that if you need that then it is just a
portion of the unspecified private state the opclass is holding.  But
it's probably a bit late to remove bits of the opclass API.
        regards, tom lane

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

06 August 2012, 11:49:54

Just to check where we stand on this: Are you going to send a finalized 
version of this patch, based on the one I sent earlier, or should I pick 
up that version and try to get it into committable state?

On 23.07.2012 10:37, Alexander Korotkov wrote:
> On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas<
> heikki.linnakangas@enterprisedb.com>  wrote:
>
>> On 13.07.2012 02:00, Alexander Korotkov wrote:
>>
>>> Done. There are separate patch for get rid of TrickFunctionCall2 and
>>> version of SP-GiST for ranges based on that patch.
>>>
>>
>> Looking at the SP-GiST patch now..
>>
>> It would be nice to have an introduction, perhaps as a file comment at the
>> top of rangetypes_spgist.c, explaining how the quad tree works. I have a
>> general idea of what a quad tree is, but it's not immediately obvious how
>> it maps to SP-GiST. What is stored on a leaf node and an internal node?
>> What is the 'prefix' (seems to be the centroid)? How are ranges mapped to
>> 2D points? (the function comment of getQuadrant() is a good start for that
>> last one)
>>
>
> I've added some comments at the top of rangetypes_spgist.c.
>
> In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that
>> in all cases where 'empty' is true, 'which' is set to 0, meaning that there
>> can be no matches in any of the quadrants. In most of the case-branches,
>> you explicitly check for 'empty', but even in the ones where you don't, I
>> think you end up setting which=0 if empty==true. I'm not 100% sure about
>> the RANGESTRAT_ADJACENT case, though. Am I missing something?
>>
>
> Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true,
> while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.
>
> It would be nice to avoid the code duplication between the new
>> bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move
>> bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal()
>> too?
>
>
> Done.
>
> ------
> With best regards,
> Alexander Korotkov.
>


--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

09 August 2012, 12:43:20

In this revision of patch I tried to handle conditions more generally using variables minLower, maxLower, minUpper, maxUpper, inclusive and strictEmpty. However some strategies still contain additional logic.

What is our conclusion about saving previous choice for RANGESTRAT_ADJACENT strategy?

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_quad-0.7.patch.gz

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

16 August 2012, 11:46:44

On 09.08.2012 18:42, Alexander Korotkov wrote:
> In this revision of patch I tried to handle conditions more generally using
> variables minLower, maxLower, minUpper, maxUpper, inclusive and
> strictEmpty. However some strategies still contain additional logic.

Thanks, that clarified the code tremendously. The comments I added about 
the geometrical interpretations of the operations earlier seem 
unnecessary now, so removed those.

> What is our conclusion about saving previous choice for RANGESTRAT_ADJACENT
> strategy?

I think we're going to do what you did in the patch. A more generic 
mechanism for holding private state across consistent calls would be 
nice, but it's not that ugly the way you wrote it.

I committed the patch now, but left out the support for adjacent for 
now. Not because there was necessarily anything wrong with that, but 
because I have limited time for reviewing, and the rest of the patch 
looks ready for commit now. I reworded the comments quite a lot, you 
might want to proofread those to double-check that they're still 
correct. I'll take a look at the adjacent-support next, as a separate patch.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

18 August 2012, 14:11:16

On Thu, Aug 16, 2012 at 3:46 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

I committed the patch now, but left out the support for adjacent for now. Not because there was necessarily anything wrong with that, but because I have limited time for reviewing, and the rest of the patch looks ready for commit now. I reworded the comments quite a lot, you might want to proofread those to double-check that they're still correct. I'll take a look at the adjacent-support next, as a separate patch.

Thanks! There is a separate patch for adjacent. I've reworked adjacent check in order to make it more clear.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_adjacent-0.1.patch

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

19 August 2012, 20:25:15

On Sat, 2012-08-18 at 18:10 +0400, Alexander Korotkov wrote:
> On Thu, Aug 16, 2012 at 3:46 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>         I committed the patch now, but left out the support for
>         adjacent for now. Not because there was necessarily anything
>         wrong with that, but because I have limited time for
>         reviewing, and the rest of the patch looks ready for commit
>         now. I reworded the comments quite a lot, you might want to
>         proofread those to double-check that they're still correct.
>         I'll take a look at the adjacent-support next, as a separate
>         patch.
> 
> 
> Thanks! There is a separate patch for adjacent. I've reworked adjacent
> check in order to make it more clear.

I am taking a look at this patch now. A few quick comments:

* It looks like bounds_adjacent modifies it's by-reference arguments,
which is a little worrying to me. The lower/upper labels are flipped
back, but the inclusivities are not. Maybe just pass by value instead?

* Bounds_adjacent is sensitive to the argument order. Can't it just take
bound1 and bound2?

* I tried some larger tests and they seemed to work. I haven't reviewed
the spgist code changes in detail though.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

19 August 2012, 20:51:40

On Sat, 2012-07-28 at 17:50 -0400, Tom Lane wrote:
> which would come
> back to bite us if we ever try to support index-only scans with SPGiST.

I'm confused:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=92203624934095163f8b57b5b3d7bbd2645da2c8

And the patch that was just committed for Range Types SP-GiST is already
using index-only scans.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Tom Lane

Date:

20 August 2012, 15:32:39

Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2012-07-28 at 17:50 -0400, Tom Lane wrote:
>> which would come
>> back to bite us if we ever try to support index-only scans with SPGiST.

> I'm confused:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=92203624934095163f8b57b5b3d7bbd2645da2c8

Sorry, I was being imprecise there.  What I meant was that an opclass
that abused the reconstructed-value storage for something else might
have problems supporting index-only scans.

If we think opclasses might need private storage for index searches, we
should add that as a new part of the API, not tell them to misuse this
part.
        regards, tom lane

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

04 September 2012, 13:45:48

On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis <pgsql@j-davis.com> wrote:

I am taking a look at this patch now. A few quick comments:

* It looks like bounds_adjacent modifies it's by-reference arguments,
which is a little worrying to me. The lower/upper labels are flipped
back, but the inclusivities are not. Maybe just pass by value instead?

* Bounds_adjacent is sensitive to the argument order. Can't it just take
bound1 and bound2?

Fixed. Patch is attached.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_adjacent-0.2.patch.gz

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alvaro Herrera

Date:

18 October 2012, 18:15:53

Jeff Davis escribió:
> On Sat, 2012-08-18 at 18:10 +0400, Alexander Korotkov wrote:
> >
> > Thanks! There is a separate patch for adjacent. I've reworked adjacent
> > check in order to make it more clear.
>
> I am taking a look at this patch now. A few quick comments:

> * I tried some larger tests and they seemed to work. I haven't reviewed
> the spgist code changes in detail though.

Jeff, Heikki,

Any input on the subsequent version of this patch?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

21 October 2012, 07:22:34

On Tue, 2012-09-04 at 17:45 +0400, Alexander Korotkov wrote:
> On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis <pgsql@j-davis.com>
> wrote:
>         I am taking a look at this patch now. A few quick comments:
>         
>         * It looks like bounds_adjacent modifies it's by-reference
>         arguments,
>         which is a little worrying to me. The lower/upper labels are
>         flipped
>         back, but the inclusivities are not. Maybe just pass by value
>         instead?
>         
>         * Bounds_adjacent is sensitive to the argument order. Can't it
>         just take
>         bound1 and bound2?
> 
> 
> Fixed. Patch is attached.
> 
It looks like this is basically the same diff as v0.1. Did something get
mixed up?

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

02 November 2012, 08:47:35

On Sun, Oct 21, 2012 at 11:22 AM, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2012-09-04 at 17:45 +0400, Alexander Korotkov wrote:
> On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis <pgsql@j-davis.com>
> wrote:
> I am taking a look at this patch now. A few quick comments:
>
> * It looks like bounds_adjacent modifies it's by-reference
> arguments,
> which is a little worrying to me. The lower/upper labels are
> flipped
> back, but the inclusivities are not. Maybe just pass by value
> instead?
>
> * Bounds_adjacent is sensitive to the argument order. Can't it
> just take
> bound1 and bound2?
>
>
> Fixed. Patch is attached.
>
It looks like this is basically the same diff as v0.1. Did something get
mixed up?

Right version of patch is attached.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_adjacent-0.3.patch.gz

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

04 November 2012, 19:41:55

On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote:

> Right version of patch is attached.
> 
* In bounds_adjacent, there's no reason to flip the labels back.
* Comment should indicate more explicitly that bounds_adjacent is
sensitive to the argument order.
* In bounds_adjacent, it appears that "bound1" corresponds to "B" in the
comment above, and "bound2" corresponds to "A" in the comment above. I
would have guessed from reading the comment that bound1 corresponded to
A. We should just use consistent names between the comment and the code
(e.g. boundA and boundB).
* I could be getting confused, but I think that line 645 of
rangetypes_spgist.c should be inverted (!bounds_adjacent(...)).
* I think needPrevious should have an explanatory comment somewhere. It
looks like you are using it to store some state as you descend the tree,
but it doesn't look like it's used to reconstruct the value (because we
already have the value anyway). Since it's being used for a purpose
other than what's intended, that should be explained.

Regards,Jeff Davis

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Andres Freund

Date:

08 December 2012, 15:12:36

Hi Alexander,

On 2012-11-04 11:41:48 -0800, Jeff Davis wrote:
> On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote:
>
> > Right version of patch is attached.
> >
> * In bounds_adjacent, there's no reason to flip the labels back.
> * Comment should indicate more explicitly that bounds_adjacent is
> sensitive to the argument order.
> * In bounds_adjacent, it appears that "bound1" corresponds to "B" in the
> comment above, and "bound2" corresponds to "A" in the comment above. I
> would have guessed from reading the comment that bound1 corresponded to
> A. We should just use consistent names between the comment and the code
> (e.g. boundA and boundB).
> * I could be getting confused, but I think that line 645 of
> rangetypes_spgist.c should be inverted (!bounds_adjacent(...)).
> * I think needPrevious should have an explanatory comment somewhere. It
> looks like you are using it to store some state as you descend the tree,
> but it doesn't look like it's used to reconstruct the value (because we
> already have the value anyway). Since it's being used for a purpose
> other than what's intended, that should be explained.

Do you have time to address these comments or should the patch marked
as returned with Feedback? Afaics there hasn't been progress since
CF2...

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

13 December 2012, 21:31:31

Hi!

On Sun, Nov 4, 2012 at 11:41 PM, Jeff Davis <pgsql@j-davis.com> wrote:

On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote:

> Right version of patch is attached.
>
* In bounds_adjacent, there's no reason to flip the labels back.

Fixed.

* Comment should indicate more explicitly that bounds_adjacent is
sensitive to the argument order.

Fixed.

* In bounds_adjacent, it appears that "bound1" corresponds to "B" in the
comment above, and "bound2" corresponds to "A" in the comment above. I
would have guessed from reading the comment that bound1 corresponded to
A. We should just use consistent names between the comment and the code
(e.g. boundA and boundB).

Fixed.

* I could be getting confused, but I think that line 645 of
rangetypes_spgist.c should be inverted (!bounds_adjacent(...)).

Good catch! Actually, code was in direct contradiction with comment. Fixed.

* I think needPrevious should have an explanatory comment somewhere. It
looks like you are using it to store some state as you descend the tree,
but it doesn't look like it's used to reconstruct the value (because we
already have the value anyway). Since it's being used for a purpose
other than what's intended, that should be explained.

Yes, it's a some kind of hack now. Comment is added.

------
With best regards,
Alexander Korotkov.

Attachment

range_spgist_adjacent-0.4.patch.gz

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Jeff Davis

Date:

17 December 2012, 10:42:35

On Fri, 2012-12-14 at 01:31 +0400, Alexander Korotkov wrote:
> Hi!
>
Hi!

I have attached a patch with some significant edits.

* In your patch, there was still an inconsistency between the comment
for bounds_adjacent and the code. I refactored it to ensure it always
takes the upper bound as boundA, and the lower bound as boundB, so that
it can invert the inclusivities to create A..B to match the comments.

* In the consistent method, you were inverting upper to be a lower bound
and lower to be an upper bound. I don't understand why (perhaps I did
the first time I read the patch), so I removed it.

* It looks like the comments for which1/2 were inconsistent with the
code. I tried to fix that.

* I significantly refactored the comments and code for the consistent
method. I had trouble understanding the original comments, particularly
around the edge cases.

Please take a look and see if it still matches your algorithm properly.
This patch is not intended to be a final version (I didn't analyze my
code carefully), but just to show you what I mean and how I interpret
what the code is trying to do. You don't need to use my refactorings,
but if not, the comments in your version need more improvement so I can
understand.

I found it easier to reason in terms of horizontal and vertical lines,
and which quadrants they crossed, and then work out the edge cases. I'm
not sure if that reasoning was correct, but it seemed to make more sense
to me.

Regards,
    Jeff Davis

Attachment

range_spgist_adjacent-0.4A.patch.gz

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

09 February 2013, 09:36:30

Hi!

On Mon, Dec 17, 2012 at 2:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:

I have attached a patch with some significant edits.

* In your patch, there was still an inconsistency between the comment
for bounds_adjacent and the code. I refactored it to ensure it always
takes the upper bound as boundA, and the lower bound as boundB, so that
it can invert the inclusivities to create A..B to match the comments.

* In the consistent method, you were inverting upper to be a lower bound
and lower to be an upper bound. I don't understand why (perhaps I did
the first time I read the patch), so I removed it.

* It looks like the comments for which1/2 were inconsistent with the
code. I tried to fix that.

* I significantly refactored the comments and code for the consistent
method. I had trouble understanding the original comments, particularly
around the edge cases.

Please take a look and see if it still matches your algorithm properly.
This patch is not intended to be a final version (I didn't analyze my
code carefully), but just to show you what I mean and how I interpret
what the code is trying to do. You don't need to use my refactorings,
but if not, the comments in your version need more improvement so I can
understand.

I found it easier to reason in terms of horizontal and vertical lines,
and which quadrants they crossed, and then work out the edge cases. I'm
not sure if that reasoning was correct, but it seemed to make more sense
to me.

Your comments and refactoring looks good for me.

------
With best regards,
Alexander Korotkov.

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Craig Ringer

Date:

03 March 2013, 14:38:13

On 02/09/2013 05:36 PM, Alexander Korotkov wrote:

Your comments and refactoring looks good for me.

This patch is currently flagged as waiting for author. Have you had a chance to test and examine Jeff's changes in more detail? Would you consider giving us a summary of the status of this work - do you think you can have it production-ready within a week or two, with all necessary tests and documentation.

Note that Jeff wrote:


Please take a look and see if it still matches your algorithm properly.
This patch is not intended to be a final version (I didn't analyze my
code carefully), but just to show you what I mean and how I interpret
what the code is trying to do.

... so it's clear that you need to read the patch in detail and make any desired edits, re-test, and go from there.

I see a couple of changes to the regression tests. Do you need any documentation changes for this too? Any user-visible effects other than performance?

--  Craig Ringer                   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

03 March 2013, 17:42:48

On Sun, Mar 3, 2013 at 6:37 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 02/09/2013 05:36 PM, Alexander Korotkov wrote:

Your comments and refactoring looks good for me.
This patch is currently flagged as waiting for author. Have you had a chance to test and examine Jeff's changes in more detail? Would you consider giving us a summary of the status of this work - do you think you can have it production-ready within a week or two, with all necessary tests and documentation.
Note that Jeff wrote:
Please take a look and see if it still matches your algorithm properly.
This patch is not intended to be a final version (I didn't analyze my
code carefully), but just to show you what I mean and how I interpret
what the code is trying to do. 
... so it's clear that you need to read the patch in detail and make any desired edits, re-test, and go from there.

I see a couple of changes to the regression tests. Do you need any documentation changes for this too? Any user-visible effects other than performance?

This patch only adds one more operator to already committed new opclass. Tests already cover this case. Without patch corresponding test leads to sequential scan instead of index scan. However, I can't see any documentation changes about already committed opclass. I think we need a documentation update as an separate patch.

Jeff changes only does comments changes and renaming, I don't need to examine them more.

------

With best regards,
Alexander Korotkov.

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Craig Ringer

Date:

04 March 2013, 02:53:38

On 03/04/2013 01:42 AM, Alexander Korotkov wrote:
>
> This patch only adds one more operator to already committed new
> opclass. Tests already cover this case. Without patch corresponding
> test leads to sequential scan instead of index scan. However, I can't
> see any documentation changes about already committed opclass. I think
> we need a documentation update as an separate patch.
> Jeff changes only does comments changes and renaming, I don't need to
> examine them more.

OK. Would it be fair to summarize that as "I think it's ready to commit
as-is" ? Do you have any outstanding concerns or uncertainties?

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Alexander Korotkov

Date:

04 March 2013, 05:16:36

On Mon, Mar 4, 2013 at 6:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 03/04/2013 01:42 AM, Alexander Korotkov wrote:
>
> This patch only adds one more operator to already committed new
> opclass. Tests already cover this case. Without patch corresponding
> test leads to sequential scan instead of index scan. However, I can't
> see any documentation changes about already committed opclass. I think
> we need a documentation update as an separate patch.
> Jeff changes only does comments changes and renaming, I don't need to
> examine them more.

OK. Would it be fair to summarize that as "I think it's ready to commit
as-is" ? Do you have any outstanding concerns or uncertainties?

Yes. I think it's ready to commit as-is.

------
With best regards,
Alexander Korotkov.

Re: SP-GiST for ranges based on 2d-mapping and quad-tree

From

Heikki Linnakangas

Date:

08 March 2013, 13:08:14

On 03.03.2013 19:42, Alexander Korotkov wrote:
> This patch only adds one more operator to already committed new opclass.
> Tests already cover this case. Without patch corresponding test leads to
> sequential scan instead of index scan.

Thanks, committed with some trivial cleanup.

> However, I can't see any
> documentation changes about already committed opclass. I think we need a
> documentation update as an separate patch.

Agreed. I think a small mention in the Range Types - Indexing section 
(http://www.postgresql.org/docs/devel/static/rangetypes.html#RANGETYPES-GIST) 
will do.

- Heikki