Thread: SP-GiST for ranges based on 2d-mapping and quad-tree
Hackers,
attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.
Index creation is slightly slower. Probably, it need some investigation. Search queries on SP-GiST use much more pages. However this comparison can be not really correct, because SP-GiST can pin same buffer several times during one scan. In CPU search queries on SP-GiST seems to be slightly faster. Dramatical difference in "column <@ const" query is thanks to 2d-mapping.
test=# create index period_idx on period_test using gist (period);
CREATE INDEX
Time: 49141,148 ms
test=# create index period_idx2 on period_test2 using spgist (period);
CREATE INDEX
Time: 59503,215 ms
test=# explain (analyze, buffers) select * from period_test where period && daterange('2011-12-10', '2011-12-11');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test (cost=471.63..13716.45 rows=11866 width=32) (actual time=107.258..147.139 rows=365451 loops=1)
Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=7636
-> Bitmap Index Scan on period_idx (cost=0.00..468.67 rows=11866 width=0) (actual time=106.697..106.697 rows=365451 loops=1)
Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=3694
Total runtime: 160.670 ms
(7 rows)
test=# explain (analyze, buffers) select * from period_test2 where period && daterange('2011-12-10', '2011-12-11');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test2 (cost=414.52..13659.34 rows=11866 width=32) (actual time=88.793..129.608 rows=365451 loops=1)
Recheck Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=7623 read=7687
-> Bitmap Index Scan on period_idx2 (cost=0.00..411.55 rows=11866 width=0) (actual time=88.285..88.285 rows=365451 loops=1)
Index Cond: (period && '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=7623 read=3745
Total runtime: 142.971 ms
(7 rows)
test=# explain (analyze, buffers) select * from period_test where period <@ daterange('2011-12-10', '2011-12-11');
QUERY PLAN
---------------------------------------------------------------------------------------------------
-----------------------
Bitmap Heap Scan on period_test (cost=102.06..6140.66 rows=2373 width=32) (actual time=85.437..85.437 rows=0 loops=1)
Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=3694
-> Bitmap Index Scan on period_idx (cost=0.00..101.47 rows=2373 width=0) (actual time=85.431..85.431 rows=0 loops=1)
Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared read=3694
Total runtime: 85.493 ms
(7 rows)
test=# explain (analyze, buffers) select * from period_test2 where period <@ daterange('2011-12-10', '2011-12-11');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test2 (cost=88.95..6127.54 rows=2373 width=32) (actual time=18.666..18.666 rows=0 loops=1)
Recheck Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=1404
-> Bitmap Index Scan on period_idx2 (cost=0.00..88.35 rows=2373 width=0) (actual time=18.660..18.660 rows=0 loops=1)
Index Cond: (period <@ '[2011-12-10,2011-12-11)'::daterange)
Buffers: shared hit=1404
Total runtime: 18.717 ms
(7 rows)
test=# explain (analyze, buffers) select * from period_test where period @> daterange('2011-08-10', '2011-12-31');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test (cost=102.06..6140.66 rows=2373 width=32) (actual time=56.114..73.785 rows=118097 loops=1)
Recheck Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared read=4125
-> Bitmap Index Scan on period_idx (cost=0.00..101.47 rows=2373 width=0) (actual time=55.740..55.740 rows=118097 loops=1)
Index Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared read=1391
Total runtime: 78.469 ms
(7 rows)
test=# explain (analyze, buffers) select * from period_test2 where period @> daterange('2011-08-10', '2011-12-31');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on period_test2 (cost=88.95..6127.54 rows=2373 width=32) (actual time=31.653..49
.751 rows=118097 loops=1)
Recheck Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared hit=3093 read=3768
-> Bitmap Index Scan on period_idx2 (cost=0.00..88.35 rows=2373 width=0) (actual time=31.282..31.282 rows=118097 loops=1)
Index Cond: (period @> '[2011-08-10,2011-12-31)'::daterange)
Buffers: shared hit=3093 read=1034
Total runtime: 54.288 ms
(7 rows)
With best regards,
Alexander Korotkov.
Attachment
On 14.06.2012 01:56, Alexander Korotkov wrote: > Hackers, > > attached patch implements quad-tree on ranges. Some performance results in > comparison with current GiST indexing. > @@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, R > angeType * r2) > * part of the relcache entry for the index, typically) this essentially > * eliminates lookup overhead during operations on a GiST range index. > */ > -static Datum > +Datum > TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2) > { > FunctionCallInfoData fcinfo; I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like: > *************** > *** 692,697 **** > --- 692,708 ---- > { > RangeType *r1 = PG_GETARG_RANGE(0); > RangeType *r2 = PG_GETARG_RANGE(1); > + > + typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1)); > + > + PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache); > + } > + > + bool > + range_adjacent_internal(RangeType r1, RangeType r2, TypeCacheEntry *typcache) > + { > + RangeType *r1 = PG_GETARG_RANGE(0); > + RangeType *r2 = PG_GETARG_RANGE(1); > TypeCacheEntry *typcache; > RangeBound lower1, > lower2; The gist and SP-gist consistent functions could call the internal function directly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
I like idea of replacing TrickFunctionCall2 with internal function. Do you think I should post a separate patch for existing GiST code?
------
With best regards,
Alexander Korotkov.
On 14.06.2012 01:56, Alexander Korotkov wrote:Hackers,
attached patch implements quad-tree on ranges. Some performance results in
comparison with current GiST indexing.@@ -788,7 +774,7 @@ range_super_union(TypeCacheEntry *typcache, RangeType * r1, R
angeType * r2)
* part of the relcache entry for the index, typically) this essentially
* eliminates lookup overhead during operations on a GiST range index.
*/
-static Datum
+Datum
TrickFunctionCall2(PGFunction proc, FmgrInfo *flinfo, Datum arg1, Datum arg2)
{
FunctionCallInfoData fcinfo;
I don't think we want to expose TrickFunctionCall2(). Not with that name, anyway. Perhaps we should refactor the functions called this way, range_adjacent, range_overlaps etc., to have internal counterparts that can be called without FunctionCall(). Like:***************
*** 692,697 ****
--- 692,708 ----
{
RangeType *r1 = PG_GETARG_RANGE(0);
RangeType *r2 = PG_GETARG_RANGE(1);
+
+ typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
+
+ PG_RETURN_BOOL(range_adjacent_internal(r1, r2, typcache);
+ }
+
+ bool
+ range_adjacent_internal(RangeType r1, RangeType r2, TypeCacheEntry *typcache)
+ {
+ RangeType *r1 = PG_GETARG_RANGE(0);
+ RangeType *r2 = PG_GETARG_RANGE(1);
TypeCacheEntry *typcache;
RangeBound lower1,
lower2;
The gist and SP-gist consistent functions could call the internal function directly.
With best regards,
Alexander Korotkov.
On Thu, Jun 14, 2012 at 2:56 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
------
With best regards,
Alexander Korotkov.
attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing.Index creation is slightly slower. Probably, it need some investigation. Search queries on SP-GiST use much more pages. However this comparison can be not really correct, because SP-GiST can pin same buffer several times during one scan. In CPU search queries on SP-GiST seems to be slightly faster. Dramatical difference in "column <@ const" query is thanks to 2d-mapping.
Patch with another SP-GiST implementation for ranges is attached. It uses k-d tree instead of quad-tree. I would like to leave only one implementation of SP-GiST for ranges. I'm going to do as comprehensive testing as I can for it.
With best regards,
Alexander Korotkov.
Attachment
Alexander Korotkov <aekorotkov@gmail.com> writes: > On Thu, Jun 21, 2012 at 11:12 AM, Heikki Linnakangas < > heikki.linnakangas@enterprisedb.com> wrote: >> I don't think we want to expose TrickFunctionCall2(). Not with that name, >> anyway. Perhaps we should refactor the functions called this way, >> range_adjacent, range_overlaps etc., to have internal counterparts that can >> be called without FunctionCall(). Like: > I like idea of replacing TrickFunctionCall2 with internal function. Do you > think I should post a separate patch for existing GiST code? +1 ... that was a pretty grotty hack, so let's get rid of it if we can. It's still going to require some documentation though I think. regards, tom lane
On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote: > Hackers, > > > attached patch implements quad-tree on ranges. Some performance > results in comparison with current GiST indexing. > Index creation is slightly slower. Probably, it need some > investigation. Search queries on SP-GiST use much more pages. However > this comparison can be not really correct, because SP-GiST can pin > same buffer several times during one scan. In CPU search queries on > SP-GiST seems to be slightly faster. Dramatical difference in "column > <@ const" query is thanks to 2d-mapping. > Looking at this patch now. I see that it fails the opr_sanity test (on master), can you look into that? It looks very promising from a performance standpoint. I think the "col <@ const" query will be a common query; and I also think that pattern will be useful to restrict a large table down to something more manageable. In the bounds_connected function, it might make more sense to use the word "adjacent" which I already used for ordinary ranges, rather than using the new word "connected". Also, I'm getting a little confused switching between thinking in terms of "X and Y" and "lower and upper" (particularly since lower and upper can be confused with > or <). I don't have a suggestion yet how to clarify that, but it might be good to use the spatial terminology in more places and avoid lower/upper except where needed. Please excuse the slow review, I'm catching up on the SP-GiST API. Regards,Jeff Davis
On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote: > Hackers, > > > attached patch implements quad-tree on ranges. Some performance > results in comparison with current GiST indexing. > Index creation is slightly slower. Probably, it need some > investigation. Search queries on SP-GiST use much more pages. However > this comparison can be not really correct, because SP-GiST can pin > same buffer several times during one scan. In CPU search queries on > SP-GiST seems to be slightly faster. Dramatical difference in "column > <@ const" query is thanks to 2d-mapping. > More comments: * Minor rebase is required (simple int2 -> int16). * Perhaps I'm mistaken, but the following code in getQuadrant() looks wrong to me, shouldn't the 1 and 2 be reversed? if (range_cmp_bounds(typcache, &upper, ¢roidUpper) >= 0) return 1; else return 2; * in the "choose" method, why does in->allTheSame unconditionally match? Why not split? Similarly, why does inner_consistent always match when the nodes are allTheSame? * It's a little confusing having empty prefixes mean that empty range go to node0, and non-empty ranges meaning that empty ranges go to node4 (quadrant 5). Why can't there just always be 5 nodes, and iff all the ranges are empty, then the prefix is NULL? And for that matter, let's let the quadrant equal the node number, and have the empty ranges in node0. I don't see much point in always subtracting 1 from the quadrant number. Regards,Jeff Davis
On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote: > On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote: > > Hackers, > > > > > > attached patch implements quad-tree on ranges. Some performance > > results in comparison with current GiST indexing. > > Index creation is slightly slower. Probably, it need some > > investigation. Search queries on SP-GiST use much more pages. However > > this comparison can be not really correct, because SP-GiST can pin > > same buffer several times during one scan. In CPU search queries on > > SP-GiST seems to be slightly faster. Dramatical difference in "column > > <@ const" query is thanks to 2d-mapping. > > Also, it would be helpful to add a couple tests to rangetypes.sql. Regards,Jeff Davis
On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote: > * Perhaps I'm mistaken, but the following code in getQuadrant() looks > wrong to me, shouldn't the 1 and 2 be reversed? > > if (range_cmp_bounds(typcache, &upper, ¢roidUpper) >= 0) > return 1; > else > return 2; Oops, looks like I was mistaken. The code looks fine to me now. Regards,Jeff Davis
On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:
------
With best regards,
On Mon, 2012-07-02 at 23:47 -0700, Jeff Davis wrote:Also, it would be helpful to add a couple tests to rangetypes.sql.
> On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote:
> > Hackers,
> >
> >
> > attached patch implements quad-tree on ranges. Some performance
> > results in comparison with current GiST indexing.
> > Index creation is slightly slower. Probably, it need some
> > investigation. Search queries on SP-GiST use much more pages. However
> > this comparison can be not really correct, because SP-GiST can pin
> > same buffer several times during one scan. In CPU search queries on
> > SP-GiST seems to be slightly faster. Dramatical difference in "column
> > <@ const" query is thanks to 2d-mapping.
> >
Thank you for review! Now I'm working on detailed performance benchmarks for different opclasses. I hope to finish it in this week. Then we would see which opclasses we're really need and nail down issues you've pointed.
With best regards,
Alexander Korotkov.
On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:
------
With best regards,
Alexander Korotkov.
Also, it would be helpful to add a couple tests to rangetypes.sql.
New version of patch is attached.
1) Idea of having different node numbers is that nodes takes some space (check spgFormInnerTuple). I've rethink this idea a little because adding of node without label just didn't work :). Only root inner index tuple have 5 nodes, others have 4. Thereby all empty ranges are branched already at root inner index tuple.
2) Empty prefix datum replaced with absence of prefix datum.
3) int2 replaced with int16.
4) I've added some tests which duplicates tests for GiST.
5) "connected" replaced with "adjacent"
6) allTheSame nodes is node created by SP-GiST core when it decide than result of picksplit method is not good enough. It divides tuples arbitrarily. So we have to visit all the nodes in this case during scan. See: http://www.postgresql.org/docs/devel/static/spgist-implementation.html#SPGIST-ALL-THE-SAME . Currently in-core SP-GiST opclasses behaves similarly.
I didn't decide how to rethink terms in comments yet :(
With best regards,
Alexander Korotkov.
Attachment
On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis <pgsql@j-davis.com> wrote:Also, it would be helpful to add a couple tests to rangetypes.sql.New version of patch is attached.
Oops, forgot to include one comment fix into patch.
------
With best regards,
Alexander Korotkov.
Attachment
On 12.07.2012 02:11, Alexander Korotkov wrote: > On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote: > >> On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis<pgsql@j-davis.com> wrote: >> >>> Also, it would be helpful to add a couple tests to rangetypes.sql. >>> >> >> New version of patch is attached. >> > > Oops, forgot to include one comment fix into patch. Thanks. Can you do something about TrickFunctionCall2, please? (http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com) A separate patch to refactor that in the existing gist opclass would probably be best. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
------
With best regards,
Alexander Korotkov.
On 12.07.2012 02:11, Alexander Korotkov wrote:Thanks. Can you do something about TrickFunctionCall2, please? (http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com) A separate patch to refactor that in the existing gist opclass would probably be best.On Thu, Jul 12, 2012 at 3:03 AM, Alexander Korotkov<aekorotkov@gmail.com>wrote:On Tue, Jul 3, 2012 at 10:51 AM, Jeff Davis<pgsql@j-davis.com> wrote:Also, it would be helpful to add a couple tests to rangetypes.sql.
New version of patch is attached.
Oops, forgot to include one comment fix into patch.
Done. There are separate patch for get rid of TrickFunctionCall2 and version of SP-GiST for ranges based on that patch.
With best regards,
Alexander Korotkov.
Attachment
On 13.07.2012 02:00, Alexander Korotkov wrote: > On Thu, Jul 12, 2012 at 10:29 AM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> Thanks. Can you do something about TrickFunctionCall2, please? ( >> http://archives.postgresql.**org/message-id/4FE2C968.** >> 2010503@enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com>) >> A separate patch to refactor that in the existing gist opclass would >> probably be best. > > Done. There are separate patch for get rid of TrickFunctionCall2 and > version of SP-GiST for ranges based on that patch. Committed the get-rid-of-TrickFunctionCall2 patch. I also changed one call in range_gist_same(), which was not using TrickFunctionCall2 but was nevertheless doing the same thing in spirit. I'll try to take a look at the other patch in the next few days. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jul 19, 2012 at 12:22 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
------
With best regards,
Alexander Korotkov.
On 13.07.2012 02:00, Alexander Korotkov wrote:Thanks. Can you do something about TrickFunctionCall2, please? (http://archives.postgresql.**org/message-id/4FE2C968.**
2010503@enterprisedb.com<http://archives.postgresql.org/message-id/4FE2C968.2010503@enterprisedb.com>)
A separate patch to refactor that in the existing gist opclass would
probably be best.
Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.
Committed the get-rid-of-TrickFunctionCall2 patch. I also changed one call in range_gist_same(), which was not using TrickFunctionCall2 but was nevertheless doing the same thing in spirit.
Good, thanks!
With best regards,
Alexander Korotkov.
On 13.07.2012 02:00, Alexander Korotkov wrote: > Done. There are separate patch for get rid of TrickFunctionCall2 and > version of SP-GiST for ranges based on that patch. Looking at the SP-GiST patch now.. It would be nice to have an introduction, perhaps as a file comment at the top of rangetypes_spgist.c, explaining how the quad tree works. I have a general idea of what a quad tree is, but it's not immediately obvious how it maps to SP-GiST. What is stored on a leaf node and an internal node? What is the 'prefix' (seems to be the centroid)? How are ranges mapped to 2D points? (the function comment of getQuadrant() is a good start for that last one) In spg_range_quad_inner_consistent(), if in->hasPrefix == true, ISTM that in all cases where 'empty' is true, 'which' is set to 0, meaning that there can be no matches in any of the quadrants. In most of the case-branches, you explicitly check for 'empty', but even in the ones where you don't, I think you end up setting which=0 if empty==true. I'm not 100% sure about the RANGESTRAT_ADJACENT case, though. Am I missing something? It would be nice to avoid the code duplication between the new bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal() too? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
I've added some comments at the top of rangetypes_spgist.c.
------
With best regards,
Alexander Korotkov.
On 13.07.2012 02:00, Alexander Korotkov wrote:Looking at the SP-GiST patch now..Done. There are separate patch for get rid of TrickFunctionCall2 and
version of SP-GiST for ranges based on that patch.
It would be nice to have an introduction, perhaps as a file comment at the top of rangetypes_spgist.c, explaining how the quad tree works. I have a general idea of what a quad tree is, but it's not immediately obvious how it maps to SP-GiST. What is stored on a leaf node and an internal node? What is the 'prefix' (seems to be the centroid)? How are ranges mapped to 2D points? (the function comment of getQuadrant() is a good start for that last one)
In spg_range_quad_inner_consistent(), if in->hasPrefix == true, ISTM that in all cases where 'empty' is true, 'which' is set to 0, meaning that there can be no matches in any of the quadrants. In most of the case-branches, you explicitly check for 'empty', but even in the ones where you don't, I think you end up setting which=0 if empty==true. I'm not 100% sure about the RANGESTRAT_ADJACENT case, though. Am I missing something?
Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true, while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected.
It would be nice to avoid the code duplication between the new bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal() too?
Done.
With best regards,
Alexander Korotkov.
Attachment
On 23.07.2012 10:37, Alexander Korotkov wrote: > On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> It would be nice to have an introduction, perhaps as a file comment at the >> top of rangetypes_spgist.c, explaining how the quad tree works. I have a >> general idea of what a quad tree is, but it's not immediately obvious how >> it maps to SP-GiST. What is stored on a leaf node and an internal node? >> What is the 'prefix' (seems to be the centroid)? How are ranges mapped to >> 2D points? (the function comment of getQuadrant() is a good start for that >> last one) > > I've added some comments at the top of rangetypes_spgist.c. Thanks, much better. I think the handling of empty ranges needs some further explanation. If I understand the code correctly, the root node can contain a centroid like usual, and empty ranges are placed in the magic 5th quadrant. Alternatively, the root node has no centroid, and contains only two subnodes: all empty ranges are placed under one subnode, and non-empty ranges under the other. It seems it would be simpler if we always stored empty nodes the latter way, with a no-centroid root node, and nodes with a centroid would always only have 4 subnodes. When the first empty range is added to an index that already contains non-empty values, the choose-function could return spgSplitTuple to create a new root node that divides the space into empty and non-empty values. Then again, I don't think it would actually simplify the code much. Handling the 5th quadrant doesn't require much code in spg_range_quad_inner_consistent() as it is. So maybe it's better to leave it the way it is. Or perhaps we should stipulate that a root node with no centroid can only contain empty ranges. When you add the first non-empty range, have choose function return spgSplitTuple, and create a new root node with a centroid, and the empty ranges in the 5th quadrant. >> In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that >> in all cases where 'empty' is true, 'which' is set to 0, meaning that there >> can be no matches in any of the quadrants. In most of the case-branches, >> you explicitly check for 'empty', but even in the ones where you don't, I >> think you end up setting which=0 if empty==true. I'm not 100% sure about >> the RANGESTRAT_ADJACENT case, though. Am I missing something? > > Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true, > while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected. Ok. I did some copy-editing, rewording some comments, and fixing whitespace. Patch attached, hope I didn't break anything. I think the most difficult part of the patch is the spg_range_quad_inner_consistent() function. It's hard to understand how the various strategies are implemented. I started to expand the comments in for each strategy, explaining how each strategy maps to a bounding box in the 2D space, but I'm not sure how much that actually helps. Perhaps it would help to also restructure the code so that you first have a switch-case statement that maps the scan key to bounding box, without worrying about the centroid yet, and then check which quadrants you need to descend to to find points within the box. The adjacent-strategy is more complicated than a simple bounding-box search, though. I'm having trouble understanding how exactly the RANGESTRAT_ADJACENT works. The geometric interpretation of 'adjacent' is that the scan key defines two lines, one horizontal and one vertical. Any points that lie on either of the lines match the query. Looking at the implementation, it's not clear how the code implements the search for along those two lines. Also, I wonder if we really need to reconstruct the "previous" value in a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the two lines we are chasing. For example, if you descend to quadrant 2 because there might be a point there that lies on the horizontal line, but we already know that there can't be any points there lie on the vertical line, you only need to remember that, not the whole centroid from the previous level. Does the SP-GiST API require the "reconstructed" values stored by inner_consistent to be of the correct datatype, or can it store any Datums in the array? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Also, I wonder if we really need to reconstruct the "previous" value in > a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the > two lines we are chasing. For example, if you descend to quadrant 2 > because there might be a point there that lies on the horizontal line, > but we already know that there can't be any points there lie on the > vertical line, you only need to remember that, not the whole centroid > from the previous level. Does the SP-GiST API require the > "reconstructed" values stored by inner_consistent to be of the correct > datatype, or can it store any Datums in the array? They have to match the attribute type, at least as to storage details (typbyval/typlen), because the core uses datumCopy to copy them around. We could possibly extend the API to allow a different type to be used for this, but then it wouldn't be "reconstructed data" in any sense of the word; so I think it'd be abuse of the concept --- which would come back to bite us if we ever try to support index-only scans with SPGiST. ISTM what this points up is that the opclass might want some private state kept around during a tree descent. If we want to support that, we should support it as a separate concept from reconstructed data. regards, tom lane
On 29.07.2012 00:50, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> Also, I wonder if we really need to reconstruct the "previous" value in >> a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the >> two lines we are chasing. For example, if you descend to quadrant 2 >> because there might be a point there that lies on the horizontal line, >> but we already know that there can't be any points there lie on the >> vertical line, you only need to remember that, not the whole centroid >> from the previous level. Does the SP-GiST API require the >> "reconstructed" values stored by inner_consistent to be of the correct >> datatype, or can it store any Datums in the array? > > They have to match the attribute type, at least as to storage details > (typbyval/typlen), because the core uses datumCopy to copy them around. > > We could possibly extend the API to allow a different type to be used > for this, but then it wouldn't be "reconstructed data" in any sense of > the word; so I think it'd be abuse of the concept --- which would come > back to bite us if we ever try to support index-only scans with SPGiST. I can see that for leaf nodes, but does that also hold for inner nodes? > ISTM what this points up is that the opclass might want some private > state kept around during a tree descent. If we want to support that, > we should support it as a separate concept from reconstructed data. Yeah, that seems better. The representation of an inner node is datatype-specific, there should be no need to expose "reconstructed" inner node values outside a datatype's SP-GiST implementation. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 29.07.2012 00:50, Tom Lane wrote: >> We could possibly extend the API to allow a different type to be used >> for this, but then it wouldn't be "reconstructed data" in any sense of >> the word; so I think it'd be abuse of the concept --- which would come >> back to bite us if we ever try to support index-only scans with SPGiST. > I can see that for leaf nodes, but does that also hold for inner nodes? I didn't explain myself terribly well, probably. Consider an opclass that wants some private state like this and *also* needs to reconstruct column data. In principle I suppose we could do away with the reconstructed-data support altogether, and consider that if you need that then it is just a portion of the unspecified private state the opclass is holding. But it's probably a bit late to remove bits of the opclass API. regards, tom lane
Just to check where we stand on this: Are you going to send a finalized version of this patch, based on the one I sent earlier, or should I pick up that version and try to get it into committable state? On 23.07.2012 10:37, Alexander Korotkov wrote: > On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> On 13.07.2012 02:00, Alexander Korotkov wrote: >> >>> Done. There are separate patch for get rid of TrickFunctionCall2 and >>> version of SP-GiST for ranges based on that patch. >>> >> >> Looking at the SP-GiST patch now.. >> >> It would be nice to have an introduction, perhaps as a file comment at the >> top of rangetypes_spgist.c, explaining how the quad tree works. I have a >> general idea of what a quad tree is, but it's not immediately obvious how >> it maps to SP-GiST. What is stored on a leaf node and an internal node? >> What is the 'prefix' (seems to be the centroid)? How are ranges mapped to >> 2D points? (the function comment of getQuadrant() is a good start for that >> last one) >> > > I've added some comments at the top of rangetypes_spgist.c. > > In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that >> in all cases where 'empty' is true, 'which' is set to 0, meaning that there >> can be no matches in any of the quadrants. In most of the case-branches, >> you explicitly check for 'empty', but even in the ones where you don't, I >> think you end up setting which=0 if empty==true. I'm not 100% sure about >> the RANGESTRAT_ADJACENT case, though. Am I missing something? >> > > Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true, > while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected. > > It would be nice to avoid the code duplication between the new >> bounds_adjacent() function, and the range_adjacent_internal(). Perhaps move >> bounds_adjacent() to rangetypes.c and use it in range_adjacent_internal() >> too? > > > Done. > > ------ > With best regards, > Alexander Korotkov. > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
In this revision of patch I tried to handle conditions more generally using variables minLower, maxLower, minUpper, maxUpper, inclusive and strictEmpty. However some strategies still contain additional logic.
What is our conclusion about saving previous choice for RANGESTRAT_ADJACENT strategy?
With best regards,
Alexander Korotkov.
Attachment
On 09.08.2012 18:42, Alexander Korotkov wrote: > In this revision of patch I tried to handle conditions more generally using > variables minLower, maxLower, minUpper, maxUpper, inclusive and > strictEmpty. However some strategies still contain additional logic. Thanks, that clarified the code tremendously. The comments I added about the geometrical interpretations of the operations earlier seem unnecessary now, so removed those. > What is our conclusion about saving previous choice for RANGESTRAT_ADJACENT > strategy? I think we're going to do what you did in the patch. A more generic mechanism for holding private state across consistent calls would be nice, but it's not that ugly the way you wrote it. I committed the patch now, but left out the support for adjacent for now. Not because there was necessarily anything wrong with that, but because I have limited time for reviewing, and the rest of the patch looks ready for commit now. I reworded the comments quite a lot, you might want to proofread those to double-check that they're still correct. I'll take a look at the adjacent-support next, as a separate patch. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Aug 16, 2012 at 3:46 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
------
With best regards,
Alexander Korotkov.
I committed the patch now, but left out the support for adjacent for now. Not because there was necessarily anything wrong with that, but because I have limited time for reviewing, and the rest of the patch looks ready for commit now. I reworded the comments quite a lot, you might want to proofread those to double-check that they're still correct. I'll take a look at the adjacent-support next, as a separate patch.
Thanks! There is a separate patch for adjacent. I've reworked adjacent check in order to make it more clear.
With best regards,
Alexander Korotkov.
Attachment
On Sat, 2012-08-18 at 18:10 +0400, Alexander Korotkov wrote: > On Thu, Aug 16, 2012 at 3:46 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > I committed the patch now, but left out the support for > adjacent for now. Not because there was necessarily anything > wrong with that, but because I have limited time for > reviewing, and the rest of the patch looks ready for commit > now. I reworded the comments quite a lot, you might want to > proofread those to double-check that they're still correct. > I'll take a look at the adjacent-support next, as a separate > patch. > > > Thanks! There is a separate patch for adjacent. I've reworked adjacent > check in order to make it more clear. I am taking a look at this patch now. A few quick comments: * It looks like bounds_adjacent modifies it's by-reference arguments, which is a little worrying to me. The lower/upper labels are flipped back, but the inclusivities are not. Maybe just pass by value instead? * Bounds_adjacent is sensitive to the argument order. Can't it just take bound1 and bound2? * I tried some larger tests and they seemed to work. I haven't reviewed the spgist code changes in detail though. Regards,Jeff Davis
On Sat, 2012-07-28 at 17:50 -0400, Tom Lane wrote: > which would come > back to bite us if we ever try to support index-only scans with SPGiST. I'm confused: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=92203624934095163f8b57b5b3d7bbd2645da2c8 And the patch that was just committed for Range Types SP-GiST is already using index-only scans. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Sat, 2012-07-28 at 17:50 -0400, Tom Lane wrote: >> which would come >> back to bite us if we ever try to support index-only scans with SPGiST. > I'm confused: > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=92203624934095163f8b57b5b3d7bbd2645da2c8 Sorry, I was being imprecise there. What I meant was that an opclass that abused the reconstructed-value storage for something else might have problems supporting index-only scans. If we think opclasses might need private storage for index searches, we should add that as a new part of the API, not tell them to misuse this part. regards, tom lane
On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis <pgsql@j-davis.com> wrote:
------
With best regards,
Alexander Korotkov.
I am taking a look at this patch now. A few quick comments:
* It looks like bounds_adjacent modifies it's by-reference arguments,
which is a little worrying to me. The lower/upper labels are flipped
back, but the inclusivities are not. Maybe just pass by value instead?
* Bounds_adjacent is sensitive to the argument order. Can't it just take
bound1 and bound2?
Fixed. Patch is attached.
With best regards,
Alexander Korotkov.
Attachment
Jeff Davis escribió: > On Sat, 2012-08-18 at 18:10 +0400, Alexander Korotkov wrote: > > > > Thanks! There is a separate patch for adjacent. I've reworked adjacent > > check in order to make it more clear. > > I am taking a look at this patch now. A few quick comments: > * I tried some larger tests and they seemed to work. I haven't reviewed > the spgist code changes in detail though. Jeff, Heikki, Any input on the subsequent version of this patch? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, 2012-09-04 at 17:45 +0400, Alexander Korotkov wrote: > On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis <pgsql@j-davis.com> > wrote: > I am taking a look at this patch now. A few quick comments: > > * It looks like bounds_adjacent modifies it's by-reference > arguments, > which is a little worrying to me. The lower/upper labels are > flipped > back, but the inclusivities are not. Maybe just pass by value > instead? > > * Bounds_adjacent is sensitive to the argument order. Can't it > just take > bound1 and bound2? > > > Fixed. Patch is attached. > It looks like this is basically the same diff as v0.1. Did something get mixed up? Regards,Jeff Davis
On Sun, Oct 21, 2012 at 11:22 AM, Jeff Davis <pgsql@j-davis.com> wrote:
------
With best regards,
Alexander Korotkov.
It looks like this is basically the same diff as v0.1. Did something getOn Tue, 2012-09-04 at 17:45 +0400, Alexander Korotkov wrote:
> On Mon, Aug 20, 2012 at 12:25 AM, Jeff Davis <pgsql@j-davis.com>
> wrote:
> I am taking a look at this patch now. A few quick comments:
>
> * It looks like bounds_adjacent modifies it's by-reference
> arguments,
> which is a little worrying to me. The lower/upper labels are
> flipped
> back, but the inclusivities are not. Maybe just pass by value
> instead?
>
> * Bounds_adjacent is sensitive to the argument order. Can't it
> just take
> bound1 and bound2?
>
>
> Fixed. Patch is attached.
>
mixed up?
Right version of patch is attached.
With best regards,
Alexander Korotkov.
Attachment
On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote: > Right version of patch is attached. > * In bounds_adjacent, there's no reason to flip the labels back. * Comment should indicate more explicitly that bounds_adjacent is sensitive to the argument order. * In bounds_adjacent, it appears that "bound1" corresponds to "B" in the comment above, and "bound2" corresponds to "A" in the comment above. I would have guessed from reading the comment that bound1 corresponded to A. We should just use consistent names between the comment and the code (e.g. boundA and boundB). * I could be getting confused, but I think that line 645 of rangetypes_spgist.c should be inverted (!bounds_adjacent(...)). * I think needPrevious should have an explanatory comment somewhere. It looks like you are using it to store some state as you descend the tree, but it doesn't look like it's used to reconstruct the value (because we already have the value anyway). Since it's being used for a purpose other than what's intended, that should be explained. Regards,Jeff Davis
Hi Alexander, On 2012-11-04 11:41:48 -0800, Jeff Davis wrote: > On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote: > > > Right version of patch is attached. > > > * In bounds_adjacent, there's no reason to flip the labels back. > * Comment should indicate more explicitly that bounds_adjacent is > sensitive to the argument order. > * In bounds_adjacent, it appears that "bound1" corresponds to "B" in the > comment above, and "bound2" corresponds to "A" in the comment above. I > would have guessed from reading the comment that bound1 corresponded to > A. We should just use consistent names between the comment and the code > (e.g. boundA and boundB). > * I could be getting confused, but I think that line 645 of > rangetypes_spgist.c should be inverted (!bounds_adjacent(...)). > * I think needPrevious should have an explanatory comment somewhere. It > looks like you are using it to store some state as you descend the tree, > but it doesn't look like it's used to reconstruct the value (because we > already have the value anyway). Since it's being used for a purpose > other than what's intended, that should be explained. Do you have time to address these comments or should the patch marked as returned with Feedback? Afaics there hasn't been progress since CF2... Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi!
On Sun, Nov 4, 2012 at 11:41 PM, Jeff Davis <pgsql@j-davis.com> wrote:
------
With best regards,
Alexander Korotkov.
On Fri, 2012-11-02 at 12:47 +0400, Alexander Korotkov wrote:* In bounds_adjacent, there's no reason to flip the labels back.
> Right version of patch is attached.
>
Fixed.
* Comment should indicate more explicitly that bounds_adjacent issensitive to the argument order.
Fixed.
* In bounds_adjacent, it appears that "bound1" corresponds to "B" in the
comment above, and "bound2" corresponds to "A" in the comment above. I
would have guessed from reading the comment that bound1 corresponded to
A. We should just use consistent names between the comment and the code
(e.g. boundA and boundB).
Fixed.
* I could be getting confused, but I think that line 645 of
rangetypes_spgist.c should be inverted (!bounds_adjacent(...)).
Good catch! Actually, code was in direct contradiction with comment. Fixed.
* I think needPrevious should have an explanatory comment somewhere. It
looks like you are using it to store some state as you descend the tree,
but it doesn't look like it's used to reconstruct the value (because we
already have the value anyway). Since it's being used for a purpose
other than what's intended, that should be explained.
Yes, it's a some kind of hack now. Comment is added.
With best regards,
Alexander Korotkov.
Attachment
On Fri, 2012-12-14 at 01:31 +0400, Alexander Korotkov wrote: > Hi! > Hi! I have attached a patch with some significant edits. * In your patch, there was still an inconsistency between the comment for bounds_adjacent and the code. I refactored it to ensure it always takes the upper bound as boundA, and the lower bound as boundB, so that it can invert the inclusivities to create A..B to match the comments. * In the consistent method, you were inverting upper to be a lower bound and lower to be an upper bound. I don't understand why (perhaps I did the first time I read the patch), so I removed it. * It looks like the comments for which1/2 were inconsistent with the code. I tried to fix that. * I significantly refactored the comments and code for the consistent method. I had trouble understanding the original comments, particularly around the edge cases. Please take a look and see if it still matches your algorithm properly. This patch is not intended to be a final version (I didn't analyze my code carefully), but just to show you what I mean and how I interpret what the code is trying to do. You don't need to use my refactorings, but if not, the comments in your version need more improvement so I can understand. I found it easier to reason in terms of horizontal and vertical lines, and which quadrants they crossed, and then work out the edge cases. I'm not sure if that reasoning was correct, but it seemed to make more sense to me. Regards, Jeff Davis
Attachment
Hi!
On Mon, Dec 17, 2012 at 2:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
------
With best regards,
Alexander Korotkov.
I have attached a patch with some significant edits.
* In your patch, there was still an inconsistency between the comment
for bounds_adjacent and the code. I refactored it to ensure it always
takes the upper bound as boundA, and the lower bound as boundB, so that
it can invert the inclusivities to create A..B to match the comments.
* In the consistent method, you were inverting upper to be a lower bound
and lower to be an upper bound. I don't understand why (perhaps I did
the first time I read the patch), so I removed it.
* It looks like the comments for which1/2 were inconsistent with the
code. I tried to fix that.
* I significantly refactored the comments and code for the consistent
method. I had trouble understanding the original comments, particularly
around the edge cases.
Please take a look and see if it still matches your algorithm properly.
This patch is not intended to be a final version (I didn't analyze my
code carefully), but just to show you what I mean and how I interpret
what the code is trying to do. You don't need to use my refactorings,
but if not, the comments in your version need more improvement so I can
understand.
I found it easier to reason in terms of horizontal and vertical lines,
and which quadrants they crossed, and then work out the edge cases. I'm
not sure if that reasoning was correct, but it seemed to make more sense
to me.
Your comments and refactoring looks good for me.
With best regards,
Alexander Korotkov.
On 02/09/2013 05:36 PM, Alexander Korotkov wrote:
This patch is currently flagged as waiting for author. Have you had a chance to test and examine Jeff's changes in more detail? Would you consider giving us a summary of the status of this work - do you think you can have it production-ready within a week or two, with all necessary tests and documentation.Your comments and refactoring looks good for me.
Note that Jeff wrote:
Please take a look and see if it still matches your algorithm properly. This patch is not intended to be a final version (I didn't analyze my code carefully), but just to show you what I mean and how I interpret what the code is trying to do.
... so it's clear that you need to read the patch in detail and make any desired edits, re-test, and go from there.
I see a couple of changes to the regression tests. Do you need any documentation changes for this too? Any user-visible effects other than performance?
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Mar 3, 2013 at 6:37 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
Alexander Korotkov.
This patch is currently flagged as waiting for author. Have you had a chance to test and examine Jeff's changes in more detail? Would you consider giving us a summary of the status of this work - do you think you can have it production-ready within a week or two, with all necessary tests and documentation.On 02/09/2013 05:36 PM, Alexander Korotkov wrote:Your comments and refactoring looks good for me.... so it's clear that you need to read the patch in detail and make any desired edits, re-test, and go from there.
Note that Jeff wrote:Please take a look and see if it still matches your algorithm properly. This patch is not intended to be a final version (I didn't analyze my code carefully), but just to show you what I mean and how I interpret what the code is trying to do.
I see a couple of changes to the regression tests. Do you need any documentation changes for this too? Any user-visible effects other than performance?
This patch only adds one more operator to already committed new opclass. Tests already cover this case. Without patch corresponding test leads to sequential scan instead of index scan. However, I can't see any documentation changes about already committed opclass. I think we need a documentation update as an separate patch.
Jeff changes only does comments changes and renaming, I don't need to examine them more.
------
With best regards,Alexander Korotkov.
On 03/04/2013 01:42 AM, Alexander Korotkov wrote: > > This patch only adds one more operator to already committed new > opclass. Tests already cover this case. Without patch corresponding > test leads to sequential scan instead of index scan. However, I can't > see any documentation changes about already committed opclass. I think > we need a documentation update as an separate patch. > Jeff changes only does comments changes and renaming, I don't need to > examine them more. OK. Would it be fair to summarize that as "I think it's ready to commit as-is" ? Do you have any outstanding concerns or uncertainties? -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Mar 4, 2013 at 6:53 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
------
With best regards,
Alexander Korotkov.
On 03/04/2013 01:42 AM, Alexander Korotkov wrote:OK. Would it be fair to summarize that as "I think it's ready to commit
>
> This patch only adds one more operator to already committed new
> opclass. Tests already cover this case. Without patch corresponding
> test leads to sequential scan instead of index scan. However, I can't
> see any documentation changes about already committed opclass. I think
> we need a documentation update as an separate patch.
> Jeff changes only does comments changes and renaming, I don't need to
> examine them more.
as-is" ? Do you have any outstanding concerns or uncertainties?
Yes. I think it's ready to commit as-is.
With best regards,
Alexander Korotkov.
On 03.03.2013 19:42, Alexander Korotkov wrote: > This patch only adds one more operator to already committed new opclass. > Tests already cover this case. Without patch corresponding test leads to > sequential scan instead of index scan. Thanks, committed with some trivial cleanup. > However, I can't see any > documentation changes about already committed opclass. I think we need a > documentation update as an separate patch. Agreed. I think a small mention in the Range Types - Indexing section (http://www.postgresql.org/docs/devel/static/rangetypes.html#RANGETYPES-GIST) will do. - Heikki