Re: GiST range-contained-by searches versus empty ranges - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: GiST range-contained-by searches versus empty ranges |
Date | |
Msg-id | 8565.1322353615@sss.pgh.pa.us Whole thread Raw |
In response to | GiST range-contained-by searches versus empty ranges (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: GiST range-contained-by searches versus empty ranges
|
List | pgsql-hackers |
I wrote: > I started to wonder why the test in range_gist_consistent_int() for > RANGESTRAT_CONTAINED_BY was "return true" (ie, search the entire index) > rather than range_overlaps, which is what is tested in the comparable > case in rtree_internal_consistent(). The regression tests showed me > how come: an empty-range index entry should be reported as being > contained by anything, but range_overlaps won't necessarily find such > entries. (The rtree case is all right because there is no equivalent of > an empty range in boxes, circles, or polygons.) > I am not satisfied with this state of affairs though: range_contained_by > really ought to be efficiently indexable, but with the current coding > an index search is nearly useless. Also, so far as I can see, the > current penalty function allows empty-range entries to be scattered > basically anywhere in the index, making a search for "= empty" pretty > inefficient too. > The first solution that comes to mind is to make the penalty and > picksplit functions forcibly segregate empty ranges from others, that is > a split will never put empty ranges together with non-empty ones. Then, > we can assume that a non-empty internal node doesn't represent any empty > leaf entries, and avoid descending to it when it doesn't overlap the > target range. Then the equality-search case could be improved too. After looking through the code a bit, it seems that this isn't possible to solve with a localized fix after all. Yes, the opclass-specific picksplit function can positively guarantee to separate empty ranges from others when it splits a page ... but it doesn't have control over what happens when an item is added to an index without a page split occurring. Consider a scenario where we have a multi-page GiST index containing no empty ranges, and an empty range item needs to be added. As the code is currently written, the central GiST code will choose one of the existing root-page items to add the new item to; there is no way to persuade it that a new top-level item would be a better idea. What's more, if we try to do so by having the penalty function return infinity for adding an empty range to a nonempty item, we'll end up with empty ranges being randomly added to any/all of the items, since we'll get the exact same penalty for all the nonempty items. So, far from segregating the empties, we'll just end up with them cluttered all over the index, just like now. This is not really a new problem. There is code in there that is trying to segregate NULLs from non-NULL entries, and it is equally broken/ineffective. I'm inclined to propose that we should add some logic to say that merging a new item into an existing one is forbidden if the penalty function returns plus-infinity for the case. If all existing items on a page return infinity, a new item must be added to the page (possibly causing a page split) instead of inserting into any existing one. (Of course, gistpenalty() should be fixed to return infinity, not just a randomly chosen large value, for the null-and-not-null case.) Comments? regards, tom lane
pgsql-hackers by date: