Thread: ScanKey representation for RowCompare index conditions
There's one nontrivial decision still to make about how to implement proper per-spec row-comparison operations, namely: how a row comparison ought to be represented in the index access method API.

The current representation of index conditions in the AM API is an array of ScanKey structs, one per "indexcol op constant" index condition, with implicit ANDing across all the conditions. There are various bits of code that require the ScanKeys to appear in order by index column, though this isn't inherent in the data structure itself.

Short of a fundamental redesign of the data structure, I can see two plausible approaches to adding row-wise comparisons:

A. Include all the elements of the row comparison as separate entries in the ScanKey array, and mark them (probably via sk_flags bits) to show that they form a row condition rather than independent tests. Unless we want to add more fields to ScanKey, we'd have to rely on the order of the ScanKey entries to show the relationship of the conditions (ie, which condition is part of which row comparison, and what its column position is within the row).

B. Place a single entry for the row comparison in the main ScanKey array, with a special sk_func pointer pointing to a function that does a row-wise comparison. The sk_argument field would point to a subsidiary ScanKey array containing the actual index columns and data values for the row elements. sk_attno would reference the first (leftmost) index column used by the row comparison. We'd still want an sk_flags bit to indicate that this is a row comparison, probably.

I'm currently leaning to plan B, on the grounds that:

1. It would require no changes in _bt_checkkeys(), which is the only user of the data structure that is particularly performance-critical. With plan A we'd be adding at least a few cycles to _bt_checkkeys in all cases. Plan B avoids that at the cost of an extra level of function call to do a row comparison. Given the relative frequency of the two sorts of index conditions, this has to be the better tradeoff to make.

2. There's quite a bit of logic in btree indexscan setup that would find it convenient to treat a row comparison as if it were a single condition on just its leftmost column. This may end up being nearly a wash, but I think that plan A would make the setup code a shade more complex than plan B would. In particular, the rules about valid orderings of the ScanKey entries would be complicated under plan A, whereas under plan B it's still clear where everything belongs.

Any thoughts? Anyone see a good plan C, or a serious flaw that I'm missing in either of these ideas?

			regards, tom lane
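For concreteness, here is a minimal standalone sketch of the two layouts. The types are simplified stand-ins for the skey.h declarations (the real ScanKeyData also carries sk_func, sk_subtype, and so on), and the SK_ROW_* flag names are hypothetical placeholders for whatever bits would actually be chosen.

    #include <stdint.h>

    /* Simplified stand-ins for the skey.h types -- not the real declarations. */
    typedef int16_t   AttrNumber;
    typedef uint16_t  StrategyNumber;
    typedef uintptr_t Datum;

    /* Hypothetical flag bits for marking row-comparison entries. */
    #define SK_ROW_HEADER 0x0001    /* entry heads a row comparison */
    #define SK_ROW_MEMBER 0x0002    /* entry is one element of a row comparison */
    #define SK_ROW_END    0x0004    /* entry is the last element of its row */

    typedef struct ScanKeyData
    {
        int            sk_flags;    /* ordinary key, or row-comparison bits */
        AttrNumber     sk_attno;    /* index column the condition applies to */
        StrategyNumber sk_strategy; /* <, <=, =, >=, > */
        Datum          sk_argument; /* comparison constant; under plan B the
                                     * header entry's argument would instead
                                     * point at the subsidiary ScanKey array */
    } ScanKeyData, *ScanKey;

    /*
     * Plan A: all elements of "(a,b) >= (5,1)" sit in the main array, tied
     * together only by their flag bits and relative order:
     *
     *   main[0] = { SK_ROW_HEADER | SK_ROW_MEMBER, attno(a), >=, 5 }
     *   main[1] = { SK_ROW_MEMBER | SK_ROW_END,    attno(b), >=, 1 }
     *
     * Plan B: the main array carries a single header entry whose sk_argument
     * points at a subsidiary array describing the row elements; sk_attno on
     * the header names the leftmost column:
     *
     *   main[0] = { SK_ROW_HEADER, attno(a), >=, (Datum) sub }
     *   sub[0]  = { SK_ROW_MEMBER,              attno(a), >=, 5 }
     *   sub[1]  = { SK_ROW_MEMBER | SK_ROW_END, attno(b), >=, 1 }
     */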
On Sun, 2006-01-15 at 18:03 -0500, Tom Lane wrote:
> There's one nontrivial decision still to make about how to implement proper per-spec row-comparison operations, namely: how a row comparison ought to be represented in the index access method API.

I'm not sure I understand why. Surely a row wise comparison can take advantage of indexes on some of its columns?

When would it be advantageous to create an index on the whole row anyway? Wouldn't the key likely be excessively long? So why would we support this at all?

> I'm currently leaning to plan B, on the grounds that:
>
> 1. It would require no changes in _bt_checkkeys(), which is the only user of the data structure that is particularly performance-critical. With plan A we'd be adding at least a few cycles to _bt_checkkeys in all cases. Plan B avoids that at the cost of an extra level of function call to do a row comparison. Given the relative frequency of the two sorts of index conditions, this has to be the better tradeoff to make.
>
> 2. There's quite a bit of logic in btree indexscan setup that would find it convenient to treat a row comparison as if it were a single condition on just its leftmost column. This may end up being nearly a wash, but I think that plan A would make the setup code a shade more complex than plan B would. In particular, the rules about valid orderings of the ScanKey entries would be complicated under plan A, whereas under plan B it's still clear where everything belongs.
>
> Any thoughts? Anyone see a good plan C, or a serious flaw that I'm missing in either of these ideas?

That looks sound.

Best Regards, Simon Riggs
On Sun, Jan 15, 2006 at 06:03:12PM -0500, Tom Lane wrote:
> There's one nontrivial decision still to make about how to implement proper per-spec row-comparison operations, namely: how a row comparison ought to be represented in the index access method API.
>
> The current representation of index conditions in the AM API is an array of ScanKey structs, one per "indexcol op constant" index condition, with implicit ANDing across all the conditions. There are various bits of code that require the ScanKeys to appear in order by index column, though this isn't inherent in the data structure itself.

ISTM that row-wise comparisons, as far as indexes are concerned, are actually simpler than normal scan-keys. For example, if you have the condition (a,b) >= (5,1) then once the index has found that point, every subsequent entry in the index matches (except possibly NULLs). So you really don't actually need a row-comparison at any point after that.

Now, if you have bracketing conditions like (a,b) BETWEEN (5,1) and (7,0) it gets more interesting. Ideally you'd want to pass both of these to the index so it knows when to stop. This really suggests using your plan B since you then have the possibility of passing both, which you don't really have with plan A. The latter also allows you to give more auxiliary conditions like b>4. So it seems like a good idea.

No plan C I can think of...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
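A toy illustration of the bracketing point, in plain C over a sorted in-memory array rather than a real btree (column names and constants are just the example's): the lower row condition only decides where the scan starts, while the upper one is what tells it when to stop, which is why you'd want both passed down to the AM.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int a, b; } IndexEntry;

    /* SQL-spec row comparisons on two columns */
    static bool
    row_ge(int a1, int b1, int a2, int b2)
    {
        return a1 > a2 || (a1 == a2 && b1 >= b2);
    }

    static bool
    row_le(int a1, int b1, int a2, int b2)
    {
        return a1 < a2 || (a1 == a2 && b1 <= b2);
    }

    int
    main(void)
    {
        /* entries in (a,b) index order */
        IndexEntry idx[] = { {4,9}, {5,0}, {5,1}, {6,2}, {7,0}, {7,1}, {8,0} };
        int n = sizeof(idx) / sizeof(idx[0]);

        /* (a,b) between (5,1) and (7,0): start at the first entry satisfying
         * the lower bound, then scan forward until the upper bound fails --
         * the upper row condition is what ends the scan. */
        for (int i = 0; i < n; i++)
        {
            if (!row_ge(idx[i].a, idx[i].b, 5, 1))
                continue;               /* still before the start boundary */
            if (!row_le(idx[i].a, idx[i].b, 7, 0))
                break;                  /* past the stop boundary */
            printf("(%d,%d)\n", idx[i].a, idx[i].b);  /* (5,1) (6,2) (7,0) */
        }
        return 0;
    }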
Simon Riggs <simon@2ndquadrant.com> writes:
> On Sun, 2006-01-15 at 18:03 -0500, Tom Lane wrote:
>> There's one nontrivial decision still to make about how to implement proper per-spec row-comparison operations, namely: how a row comparison ought to be represented in the index access method API.
>
> I'm not sure I understand why. Surely a row wise comparison can take advantage of indexes on some of its columns?

Not until we write the code to let it do so. Currently, what the system can match to an index on (a,b) is conditions like WHERE a >= x AND b >= y, which is quite a different animal both semantically and representationally from WHERE (a,b) >= (x,y). (At least, assuming that one imputes the correct SQL-spec meaning to the latter, viz a > x OR (a = x AND b >= y).) Because of the semantic difference, we have to preserve the difference when passing the condition to the index AM, thus the need for my question.

> So why would we support this at all?

Example applications here and here:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
http://archives.postgresql.org/pgsql-performance/2005-12/msg00590.php
(which demonstrate BTW the reasons for not trying to handle this by just expanding out the OR/AND equivalents).

			regards, tom lane
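To make the semantic gap concrete, here is a minimal standalone sketch (illustrative column names and constants only): the tuple (a=6, b=0) satisfies the SQL-spec row comparison (a,b) >= (5,1) but fails the ANDed per-column form, so the two conditions really do select different rows and can't be interchanged when handed to the index AM.

    #include <stdbool.h>
    #include <stdio.h>

    /* WHERE a >= x AND b >= y : two independent per-column conditions */
    static bool
    columnwise_ge(int a, int b, int x, int y)
    {
        return a >= x && b >= y;
    }

    /* WHERE (a,b) >= (x,y) per the SQL spec: a > x OR (a = x AND b >= y) */
    static bool
    rowwise_ge(int a, int b, int x, int y)
    {
        return a > x || (a == x && b >= y);
    }

    int
    main(void)
    {
        /* (6,0) follows (5,1) in index order, so the row comparison accepts
         * it, but the ANDed per-column form rejects it because 0 < 1 */
        printf("columnwise: %d\n", columnwise_ge(6, 0, 5, 1));   /* prints 0 */
        printf("rowwise:    %d\n", rowwise_ge(6, 0, 5, 1));      /* prints 1 */
        return 0;
    }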
Martijn van Oosterhout <kleptog@svana.org> writes:
> ISTM that row-wise comparisons, as far as indexes are concerned, are actually simpler than normal scan-keys. For example, if you have the condition (a,b) >= (5,1) then once the index has found that point, every subsequent entry in the index matches (except possibly NULLs). So you really don't actually need a row-comparison at any point after that.

Well, yeah you do, eg if the caller starts demanding backward fetch; you need to be able to tell where to stop. In any case it's certainly necessary to pass down the whole row-condition into the index AM; without that, the closest you could get would be an indexscan on a >= 5, which might have to scan an undesirably large number of rows before reaching the first one with b >= 1.

> Now, if you have bracketing conditions like (a,b) BETWEEN (5,1) and (7,0) it gets more interesting. Ideally you'd want to pass both of these to the index so it knows when to stop. This really suggests using your plan B since you then have the possibility of passing both, which you don't really have with plan A. The latter also allows you to give more auxiliary conditions like b>4.

No, you misunderstood my plan A --- it's not giving up any of these options. But it's paying for it in complexity. I was envisioning a case like the above as being represented this way:

    sk_flags     sk_attno    sk_func    sk_datum

    ROW_START    a           >=         5
    ROW_END      b           >=         1
    ROW_START    a           <=         7
    ROW_END      b           <=         0
    normal       b           >          4

Hence my comments about weird rules for the ordering of entries under plan A --- the normal and ROW_START entries would be in order by attno, but the row continuation/end entries wouldn't appear to follow this global ordering. They'd follow a local order within each row condition, instead.

Since you didn't understand what I was saying, I suspect that plan A is too confusing ...

			regards, tom lane
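Spelled out as a C initializer (hypothetical flag names, plain ints standing in for Datums, operators shown as text), the same plan-A array would look roughly like this; note that sk_attno runs a, b, a, b, b, so only the ordinary key and the ROW_START headers follow the usual global ordering by index column.

    /* Plan A layout for: (a,b) >= (5,1) AND (a,b) <= (7,0) AND b > 4.
     * ROW_START / ROW_END are hypothetical sk_flags bits; a is column 1,
     * b is column 2; the operator text stands in for sk_func. */
    #define ROW_START 0x01
    #define ROW_END   0x02

    struct PlanAKey
    {
        int  flags;     /* 0 = ordinary key */
        int  attno;     /* index column number */
        char op[3];     /* comparison operator, as text for readability */
        int  value;     /* comparison constant */
    };

    static const struct PlanAKey plan_a_keys[] = {
        { ROW_START, 1, ">=", 5 },   /* leftmost element of (a,b) >= (5,1) */
        { ROW_END,   2, ">=", 1 },
        { ROW_START, 1, "<=", 7 },   /* leftmost element of (a,b) <= (7,0) */
        { ROW_END,   2, "<=", 0 },
        { 0,         2, ">",  4 },   /* ordinary condition b > 4 */
    };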
On Mon, Jan 16, 2006 at 12:07:44PM -0500, Tom Lane wrote:
> Since you didn't understand what I was saying, I suspect that plan A is too confusing ...

Umm, yeah. Now you've explained it, I think it should be excluded on the basis that it'll be a source of bugs. For all the places that matter, a row-condition needs to be treated as a whole, and storing parts in a top-level ScanKey array just seems like asking for trouble.

Given no plan C, I guess plan B is the way to go...?

--
Martijn van Oosterhout <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Mon, Jan 16, 2006 at 12:07:44PM -0500, Tom Lane wrote:
>> Since you didn't understand what I was saying, I suspect that plan A is too confusing ...
>
> Umm, yeah. Now you've explained it, I think it should be excluded on the basis that it'll be a source of bugs. For all the places that matter, a row-condition needs to be treated as a whole, and storing parts in a top-level ScanKey array just seems like asking for trouble.

Actually, I'd just been coming back to plan A, after realizing that it's not possible to do this with zero change in _bt_checkkeys() after all. The amount of information passed down to an ordinary sk_func isn't sufficient to do what the row-compare subroutine would need to do: it needs access to the whole indextuple, not only the first column's value, plus access to the direction and continuescan variables. We could add extra parameters to what gets passed, but that's certainly no cheaper than adding a test on some sk_flags bits. And having the row comparison logic inside _bt_checkkeys is probably more understandable than having a separate subroutine with strange calling conventions.

I think given adequate documentation in skey.h, either representation is about the same as far as the source-of-bugs argument goes. The main risk you get from a plan-B approach is that it's easier to fail to notice bugs of omission, where a piece of code looking at a ScanKey needs to do something special with a row-compare key and omits to do so. At least that's what I'm guessing at the moment --- I haven't actually tried to code up any of the changes yet.

In any case, since we are only intending to support this for searches of btree indexes, there are not that many pieces of code that will be concerned with it. I think it's just _bt_first, _bt_preprocess_keys, _bt_checkkeys, and ExecIndexBuildScanKeys. Everyplace else will be dealing only in "simple" ScanKeys with no row comparisons.

			regards, tom lane
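A rough standalone sketch of what in-line handling could look like under plan A, with the stated simplifications (hypothetical flag names, plain ints for Datums and index columns, forward scan only, no NULL or direction/continuescan handling): the key-checking loop consumes each ROW_START..ROW_END run as one lexicographic comparison and treats every other entry exactly as before.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical simplified types -- not the real nbtree/skey.h code. */
    enum Strategy { LT, LE, GE, GT };

    #define ROW_START 0x01      /* first element of a row comparison */
    #define ROW_END   0x02      /* last element of a row comparison */

    typedef struct
    {
        int           flags;
        int           attno;    /* 1-based index column */
        enum Strategy strategy;
        int           value;    /* comparison constant */
    } Key;

    /* ordinary "indexcol op constant" condition */
    static bool
    eval_simple(const Key *k, const int *tuple)
    {
        int tv = tuple[k->attno - 1];

        switch (k->strategy)
        {
            case LT: return tv <  k->value;
            case LE: return tv <= k->value;
            case GE: return tv >= k->value;
            case GT: return tv >  k->value;
        }
        return false;
    }

    /* Whole row comparison, evaluated lexicographically: the first unequal
     * column decides with a strict comparison; if every column is equal,
     * the last element's strategy decides whether equality is accepted. */
    static bool
    eval_row_group(const Key *keys, int n, const int *tuple)
    {
        for (int i = 0; i < n; i++)
        {
            int tv = tuple[keys[i].attno - 1];

            if (tv != keys[i].value)
                return (keys[i].strategy == GE || keys[i].strategy == GT)
                    ? tv > keys[i].value
                    : tv < keys[i].value;
        }
        return keys[n - 1].strategy == GE || keys[n - 1].strategy == LE;
    }

    /* Stand-in for the key-checking loop: ANDs all conditions, consuming
     * each ROW_START..ROW_END run as a single row comparison. */
    static bool
    check_keys(const Key *keys, int nkeys, const int *tuple)
    {
        for (int i = 0; i < nkeys; i++)
        {
            if (keys[i].flags & ROW_START)
            {
                int j = i;

                while (!(keys[j].flags & ROW_END))
                    j++;
                if (!eval_row_group(&keys[i], j - i + 1, tuple))
                    return false;
                i = j;          /* skip the rest of the group */
            }
            else if (!eval_simple(&keys[i], tuple))
                return false;
        }
        return true;
    }

    int
    main(void)
    {
        /* (a,b) >= (5,1) AND (a,b) <= (7,0) AND b > 4, as in the table above */
        Key keys[] = {
            { ROW_START, 1, GE, 5 }, { ROW_END, 2, GE, 1 },
            { ROW_START, 1, LE, 7 }, { ROW_END, 2, LE, 0 },
            { 0,         2, GT, 4 },
        };
        int t1[] = { 6, 5 };    /* passes: 6 > 5, 6 < 7, 5 > 4 */
        int t2[] = { 7, 5 };    /* fails (a,b) <= (7,0): a = 7 but b = 5 > 0 */

        printf("%d %d\n", check_keys(keys, 5, t1), check_keys(keys, 5, t2));
        return 0;
    }

In the real thing the row-comparison branch would also have to look at the scan direction and set continuescan appropriately, which is exactly the extra context a plain sk_func call doesn't get today.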