Thread: Fix for gistchoose
I found that code of gistchoose doesn't follow it's logic. Idea of gistchoose is that first column penalty is more important than penalty of second column. If we meet same penalty values of first column then we choose minimal penalty value of second column among them.
Current code doesn't follow described logic. Let's illustrate it on example. Imagine we've following input data.
tuple col1 col2
1 1 0
2 0 2
3 0 1
According to function logic, tuple #3 should be selected. But gistchoose will choose #2, because after processing #2 which_grow will contain two zeros. It's enough to remove "&& i == FirstOffsetNumber" from gistchoose in order to fix it. I've discussed it with Teodor in person and he have confirmed my thoughts.
1 1 0
2 0 2
3 0 1
Let's consider an example.
Create table with two columns where first column contain only few distinct values.
test=# create table test as (select (random()*5)::integer, random() from generate_series(1,1000000));
test=# create index test_idx on test using gist (x,y);
Without patch:
test=# explain (analyze, buffers) select * from test where x = 1 and y between 0.1 and 0.2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=986.89..6737.30 rows=19681 width=12) (actual time=40.543..52.033 rows=19894 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
Buffers: shared hit=5957
-> Bitmap Index Scan on test_idx (cost=0.00..981.96 rows=19681 width=0) (actual time=39.248..39.248 rows=19894 loops=1)
Index Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
Buffers: shared hit=699
Total runtime: 53.494 ms
(7 rows)
test=# explain (analyze, buffers) select * from test where x = 1 and y between 0.5 and 0.6;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=998.29..6753.38 rows=19948 width=12) (actual time=38.646..50.136 rows=19830 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
Buffers: shared hit=6243
-> Bitmap Index Scan on test_idx (cost=0.00..993.30 rows=19948 width=0) (actual time=37.342..37.342 rows=19830 loops=1)
Index Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
Buffers: shared hit=974
Total runtime: 51.561 ms
(7 rows)
With patch
test=# explain (analyze, buffers) select * from test2 where x = 1 and y between 0.1 and 0.2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=889.96..6645.37 rows=19966 width=12) (actual time=17.761..29.499 rows=19894 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
Buffers: shared hit=5436
-> Bitmap Index Scan on test2_idx (cost=0.00..884.97 rows=19966 width=0) (actual time=16.598..16.598 rows=19894 loops=1)
Index Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <= 0.2::double precision))
Buffers: shared hit=178
Total runtime: 30.996 ms
(7 rows)
test=# explain (analyze, buffers) select * from test2 where x = 1 and y between 0.5 and 0.6;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=885.34..6639.89 rows=19917 width=12) (actual time=21.734..33.655 rows=19830 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
Buffers: shared hit=5452
-> Bitmap Index Scan on test2_idx (cost=0.00..880.36 rows=19917 width=0) (actual time=20.604..20.604 rows=19830 loops=1)
Index Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <= 0.6::double precision))
Buffers: shared hit=183
Total runtime: 35.136 ms
(7 rows)
You can see that difference in number of read buffers during index scan is pretty significant.
With best regards,
Alexander Korotkov.
Attachment
On Mon, Aug 20, 2012 at 12:57 PM, Alexander Korotkov <aekorotkov@gmail.com> wrote: > I found that code of gistchoose doesn't follow it's logic. Idea of > gistchoose is that first column penalty is more important than penalty of > second column. If we meet same penalty values of first column then we choose > minimal penalty value of second column among them. Nice catch. Thanks for the patch, which I have now committed. But since these functions were very short on useful comments, I added some. Hopefully they're even right, but let me know if you think otherwise, or have any ideas for further improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 20, 2012 at 12:57 PM, Alexander Korotkov > <aekorotkov@gmail.com> wrote: >> I found that code of gistchoose doesn't follow it's logic. Idea of >> gistchoose is that first column penalty is more important than penalty of >> second column. If we meet same penalty values of first column then we choose >> minimal penalty value of second column among them. > Nice catch. Thanks for the patch, which I have now committed. Should we backpatch that? regards, tom lane
On Thu, Aug 30, 2012 at 1:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Aug 20, 2012 at 12:57 PM, Alexander Korotkov >> <aekorotkov@gmail.com> wrote: >>> I found that code of gistchoose doesn't follow it's logic. Idea of >>> gistchoose is that first column penalty is more important than penalty of >>> second column. If we meet same penalty values of first column then we choose >>> minimal penalty value of second column among them. > >> Nice catch. Thanks for the patch, which I have now committed. > > Should we backpatch that? Arguably, yes. Does the patch look sane to you? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
<div class="gmail_quote">On Thu, Aug 30, 2012 at 9:11 PM, Robert Haas <span dir="ltr"><<a href="mailto:robertmhaas@gmail.com"target="_blank">robertmhaas@gmail.com</a>></span> wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Mon, Aug 20,2012 at 12:57 PM, Alexander Korotkov<br /> <<a href="mailto:aekorotkov@gmail.com">aekorotkov@gmail.com</a>> wrote:<br/> > I found that code of gistchoose doesn't follow it's logic. Idea of<br /> > gistchoose is that first columnpenalty is more important than penalty of<br /> > second column. If we meet same penalty values of first columnthen we choose<br /> > minimal penalty value of second column among them.<br /><br /></div>Nice catch. Thanks forthe patch, which I have now committed. But<br /> since these functions were very short on useful comments, I added<br/> some. Hopefully they're even right, but let me know if you think<br /> otherwise, or have any ideas for furtherimprovement.</blockquote><div class="gmail_quote"><br /></div><div class="gmail_quote">Thanks! Comments looks goodfor me.</div><br />------<br />With best regards,<br />Alexander Korotkov.</div>
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 30, 2012 at 1:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Should we backpatch that? > Arguably, yes. Does the patch look sane to you? I was afraid you'd ask that. [ studies code for awhile ... ] I think this fixes the bug, but the function could really do with slightly more invasive code adjustments for clarity. For instance, we could get rid of the "sum_grow = 1" kluge if we removed the "&& sum_grow" from the outer for statement (where it's pretty damn ugly/surprising anyway --- I dislike for loops that look like they're just running a counter when they're doing other things too) and instead put something like this at the bottom of the outer loop: /* * If we examined all the columns and there is zero penalty for * all of them, we can stop examining tuples; this is agood one * to insert the new key into. */if (sum_grow == 0 && j == r->rd_att->natts) break; I'm not thrilled with the added comments either; they need copy-editing and they randomly fail to fit in an 80-column window. (pgindent will have something to say about that later for some of them, but I think it doesn't reformat comments that start at the left margin.) BTW, I think the real issue with the bug is not so much that we're resetting anything as that we may be initializing which_grow[j] for the next index column for the first time. Note that the function's startup code only bothers to initialize which_grow[0] (although it does so in a less than legible fashion). The rest have to get filled as the loop proceeds. Do you want to have another go at it, or would you like me to try to make it better? regards, tom lane
On Thu, Aug 30, 2012 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Aug 30, 2012 at 1:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Should we backpatch that? > >> Arguably, yes. Does the patch look sane to you? > > I was afraid you'd ask that. > > [ studies code for awhile ... ] > > I think this fixes the bug, but the function could really do with slightly > more invasive code adjustments for clarity. For instance, we could get > rid of the "sum_grow = 1" kluge if we removed the "&& sum_grow" from the > outer for statement (where it's pretty damn ugly/surprising anyway --- > I dislike for loops that look like they're just running a counter when > they're doing other things too) and instead put something like this at > the bottom of the outer loop: > > /* > * If we examined all the columns and there is zero penalty for > * all of them, we can stop examining tuples; this is a good one > * to insert the new key into. > */ > if (sum_grow == 0 && j == r->rd_att->natts) > break; > > I'm not thrilled with the added comments either; they need copy-editing > and they randomly fail to fit in an 80-column window. (pgindent will > have something to say about that later for some of them, but I think it > doesn't reformat comments that start at the left margin.) Sorry. I must have had my window sized wrong. At any rate, as far as the GiST code goes anyway, I'm inclined to think that even mis-spelled comments are an improvement over the status quo. > BTW, I think the real issue with the bug is not so much that we're > resetting anything as that we may be initializing which_grow[j] for > the next index column for the first time. Note that the function's > startup code only bothers to initialize which_grow[0] (although it > does so in a less than legible fashion). The rest have to get filled > as the loop proceeds. I noticed all that, but didn't feel like putting in the effort to make it better. I would have been happy to have someone else pick up the patch, but as it had been languishing I thought it would be better to get it committed more or less as it was than to wait for someone to have time to make it beautiful. If you want to hack on it more that's fine with me. I kind of wonder if we ought to rename the variables, and maybe turn sum_grow into a boolean. But I'm not really eager to go crazy if this is something we have to back-patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I noticed all that, but didn't feel like putting in the effort to make > it better. I would have been happy to have someone else pick up the > patch, but as it had been languishing I thought it would be better to > get it committed more or less as it was than to wait for someone to > have time to make it beautiful. If you want to hack on it more that's > fine with me. I kind of wonder if we ought to rename the variables, > and maybe turn sum_grow into a boolean. But I'm not really eager to > go crazy if this is something we have to back-patch. Yeah, the idea of replacing sum_grow with a boolean just occurred to me too. As is, I think the code is making some less-than-portable assumptions about what will happen if sum_grow overflows; which can definitely happen, seeing that gistpenalty and its callees intentionally return infinity in some cases. I'd rather it didn't attempt to add column penalties together, and I think there's a case to be made that not doing so is a back-patchable bug fix. I'll take a shot at improving this some more. regards, tom lane
On Thu, Aug 30, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I noticed all that, but didn't feel like putting in the effort to make >> it better. I would have been happy to have someone else pick up the >> patch, but as it had been languishing I thought it would be better to >> get it committed more or less as it was than to wait for someone to >> have time to make it beautiful. If you want to hack on it more that's >> fine with me. I kind of wonder if we ought to rename the variables, >> and maybe turn sum_grow into a boolean. But I'm not really eager to >> go crazy if this is something we have to back-patch. > > Yeah, the idea of replacing sum_grow with a boolean just occurred to me > too. As is, I think the code is making some less-than-portable > assumptions about what will happen if sum_grow overflows; which can > definitely happen, seeing that gistpenalty and its callees intentionally > return infinity in some cases. I'd rather it didn't attempt to add > column penalties together, and I think there's a case to be made that > not doing so is a back-patchable bug fix. Keep in mind that the worst case outcome is the index quality is worse than it otherwise would have been, so it's not like OMG-PostgreSQ-eats-your-data. > I'll take a shot at improving this some more. Okey dokey. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 30, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, the idea of replacing sum_grow with a boolean just occurred to me >> too. As is, I think the code is making some less-than-portable >> assumptions about what will happen if sum_grow overflows; which can >> definitely happen, seeing that gistpenalty and its callees intentionally >> return infinity in some cases. I'd rather it didn't attempt to add >> column penalties together, and I think there's a case to be made that >> not doing so is a back-patchable bug fix. > Keep in mind that the worst case outcome is the index quality is worse > than it otherwise would have been, so it's not like > OMG-PostgreSQ-eats-your-data. Agreed, but we've seen plenty of complaining about bloated gist indexes, and this might be the cause. >> I'll take a shot at improving this some more. > Okey dokey. Attached is a revised version of gistchoose(); I've not yet transposed the changes into gistRelocateBuildBuffersOnSplit(). It looks a lot more readable to me now. Objections? regards, tom lane diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c index 7c6ce8495caac1e9d7c99afdb513689de93beea5..48d70d98e25363d8425020be7607f20348a0456a 100644 *** a/src/backend/access/gist/gistutil.c --- b/src/backend/access/gist/gistutil.c *************** gistgetadjusted(Relation r, IndexTuple o *** 363,475 **** } /* ! * Search a page for the entry with lowest penalty. * ! * The index may have multiple columns, and there's a penalty value for each column. ! * The penalty associated with a column which appears earlier in the index definition is ! * strictly more important than the penalty of column which appears later in the index ! * definition. */ OffsetNumber gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */ GISTSTATE *giststate) { OffsetNumber maxoff; OffsetNumber i; ! OffsetNumber which; ! float sum_grow, ! which_grow[INDEX_MAX_KEYS]; GISTENTRY entry, identry[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; ! maxoff = PageGetMaxOffsetNumber(p); ! *which_grow = -1.0; ! which = InvalidOffsetNumber; ! sum_grow = 1; gistDeCompressAtt(giststate, r, it, NULL, (OffsetNumber) 0, identry, isnull); ! Assert(maxoff >= FirstOffsetNumber); ! Assert(!GistPageIsLeaf(p)); /* ! * Loop over tuples on page. * ! * We'll exit early if we find an index key that can accommodate the new key with no ! * penalty on any column. sum_grow is used to track this condition. Normally, it is the ! * sum of the penalties we've seen for this column so far, which is not a very useful ! * quantity in general because the penalties for each column are only considered ! * independently, but all we really care about is whether or not it's greater than zero. ! * Since penalties can't be negative, the sum of the penalties will be greater than ! * zero if and only if at least one penalty was greater than zero. To make things just ! * a bit more complicated, we arbitrarily set sum_grow to 1.0 whenever we want to force ! * the at least one more iteration of this outer loop. Any non-zero value would serve ! * just as well. */ ! for (i = FirstOffsetNumber; i <= maxoff && sum_grow; i = OffsetNumberNext(i)) { - int j; IndexTuple itup = (IndexTuple) PageGetItem(p, PageGetItemId(p, i)); ! sum_grow = 0; ! /* Loop over indexed attribtues. */ for (j = 0; j < r->rd_att->natts; j++) { Datum datum; float usize; bool IsNull; datum = index_getattr(itup, j + 1, giststate->tupdesc, &IsNull); gistdentryinit(giststate, j, &entry, datum, r, p, i, FALSE, IsNull); usize = gistpenalty(giststate, j, &entry, IsNull, &identry[j], isnull[j]); ! if (which_grow[j] < 0 || usize < which_grow[j]) { /* ! * We get here in two cases. First, we may have just discovered that the ! * current tuple is the best one we've seen so far; that is, for the first ! * column for which the penalty is not equal to the best tuple seen so far, ! * this one has a lower penalty than the previously-seen one. But, when ! * a new best tuple is found, we must record the best penalty value for ! * all the remaining columns. We'll end up here for each remaining index ! * column in that case, too. */ ! which = i; ! which_grow[j] = usize; if (j < r->rd_att->natts - 1) ! which_grow[j + 1] = -1; ! sum_grow += which_grow[j]; } ! else if (which_grow[j] == usize) { /* ! * The current tuple is exactly as good for this column as the best tuple ! * seen so far. The next iteration of this loop will compare the next ! * column. */ - sum_grow += usize; } else { /* ! * The current tuple is worse for this column than the best tuple seen so ! * far. Skip the remaining columns and move on to the next tuple, if any. */ ! sum_grow = 1; break; } } - } ! if (which == InvalidOffsetNumber) ! which = FirstOffsetNumber; ! return which; } /* --- 363,482 ---- } /* ! * Search an upper index page for the entry with lowest penalty for insertion ! * of the new index key contained in "it". * ! * Returns the index of the page entry to insert into. */ OffsetNumber gistchoose(Relation r, Page p, IndexTuple it, /* it has compressed entry */ GISTSTATE *giststate) { + OffsetNumber result; OffsetNumber maxoff; OffsetNumber i; ! float best_penalty[INDEX_MAX_KEYS]; GISTENTRY entry, identry[INDEX_MAX_KEYS]; bool isnull[INDEX_MAX_KEYS]; ! Assert(!GistPageIsLeaf(p)); ! gistDeCompressAtt(giststate, r, it, NULL, (OffsetNumber) 0, identry, isnull); ! /* we'll return FirstOffsetNumber if page is empty (shouldn't happen) */ ! result = FirstOffsetNumber; /* ! * The index may have multiple columns, and there's a penalty value for ! * each column. The penalty associated with a column which appears ! * earlier in the index definition is strictly more important than the ! * penalty of a column which appears later in the index definition. * ! * best_penalty[j] is the best penalty we have seen so far for column j, ! * or -1 when we haven't yet examined column j. Array entries to the ! * right of the first -1 are undefined. */ ! best_penalty[0] = -1; ! ! /* ! * Loop over tuples on page. ! */ ! maxoff = PageGetMaxOffsetNumber(p); ! Assert(maxoff >= FirstOffsetNumber); ! ! for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i)) { IndexTuple itup = (IndexTuple) PageGetItem(p, PageGetItemId(p, i)); + bool zero_penalty; + int j; ! zero_penalty = true; ! /* Loop over indexed attributes within this tuple. */ for (j = 0; j < r->rd_att->natts; j++) { Datum datum; float usize; bool IsNull; + /* Compute penalty for this column. */ datum = index_getattr(itup, j + 1, giststate->tupdesc, &IsNull); gistdentryinit(giststate, j, &entry, datum, r, p, i, FALSE, IsNull); usize = gistpenalty(giststate, j, &entry, IsNull, &identry[j], isnull[j]); + if (usize > 0) + zero_penalty = false; ! if (best_penalty[j] < 0 || usize < best_penalty[j]) { /* ! * New best penalty for column. Tentatively select this tuple ! * as the target, and record the best penalty. Then reset the ! * next column's penalty to "unknown" (and indirectly, the ! * same for all the ones to its right). This will force us ! * to adopt this tuple's penalty values as the best for all ! * the remaining columns during subsequent loop iterations. */ ! result = i; ! best_penalty[j] = usize; ! if (j < r->rd_att->natts - 1) ! best_penalty[j + 1] = -1; } ! else if (best_penalty[j] == usize) { /* ! * The current tuple is exactly as good for this column as the ! * best tuple seen so far. The next iteration of this loop ! * will compare the next column. */ } else { /* ! * The current tuple is worse for this column than the best ! * tuple seen so far. Skip the remaining columns and move on ! * to the next tuple, if any. */ ! zero_penalty = false; /* be sure outer loop won't exit */ break; } } ! /* ! * If we find a tuple with zero penalty for all columns, there's no ! * need to examine remaining tuples; just break out of the loop and ! * return it. ! */ ! if (zero_penalty) ! break; ! } ! return result; } /*
On Thu, Aug 30, 2012 at 4:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Aug 30, 2012 at 4:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, the idea of replacing sum_grow with a boolean just occurred to me >>> too. As is, I think the code is making some less-than-portable >>> assumptions about what will happen if sum_grow overflows; which can >>> definitely happen, seeing that gistpenalty and its callees intentionally >>> return infinity in some cases. I'd rather it didn't attempt to add >>> column penalties together, and I think there's a case to be made that >>> not doing so is a back-patchable bug fix. > >> Keep in mind that the worst case outcome is the index quality is worse >> than it otherwise would have been, so it's not like >> OMG-PostgreSQ-eats-your-data. > > Agreed, but we've seen plenty of complaining about bloated gist indexes, > and this might be the cause. More likely one of several, but sure. >>> I'll take a shot at improving this some more. > >> Okey dokey. > > Attached is a revised version of gistchoose(); I've not yet transposed > the changes into gistRelocateBuildBuffersOnSplit(). It looks a lot > more readable to me now. Objections? Looks good to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company