Thread: More fun with GIN lossy-page pointers
I fixed the problem I was on about earlier with ginget.c doing the wrong thing for keystreams like this: ... ...42/6 42/6553542/7 ...... (To recap, after reporting the potential lossy match for 42/6, the previous code would advance both keystreams and thus fail to see the potential lossy match for 42/7.) But in the shower this morning I realized there was another case I had not considered: ... ...42/6 42/750/1 42/65535... ... After deciding there is no possible match at 42/6, both the original code and my patch will choose to advance the first keystream. This means they will never compare 42/6 to the lossy pointer and recognize there's a potential lossy match. So far as I can see, it's impossible to handle this situation when examining only one TID per stream with no lookahead. Choosing to advance the second stream would obviously fail in many other cases, so there is no correct action. The only reasonable way out is to forbid the case --- that is, decree that a keystream may *not* contain both lossy and nonlossy pointers to the same page. Now it looks to me like there's no actual bug so far as keyGetItem is concerned, because in fact that's how entryGetItem will behave: it can only return a lossy pointer when a partial match bitmap has gone lossy, and then the bitmap won't return any real pointers to that page. However it *is* possible for the revised version of keyGetItem to return such a keystream, thus breaking scanGetItem. The problematic case is where the consistentFn has OR logic and we have a situation like ... ...42/6 42/6553550/1 ...... We'll correctly report 42/6 as a lossy result, but then the next call will advance only the first keystream, and decide that it needs to test nothing-and-42/65535 as a possible match. Given an OR query, that'll succeed, and we'll return 42/65535 next, thus breaking the rule that needs to hold for scanGetItem. So this looks like a bit of a mess to resolve. I think what we have to do is first see if the consistentFn will succeed when only the lossy stream is TRUE, and if so report the whole-page pointer as a lossy match (and advance over it as well as all the non-lossy pointers for the same page on the next call). Otherwise continue as now. In any case the code comments are all wet because they state that a keystream with both lossy and nonlossy pointers on the same page is OK. Comments? regards, tom lane
Excerpts from Tom Lane's message of sáb jul 31 09:57:13 -0400 2010: > So far as I can see, it's impossible to handle this situation when > examining only one TID per stream with no lookahead. Choosing to > advance the second stream would obviously fail in many other cases, > so there is no correct action. The only reasonable way out is to > forbid the case --- that is, decree that a keystream may *not* > contain both lossy and nonlossy pointers to the same page. Would it make sense to order the streams differently? I mean, what if whole-page pointers in the lossy stream are processed before regular ones? (I am thoroughly unfamiliar with this stuff, so forgive me if I've gotten the concepts and terminology all wrong) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of sáb jul 31 09:57:13 -0400 2010: >> So far as I can see, it's impossible to handle this situation when >> examining only one TID per stream with no lookahead. Choosing to >> advance the second stream would obviously fail in many other cases, >> so there is no correct action. The only reasonable way out is to >> forbid the case --- that is, decree that a keystream may *not* >> contain both lossy and nonlossy pointers to the same page. > Would it make sense to order the streams differently? I mean, what if > whole-page pointers in the lossy stream are processed before regular ones? Hmm ... interesting thought. I'm not sure what the implications are, but it's definitely worth considering. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Excerpts from Tom Lane's message of sáb jul 31 09:57:13 -0400 2010: >> So far as I can see, it's impossible to handle this situation when >> examining only one TID per stream with no lookahead. Choosing to >> advance the second stream would obviously fail in many other cases, >> so there is no correct action. The only reasonable way out is to >> forbid the case --- that is, decree that a keystream may *not* >> contain both lossy and nonlossy pointers to the same page. > Would it make sense to order the streams differently? I mean, what if > whole-page pointers in the lossy stream are processed before regular ones? I thought about this for awhile and decided that, although it's probably possible to fix the problem along those lines, it would still add quite a lot of ugliness to the code. One point in particular is that there's no natural representation of lossy-page pointers that would make them sort that way (we can't use blocknum/0 because 0/0 has to be reserved for ItemPointerSetMin). So part of the price would be to make compareItemPointers uglier and slower. And we'd still need special cases in keyGetItem and scanGetItem, to teach them not to advance past a lossy pointer as long as we were dealing with non-lossy pointers to the same page in other streams. So on the whole it seems better to confine the ugliness to the immediately-relevant functions. regards, tom lane