Thread: More fun with GIN lossy-page pointers

More fun with GIN lossy-page pointers

From
Tom Lane
Date:
I fixed the problem I was on about earlier with ginget.c doing the wrong
thing for keystreams like this:
...        ...42/6        42/6553542/7        ......

(To recap, after reporting the potential lossy match for 42/6, the
previous code would advance both keystreams and thus fail to see
the potential lossy match for 42/7.)  But in the shower this morning
I realized there was another case I had not considered:
...        ...42/6        42/750/1        42/65535...        ...

After deciding there is no possible match at 42/6, both the original
code and my patch will choose to advance the first keystream.  This
means they will never compare 42/6 to the lossy pointer and recognize
there's a potential lossy match.

So far as I can see, it's impossible to handle this situation when
examining only one TID per stream with no lookahead.  Choosing to
advance the second stream would obviously fail in many other cases,
so there is no correct action.  The only reasonable way out is to
forbid the case --- that is, decree that a keystream may *not*
contain both lossy and nonlossy pointers to the same page.

Now it looks to me like there's no actual bug so far as keyGetItem is
concerned, because in fact that's how entryGetItem will behave: it can
only return a lossy pointer when a partial match bitmap has gone lossy,
and then the bitmap won't return any real pointers to that page.
However it *is* possible for the revised version of keyGetItem to return
such a keystream, thus breaking scanGetItem.  The problematic case is
where the consistentFn has OR logic and we have a situation like
...            ...42/6            42/6553550/1            ......

We'll correctly report 42/6 as a lossy result, but then the next
call will advance only the first keystream, and decide that it needs
to test nothing-and-42/65535 as a possible match.  Given an OR query,
that'll succeed, and we'll return 42/65535 next, thus breaking the
rule that needs to hold for scanGetItem.

So this looks like a bit of a mess to resolve.  I think what we
have to do is first see if the consistentFn will succeed when only
the lossy stream is TRUE, and if so report the whole-page pointer
as a lossy match (and advance over it as well as all the non-lossy
pointers for the same page on the next call).  Otherwise continue
as now.

In any case the code comments are all wet because they state that
a keystream with both lossy and nonlossy pointers on the same page
is OK.

Comments?
        regards, tom lane


Re: More fun with GIN lossy-page pointers

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of sáb jul 31 09:57:13 -0400 2010:

> So far as I can see, it's impossible to handle this situation when
> examining only one TID per stream with no lookahead.  Choosing to
> advance the second stream would obviously fail in many other cases,
> so there is no correct action.  The only reasonable way out is to
> forbid the case --- that is, decree that a keystream may *not*
> contain both lossy and nonlossy pointers to the same page.

Would it make sense to order the streams differently?  I mean, what if
whole-page pointers in the lossy stream are processed before regular ones?
(I am thoroughly unfamiliar with this stuff, so forgive me if I've
gotten the concepts and terminology all wrong)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: More fun with GIN lossy-page pointers

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of sáb jul 31 09:57:13 -0400 2010:
>> So far as I can see, it's impossible to handle this situation when
>> examining only one TID per stream with no lookahead.  Choosing to
>> advance the second stream would obviously fail in many other cases,
>> so there is no correct action.  The only reasonable way out is to
>> forbid the case --- that is, decree that a keystream may *not*
>> contain both lossy and nonlossy pointers to the same page.

> Would it make sense to order the streams differently?  I mean, what if
> whole-page pointers in the lossy stream are processed before regular ones?

Hmm ... interesting thought.  I'm not sure what the implications are,
but it's definitely worth considering.
        regards, tom lane


Re: More fun with GIN lossy-page pointers

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of sáb jul 31 09:57:13 -0400 2010:
>> So far as I can see, it's impossible to handle this situation when
>> examining only one TID per stream with no lookahead.  Choosing to
>> advance the second stream would obviously fail in many other cases,
>> so there is no correct action.  The only reasonable way out is to
>> forbid the case --- that is, decree that a keystream may *not*
>> contain both lossy and nonlossy pointers to the same page.

> Would it make sense to order the streams differently?  I mean, what if
> whole-page pointers in the lossy stream are processed before regular ones?

I thought about this for awhile and decided that, although it's probably
possible to fix the problem along those lines, it would still add quite
a lot of ugliness to the code.  One point in particular is that there's
no natural representation of lossy-page pointers that would make them
sort that way (we can't use blocknum/0 because 0/0 has to be reserved
for ItemPointerSetMin).  So part of the price would be to make
compareItemPointers uglier and slower.  And we'd still need special
cases in keyGetItem and scanGetItem, to teach them not to advance past a
lossy pointer as long as we were dealing with non-lossy pointers to the
same page in other streams.  So on the whole it seems better to confine
the ugliness to the immediately-relevant functions.
        regards, tom lane