Re: GIN versus zero-key queries - Mailing list pgsql-hackers

From Tom Lane
Subject Re: GIN versus zero-key queries
Date
Msg-id 3868.1239304852@sss.pgh.pa.us
Whole thread Raw
In response to Re: GIN versus zero-key queries  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: GIN versus zero-key queries  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
>> We have an API definition by which extractQuery can distinguish "all
>> match" from "no match".  If we just legislate that "some match" isn't
>> a valid behavior for zero-key queries, then the code is correct and the

> Right now I don't see an example with zero keys and "some match", consistent 
> method will not have any information about indexed tuple and hence it could not 
> produce any reasonable result. It seems to me, that paragraph should be removed 
> at all.

Well, we still have to document the fact that GIN fails when presented
with a query that would require a full-index scan.  I've updated section
52.5 as attached.  (I removed the claim that multiple matches were a
problem, since this is obviously not true for a bitmap scan, and I
suppose that the plain indexscan code must have had a way to deal with
it too.)

More generally, though, I find myself quite unhappy with the fact that
GIN doesn't provide reasonable behavior for the no-keys corner cases.
If we are going to provide operator classes that claim to implement
index acceleration of standard operators, it is not okay for them
to not match the exact semantics of those operators.  Right now it's
a mess for empty arrays, and it's even more of a mess for arrays
containing nulls.  array_contain_compare() considers nulls as never
matching, which means that

regression=# select array[1,null] <@ array[1,null];?column? 
----------f
(1 row)

which is entirely bizarre when you compare that result to

regression=# select array[1,null] = array[1,null];?column? 
----------t
(1 row)

It's obviously too late to do anything about this for 8.4, but I would
like to have a TODO item to figure out how to do this right.  We need to
adjust the behavior of the operators to be consistent, and then we need
to make it possible for GIN to implement them exactly.  I wonder whether
we should not change GIN to automatically do something reasonable for
empty indexed values, ie stick them into the index in some special way
denoting "no indexable keys for this item".  The dummy-value solution
is not something that operator classes should have to come up with,
and not all data types present a reasonable way to have dummy values
anyway.
        regards, tom lane

<para> <acronym>GIN</acronym> doesn't support full index scans.  The reason for this is that <function>extractValue</>
isallowed to return zero keys, as for example might happen with an empty string or empty array.  In such a case the
indexedvalue will be unrepresented in the index.  It is therefore impossible for <acronym>GIN</acronym> to guarantee
thata scan of the index can find every row in the table.</para>
 
<para> Because of this limitation, when <function>extractQuery</function> returns <literal>nkeys = 0</> to indicate
thatall values match the query, <acronym>GIN</acronym> will emit an error.  (If there are multiple ANDed indexable
operatorsin the query, this happens only if they all return zero for <literal>nkeys</>.)</para>
 
<para> It is possible for an operator class to circumvent the restriction against full index scan.  To do that,
<function>extractValue</>must return at least one (possibly dummy) key for every indexed value, and
<function>extractQuery</function>must convert an unrestricted search into a partial-match query that will scan the
wholeindex.  This is inefficient but might be necessary to avoid corner-case failures with operators such as
<literal>LIKE</>or subset inclusion.</para>
 
<para> <acronym>GIN</acronym> assumes that indexable operators are strict. This means that <function>extractValue</>
willnot be called at all on a NULL value (so the value will go unindexed), and <function>extractQuery</function> will
notbe called on a NULL comparison value either (instead, the query is presumed to be unmatchable).</para>
 
<para> A possibly more serious limitation is that <acronym>GIN</acronym> cannot handle NULL keys — for example,
anarray containing a NULL cannot be handled except by ignoring the NULL.</para>
 


pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Re: [BUGS] BUG #4027: backslash escaping notdisabled inplpgsql
Next
From: "Kevin Grittner"
Date:
Subject: Re: Re: [BUGS] BUG #4027: backslash escapingnotdisabled inplpgsql