Re: GIN versus zero-key queries - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: GIN versus zero-key queries
Date
Msg-id 200904152344.n3FNiuH05971@momjian.us
Whole thread Raw
In response to Re: GIN versus zero-key queries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Where are we on this?

---------------------------------------------------------------------------

Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
> >> We have an API definition by which extractQuery can distinguish "all
> >> match" from "no match".  If we just legislate that "some match" isn't
> >> a valid behavior for zero-key queries, then the code is correct and the
> 
> > Right now I don't see an example with zero keys and "some match", consistent 
> > method will not have any information about indexed tuple and hence it could not 
> > produce any reasonable result. It seems to me, that paragraph should be removed 
> > at all.
> 
> Well, we still have to document the fact that GIN fails when presented
> with a query that would require a full-index scan.  I've updated section
> 52.5 as attached.  (I removed the claim that multiple matches were a
> problem, since this is obviously not true for a bitmap scan, and I
> suppose that the plain indexscan code must have had a way to deal with
> it too.)
> 
> More generally, though, I find myself quite unhappy with the fact that
> GIN doesn't provide reasonable behavior for the no-keys corner cases.
> If we are going to provide operator classes that claim to implement
> index acceleration of standard operators, it is not okay for them
> to not match the exact semantics of those operators.  Right now it's
> a mess for empty arrays, and it's even more of a mess for arrays
> containing nulls.  array_contain_compare() considers nulls as never
> matching, which means that
> 
> regression=# select array[1,null] <@ array[1,null];
>  ?column? 
> ----------
>  f
> (1 row)
> 
> which is entirely bizarre when you compare that result to
> 
> regression=# select array[1,null] = array[1,null];
>  ?column? 
> ----------
>  t
> (1 row)
> 
> It's obviously too late to do anything about this for 8.4, but I would
> like to have a TODO item to figure out how to do this right.  We need to
> adjust the behavior of the operators to be consistent, and then we need
> to make it possible for GIN to implement them exactly.  I wonder whether
> we should not change GIN to automatically do something reasonable for
> empty indexed values, ie stick them into the index in some special way
> denoting "no indexable keys for this item".  The dummy-value solution
> is not something that operator classes should have to come up with,
> and not all data types present a reasonable way to have dummy values
> anyway.
> 
>             regards, tom lane
> 
> 
>  <para>
>   <acronym>GIN</acronym> doesn't support full index scans.  The reason for
>   this is that <function>extractValue</> is allowed to return zero keys,
>   as for example might happen with an empty string or empty array.  In such
>   a case the indexed value will be unrepresented in the index.  It is
>   therefore impossible for <acronym>GIN</acronym> to guarantee that a
>   scan of the index can find every row in the table.
>  </para>
> 
>  <para>
>   Because of this limitation, when <function>extractQuery</function> returns
>   <literal>nkeys = 0</> to indicate that all values match the query,
>   <acronym>GIN</acronym> will emit an error.  (If there are multiple ANDed
>   indexable operators in the query, this happens only if they all return zero
>   for <literal>nkeys</>.)
>  </para>
> 
>  <para>
>   It is possible for an operator class to circumvent the restriction against
>   full index scan.  To do that, <function>extractValue</> must return at least
>   one (possibly dummy) key for every indexed value, and
>   <function>extractQuery</function> must convert an unrestricted search into
>   a partial-match query that will scan the whole index.  This is inefficient
>   but might be necessary to avoid corner-case failures with operators such
>   as <literal>LIKE</> or subset inclusion.
>  </para>
> 
>  <para>
>   <acronym>GIN</acronym> assumes that indexable operators are strict.
>   This means that <function>extractValue</> will not be called at all on
>   a NULL value (so the value will go unindexed), and
>   <function>extractQuery</function> will not be called on a NULL comparison
>   value either (instead, the query is presumed to be unmatchable).
>  </para>
> 
>  <para>
>   A possibly more serious limitation is that <acronym>GIN</acronym> cannot
>   handle NULL keys — for example, an array containing a NULL cannot
>   be handled except by ignoring the NULL.
>  </para>
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: libpq is not thread safe
Next
From: David Fetter
Date:
Subject: Re: psql with "Function Type" in \df