Re: GIN versus zero-key queries - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: GIN versus zero-key queries |
Date | |
Msg-id | 200904152344.n3FNiuH05971@momjian.us Whole thread Raw |
In response to | Re: GIN versus zero-key queries (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Where are we on this? --------------------------------------------------------------------------- Tom Lane wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: > >> We have an API definition by which extractQuery can distinguish "all > >> match" from "no match". If we just legislate that "some match" isn't > >> a valid behavior for zero-key queries, then the code is correct and the > > > Right now I don't see an example with zero keys and "some match", consistent > > method will not have any information about indexed tuple and hence it could not > > produce any reasonable result. It seems to me, that paragraph should be removed > > at all. > > Well, we still have to document the fact that GIN fails when presented > with a query that would require a full-index scan. I've updated section > 52.5 as attached. (I removed the claim that multiple matches were a > problem, since this is obviously not true for a bitmap scan, and I > suppose that the plain indexscan code must have had a way to deal with > it too.) > > More generally, though, I find myself quite unhappy with the fact that > GIN doesn't provide reasonable behavior for the no-keys corner cases. > If we are going to provide operator classes that claim to implement > index acceleration of standard operators, it is not okay for them > to not match the exact semantics of those operators. Right now it's > a mess for empty arrays, and it's even more of a mess for arrays > containing nulls. array_contain_compare() considers nulls as never > matching, which means that > > regression=# select array[1,null] <@ array[1,null]; > ?column? > ---------- > f > (1 row) > > which is entirely bizarre when you compare that result to > > regression=# select array[1,null] = array[1,null]; > ?column? > ---------- > t > (1 row) > > It's obviously too late to do anything about this for 8.4, but I would > like to have a TODO item to figure out how to do this right. We need to > adjust the behavior of the operators to be consistent, and then we need > to make it possible for GIN to implement them exactly. I wonder whether > we should not change GIN to automatically do something reasonable for > empty indexed values, ie stick them into the index in some special way > denoting "no indexable keys for this item". The dummy-value solution > is not something that operator classes should have to come up with, > and not all data types present a reasonable way to have dummy values > anyway. > > regards, tom lane > > > <para> > <acronym>GIN</acronym> doesn't support full index scans. The reason for > this is that <function>extractValue</> is allowed to return zero keys, > as for example might happen with an empty string or empty array. In such > a case the indexed value will be unrepresented in the index. It is > therefore impossible for <acronym>GIN</acronym> to guarantee that a > scan of the index can find every row in the table. > </para> > > <para> > Because of this limitation, when <function>extractQuery</function> returns > <literal>nkeys = 0</> to indicate that all values match the query, > <acronym>GIN</acronym> will emit an error. (If there are multiple ANDed > indexable operators in the query, this happens only if they all return zero > for <literal>nkeys</>.) > </para> > > <para> > It is possible for an operator class to circumvent the restriction against > full index scan. To do that, <function>extractValue</> must return at least > one (possibly dummy) key for every indexed value, and > <function>extractQuery</function> must convert an unrestricted search into > a partial-match query that will scan the whole index. This is inefficient > but might be necessary to avoid corner-case failures with operators such > as <literal>LIKE</> or subset inclusion. > </para> > > <para> > <acronym>GIN</acronym> assumes that indexable operators are strict. > This means that <function>extractValue</> will not be called at all on > a NULL value (so the value will go unindexed), and > <function>extractQuery</function> will not be called on a NULL comparison > value either (instead, the query is presumed to be unmatchable). > </para> > > <para> > A possibly more serious limitation is that <acronym>GIN</acronym> cannot > handle NULL keys — for example, an array containing a NULL cannot > be handled except by ignoring the NULL. > </para> > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
pgsql-hackers by date: