Thread: AW: pg_index.indislossy

AW: pg_index.indislossy

From
Zeugswetter Andreas SB
Date:
> > > Can someone tell me what we use indislossy for? 

Ok, so the interpretation of this field is:A match in the index needs to be reevaluated in the heap tuple data,since a
matchin the index does not necessarily mean, that the heap tuplematches.If the heap tuple data matches, the index must
alwaysmatch.
 

A very typical example for such an index is a hash index. This might explain the 
fact, that the ODBC driver misinterpreted that field as meaning that the index is a hash.  
The field has nothing to do with partial index.

Andreas


Re: AW: pg_index.indislossy

From
Bruce Momjian
Date:
Added to pg_index.h file as a comment.

> 
> > > > Can someone tell me what we use indislossy for? 
> 
> Ok, so the interpretation of this field is:
>     A match in the index needs to be reevaluated in the heap tuple data,
>     since a match in the index does not necessarily mean, that the heap tuple
>     matches.
>     If the heap tuple data matches, the index must always match.
> 
> A very typical example for such an index is a hash index. This might explain the 
> fact, that the ODBC driver misinterpreted that field as meaning that the index is a hash.  
> The field has nothing to do with partial index.
> 
> Andreas
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: AW: pg_index.indislossy

From
Peter Eisentraut
Date:
Bruce Momjian writes:

> > > > > Can someone tell me what we use indislossy for?
> >
> > Ok, so the interpretation of this field is:
> >     A match in the index needs to be reevaluated in the heap tuple data,
> >     since a match in the index does not necessarily mean, that the heap tuple
> >     matches.
> >     If the heap tuple data matches, the index must always match.

AFAIK, this is true for all indexes in PostgreSQL, because index rows
don't store the transactions status.  Of course those are two different
underlying reasons why a heap lookup is always necessary, but there
shouldn't be any functional difference in the current implementation.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: AW: pg_index.indislossy

From
Bruce Momjian
Date:
> Bruce Momjian writes:
> 
> > > > > > Can someone tell me what we use indislossy for?
> > >
> > > Ok, so the interpretation of this field is:
> > >     A match in the index needs to be reevaluated in the heap tuple data,
> > >     since a match in the index does not necessarily mean, that the heap tuple
> > >     matches.
> > >     If the heap tuple data matches, the index must always match.
> 
> AFAIK, this is true for all indexes in PostgreSQL, because index rows
> don't store the transactions status.  Of course those are two different
> underlying reasons why a heap lookup is always necessary, but there
> shouldn't be any functional difference in the current implementation.

Seems it is something they added for the index abstraction and not for
practical use by PostgreSQL.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: AW: pg_index.indislossy

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Bruce Momjian writes:
> A match in the index needs to be reevaluated in the heap tuple data,
> since a match in the index does not necessarily mean, that the heap tuple
> matches.

> AFAIK, this is true for all indexes in PostgreSQL, because index rows
> don't store the transactions status.

Not true at all.  The tuple commit status needs to be rechecked, yes,
but with a normal index it is not necessary to recheck whether the index
key field actually satisfies the index qual conditions.  With a lossy
index it *is* necessary to recheck --- the index may return more tuples
than the ones that match the given qual.  For example, an r-tree index
applied to a "distance from point X <= D" query might return all the
tuples lying within a bounding box of the circle actually wanted.

The LIKE index optimization can also be thought of as using an index as
a lossy index: the index scan gives you all the tuples you want, plus
some you don't, so you have to evaluate the LIKE operator over again at
each returned tuple.

Basically, what this is good for is using an index for more kinds of
WHERE conditions than it could otherwise support.  It is *not* a useless
abstraction.  It occurs to me though that marking the index itself
as lossy is the wrong way to think about it --- the right way is to
associate the "lossy" flag with use of a particular operator with an
index.  So maybe the flag should be in pg_amop or pg_amproc, instead.
Someday I'd also like to see those tables extended so that the LIKE
index optimization is described by the tables, rather than being
hard-wired into the planner as it is now.
        regards, tom lane


Re: AW: pg_index.indislossy

From
Peter Eisentraut
Date:
Tom Lane writes:

> Not true at all.  The tuple commit status needs to be rechecked, yes,
> but with a normal index it is not necessary to recheck whether the index
> key field actually satisfies the index qual conditions.  With a lossy
> index it *is* necessary to recheck --- the index may return more tuples
> than the ones that match the given qual.

Okay, this is not surprising.  I agree that storing this in the index
might be suboptimal.

But why is this called lossy?  Shouldn't it be called "exceedy"?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: AW: pg_index.indislossy

From
Bruce Momjian
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > But why is this called lossy?  Shouldn't it be called "exceedy"?
> 
> Good point ;-).  "lossy" does sound like the index might "lose" tuples,
> which is exactly what it's not allowed to do; it must find all the
> tuples that match the query.
> 
> The terminology is correct by analogy to "lossy compression" --- the
> index loses information, in the sense that its result isn't quite the
> result you wanted.  But I can see where it'd confuse the unwary.
> Perhaps we should consult the literature and see if there is another
> term for this concept.

Seeing how our ODBC driver refrences it in previous releases, we are
going to have trouble changing it.  I always thought it was "lossy" in
terms of compression too.

I don't see it mentioned now in ODBC, but I think it used to be there. 
I changed it recently to check for word "hash" instead.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: AW: pg_index.indislossy

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> But why is this called lossy?  Shouldn't it be called "exceedy"?

Good point ;-).  "lossy" does sound like the index might "lose" tuples,
which is exactly what it's not allowed to do; it must find all the
tuples that match the query.

The terminology is correct by analogy to "lossy compression" --- the
index loses information, in the sense that its result isn't quite the
result you wanted.  But I can see where it'd confuse the unwary.
Perhaps we should consult the literature and see if there is another
term for this concept.
        regards, tom lane


Re: AW: pg_index.indislossy

From
ncm@zembu.com (Nathan Myers)
Date:
On Tue, Jul 10, 2001 at 01:36:33PM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > But why is this called lossy?  Shouldn't it be called "exceedy"?
> 
> Good point ;-).  "lossy" does sound like the index might "lose" tuples,
> which is exactly what it's not allowed to do; it must find all the
> tuples that match the query.
> 
> The terminology is correct by analogy to "lossy compression" --- the
> index loses information, in the sense that its result isn't quite the
> result you wanted.  But I can see where it'd confuse the unwary.
> Perhaps we should consult the literature and see if there is another
> term for this concept.

How about "hinty"? :-)

Seriously, "indislossy" is a singularly poor name for a predicate.
Also, are we so poor that we can't afford whole words, or even word 
breaks?  I propose "index_is_hint".  

Actually, is the "ind[ex]" part even necessary?  
How about "must_check_heap"?

Nathan Myers
ncm@zembu.com


Re: AW: pg_index.indislossy

From
Tom Lane
Date:
ncm@zembu.com (Nathan Myers) writes:
> Seriously, "indislossy" is a singularly poor name for a predicate.

Perhaps, but it fits with the existing naming conventions for Postgres
catalog columns.  Unless we want to indulge in wholesale renaming of
the system's catalog columns (and break an awful lot of applications)
I'd resist any name for a pg_index column that's not of the form "indFOO".
        regards, tom lane