Re: GIN vs. Partial Indexes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: GIN vs. Partial Indexes
Date
Msg-id 13837.1286560056@sss.pgh.pa.us
In response to Re: GIN vs. Partial Indexes  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>> values or non-restrictive queries, by ensuring there's at least one
>> index entry for every row.  This has been discussed before; see the TODO
>> section for GIN.

> That seems like it could waste an awful lot of disk space (and
> therefore I/O, etc.).  No?

How so?  In a typical application, there would not likely be very many
such rows --- we're talking about cases like documents containing zero
indexable words.  In any case, the problem right now is that GIN has
significant functional limitations because it fails to make any index
entry at all for such rows.  Even if there are in fact no such rows
in a particular table, it has to fail on some queries because there
*might* be such rows.  There is no way to fix those limitations
unless it undertakes to have some index entry for every row.  That
will take disk space, but it's *necessary*.  (To adapt the old saw,
I can make this index arbitrarily small if it doesn't have to give
the right answers.)

In any case, I would expect that GIN could actually do this quite
efficiently.  What we'd probably want is a concept of a "null word",
with empty indexable rows entered in the index as if they contained the
null word.  So there'd be just one index entry with a posting list of
however many such rows there are.
        regards, tom lane
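
Illustrative sketch (not part of the original message): the "null word" idea above can be modelled with a toy in-memory inverted index. This is only a rough Python analogy under stated assumptions, not GIN's actual on-disk structure or code. The names ToyInvertedIndex, NULL_WORD, rows_not_containing and empty_rows are all hypothetical. The point it tries to show is that once every row has at least one entry, a non-restrictive query such as "rows NOT containing X" can be answered from the index alone, and all empty rows share a single key with one posting list.

```python
from collections import defaultdict

# Hypothetical sentinel standing in for the proposed "null word".
# It cannot collide with any real word because it is not a string.
NULL_WORD = object()


class ToyInvertedIndex:
    """Toy inverted index: key -> set of row ids (the posting list)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def insert(self, row_id, words):
        # A row with no indexable words is entered as if it contained
        # the null word, so every row gets at least one index entry.
        keys = set(words) or {NULL_WORD}
        for key in keys:
            self.postings[key].add(row_id)

    def rows_containing(self, word):
        return set(self.postings.get(word, set()))

    def all_rows(self):
        # Complete only because empty rows are indexed under NULL_WORD.
        result = set()
        for row_ids in self.postings.values():
            result |= row_ids
        return result

    def rows_not_containing(self, word):
        # A non-restrictive query: answerable from the index alone
        # only when the index knows about every row.
        return self.all_rows() - self.rows_containing(word)

    def empty_rows(self):
        return set(self.postings.get(NULL_WORD, set()))


idx = ToyInvertedIndex()
idx.insert(1, ["gin", "index"])
idx.insert(2, [])            # document with zero indexable words
idx.insert(3, [])            # another one; shares the same posting list
idx.insert(4, ["index"])
print(sorted(idx.rows_not_containing("gin")))
print(sorted(idx.empty_rows()))
```

In this toy model the two empty rows cost one extra key plus two row ids, which is the sense in which the space overhead stays small when such rows are rare. Without the NULL_WORD branch in insert, rows 2 and 3 would be invisible to the index and rows_not_containing would silently omit them.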

