Re: Odd Sort/Limit/Max Problem - Mailing list pgsql-performance

From Tom Lane
Subject Re: Odd Sort/Limit/Max Problem
Date
Msg-id 6352.1039824235@sss.pgh.pa.us
Whole thread Raw
In response to Re: Odd Sort/Limit/Max Problem  (Hannu Krosing <hannu@tm.ee>)
List pgsql-performance
Hannu Krosing <hannu@tm.ee> writes:
> Tom Lane kirjutas L, 14.12.2002 kell 01:24:
>> Not correct.  btrees index NULLs, as they must do in order to have
>> correct behavior for multicolumn indexes.

> I've heard this befoe, but this is something I've never understood - why
> do you have to index _single_ null's in order to behave correctly for
> multi-column index.

Well, you don't absolutely *have to* index individual nulls, but you do
have to support nulls in index entries.

The example that motivates this is

    create table foo (f1 int, f2 int);
    create index fooi on foo(f1,f2);
    ... fill table ...
    select * from foo where f1 = 42;

The planner is entitled to implement this as an indexscan using fooi's
first column (and ignoring its lower-order column(s)).  Now if fooi does
not index rows in which f2 is null, you lose, because it may omit rows
with f1 = 42 that should have been found by the indexscan.  So it *has
to* be able to store index entries like (42, NULL).

For btree's purposes, the easiest implementation is to say that NULL is
an ordinary index entry with a definable sort position (which we chose
to define as "after all non-NULL values").  There's no particular value
in having a special case for all-NULL index entries, so we don't.

GiST is not able to handle all-NULL index entries, so it uses the rule
"index all rows in which the first index column is not NULL".  This
still meets the planner's constraint because we never do an indexscan
that uses only lower-order index columns.

hash and rtree don't support NULL index entries, but they don't support
multicolumn indexes either, so the constraint doesn't apply.

> I don't have anything against nulls being indexed - in a table where
> nulls have about the same frequency as other values it may actually be
> useful (if indexes were used to find IS NULL tuples)

At least for btree, it would be nice to someday allow IS NULL as an
indexable operator.  I haven't thought very hard about how to do that;
shoehorning it into the operator class structure looks like it'd be a
horrid mess, so it'd probably require some creative klugery :-(

> Is it just that nulls need to have an ordering and that this fact has
> somehow leaked down to actually being stored in the index ?

No, more the other way around: btree assigns an ordering to NULLs
because it must do so in order to know where to put them in the index.
This is an artifact of btree that happens to "leak upward" ...

            regards, tom lane

pgsql-performance by date:

Previous
From: Laurette Cisneros
Date:
Subject: Re: Fwd: Re: Odd Sort/Limit/Max Problem
Next
From:
Date:
Subject: ~* + LIMIT => infinite time?