Thread: [PATCH] Add support for IS NULL to btree indexes
Here is a patch that enables IS NULL to use a btree index. After mentioning it was really hard, I had a brainwave and here are the 80 lines of code necessary to make it work. What surprised me is that in various important parts most of the work had actually been done already. Actually, it's two changes: - In the btree index code, if SK_ISNULL is set, do the right thing - If the query has col IS NULL, expand that to col = NULL in the index quals The bit to convert col = NULL to an appropriate scan key and everything else was already in place. It's not overhauling the index am code, just using the facilities already there. The only possibly controversial addition of a new scankey flag SK_INDEXFINDNULL which needs to be set for the index to consider the scankey. This is to distinguish the cases where (a = NULL) has been derived from (a IS NULL) and (a = col) where col happens to be NULL this time round. It's not entirely perfect, if you look at the explain output, it's still in the filter list. I couldn't work out how to remove it. Before you ask, it is working, I traced gdb all the way through the btree index code and it is doing what it's supposed to. It's the same issue affecting the IS TRUE/FALSE predicates, they're still in the filter lists too. Basically, could some people look over the logic. I think I got it right but this is the first time I've played with the index code so maybe I've missed something. If it is agreed to be the way to go, I'll go back and work out the documentation changes. Download: http://svana.org/kleptog/pgsql/indexnulls.diff Have a nice day, test=# explain analyze select * from z order by t; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Index Scan using zt on z (cost=0.00..65.58 rows=1144 width=44) (actual time=0.034..4.624 rows=1000 loops=1) Total runtime: 7.684 ms (2 rows) test=# explain analyze select * from z where t is null order by t; QUERY PLAN --------------------------------------------------------------------------------------------------------- Index Scan using zt on z (cost=0.00..5.84 rows=6 width=44) (actual time=0.044..1.442 rows=250 loops=1) Index Cond: (t = NULL::text) Filter: (t IS NULL) Total runtime: 2.279 ms (4 rows) -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > Actually, it's two changes: > - In the btree index code, if SK_ISNULL is set, do the right thing > - If the query has col IS NULL, expand that to col =3D NULL in the index > quals This is a bad idea, because it translates "x IS NULL" into "x = NULL" which is under no circumstances the same thing. It might coincidentally fail to malfunction for btree indexes, depending on the specifics of the "=" operator in use; but that doesn't make it right. (AFAICS, the proposed patch simply breaks for non-btree indexes.) Also, it cannot handle IS NOT NULL. A proper solution requires explicitly representing IS NULL/IS NOT NULL as distinct kinds of scankey. regards, tom lane
On Mon, Sep 19, 2005 at 04:33:27PM -0400, Tom Lane wrote: > This is a bad idea, because it translates "x IS NULL" into "x = NULL" > which is under no circumstances the same thing. It might coincidentally > fail to malfunction for btree indexes, depending on the specifics of the > "=" operator in use; but that doesn't make it right. (AFAICS, the > proposed patch simply breaks for non-btree indexes.) Also, it cannot > handle IS NOT NULL. That's point of the extra flag, to distinguish between the two cases. It works for all types, the actual operator in the operator class is never invoked (they're strict, the code can't call them with NULL even if it wanted to). ExecInitIndexScan has always set SK_ISNULL for null args, even though it never happened in practice. All it does is move to the point in the index where the NULLs begin and start returning tuples. It takes a previously impossible state and makes it return something useful. It doesn't handle IS NOT NULL, since I don't think that's worth indexing anyway, but perhaps for completeness. It the much harder case. > A proper solution requires explicitly representing IS NULL/IS NOT NULL > as distinct kinds of scankey. Well, this patch has a kind for IS NULL. It does have issues with other indexes, but there's no point working on that until there is a possibility of acceptance. The purpose was to demonstrate that it is trivially possible using what is available. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.