Index order ignored after `is null` in query

From: Artūras Lapinskas
Subject: Index order ignored after `is null` in query
Date: ,
Msg-id: 20141105214243.GA546@public-docking-cx-1771.ethz.ch
(view: Whole thread, Raw)
Responses: Re: Index order ignored after `is null` in query  (Artūras Lapinskas)
List: pgsql-performance

Tree view

Index order ignored after `is null` in query  (Artūras Lapinskas, )
 Re: Index order ignored after `is null` in query  (Artūras Lapinskas, )
  Re: Index order ignored after `is null` in query  (Tom Lane, )
   Re: Index order ignored after `is null` in query  (Artūras Lapinskas, )
    Re: Index order ignored after `is null` in query  (Jim Nasby, )

Hello,

I am having some hard time understanding how postgresql handles null
values. As much I understand null values are stored in b-tree as simple
values (put as last or first depending on index). But it seems that
there is something really specific about them as postgresql deliberately
ignores obvious (I think...) optimizations concerning index order after
using one of them in a query. As a simple example look at table below:

    arturas=# drop table if exists test;
    DROP TABLE
    arturas=# create table test (
    arturas(#   a int not null,
    arturas(#   b int,
    arturas(#   c int not null
    arturas(# );
    CREATE TABLE

After filling this table with random data (actual distribution of
null's/real values seams not to matter):

    arturas=# insert into test (a, b, c)
    arturas-#   select
    arturas-#     case when random() < 0.5 then 1 else 2 end
    arturas-#     , case when random() < 0.5 then null else 1 end
    arturas-#     , case when random() < 0.5 then 1 else 2 end
    arturas-#   from generate_series(1, 1000000, 1) as gen;
    INSERT 0 1000000

And creating index:

    arturas=# create index test_idx on test (a, b nulls first, c);
    CREATE INDEX

We get fast queries with `order by` on c:

    arturas=# explain analyze verbose select * from test where a = 1 and b = 1 order by c limit 1;
                                                                    QUERY PLAN
                       

-------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.42..0.53 rows=1 width=12) (actual time=0.052..0.052 rows=1 loops=1)
       Output: a, b, c
       ->  Index Only Scan using test_idx on public.test  (cost=0.42..25890.42 rows=251433 width=12) (actual
time=0.051..0.051rows=1 loops=1) 
             Output: a, b, c
             Index Cond: ((test.a = 1) AND (test.b = 1))
             Heap Fetches: 1
     Total runtime: 0.084 ms
    (7 rows)

But really slow ones if we search for null values of b:

    arturas=# explain analyze verbose select * from test where a = 1 and b is null order by c limit 1;
                                                                     QUERY PLAN
                         

---------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=15632.47..15632.47 rows=1 width=12) (actual time=138.127..138.127 rows=1 loops=1)
       Output: a, b, c
       ->  Sort  (cost=15632.47..16253.55 rows=248434 width=12) (actual time=138.127..138.127 rows=1 loops=1)
             Output: a, b, c
             Sort Key: test.c
             Sort Method: top-N heapsort  Memory: 25kB
             ->  Bitmap Heap Scan on public.test  (cost=6378.87..14390.30 rows=248434 width=12) (actual
time=47.083..88.986rows=249243 loops=1) 
                   Output: a, b, c
                   Recheck Cond: ((test.a = 1) AND (test.b IS NULL))
                   ->  Bitmap Index Scan on test_idx  (cost=0.00..6316.77 rows=248434 width=0) (actual
time=46.015..46.015rows=249243 loops=1) 
                         Index Cond: ((test.a = 1) AND (test.b IS NULL))
     Total runtime: 138.200 ms
    (12 rows)

Can someone please give some insight on this problem :)

P.S. I am using `select version()` => PostgreSQL 9.3.5 on
x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2,
64-bit, compiled from source with no default configuration changes.

--
Best Regard,
Artūras Lapinskas



pgsql-performance by date:

From: Tomas Vondra
Date:
Subject: Re: 9.3 performance issues, lots of bind and parse log entries
From: Artūras Lapinskas
Date:
Subject: Index order ignored after `is null` in query