Re: Index order ignored after `is null` in query - Mailing list pgsql-performance

From Artūras Lapinskas
Subject Re: Index order ignored after `is null` in query
Date
Msg-id 20141106170612.GA527@public-docking-cx-1226.ethz.ch
Whole thread Raw
In response to Index order ignored after `is null` in query  (Artūras Lapinskas <arturaslape@gmail.com>)
Responses Re: Index order ignored after `is null` in query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
After some more investigation my wild guess would be that then nulls are
involved in query postgresql wants to double check whatever they are
really nulls in actual relation (maybe because of dead tuples). To do
that it has to go and fetch pages from disk and the best way to do that
is to use bitmap index. Sadly bitmaps tend to be not the best option
when using limit in queries. Which would make sense, if it is really a
need to synchronize index with relation...

--
Best Regard,
Artūras Lapinskas

On Wed, Nov 05, 2014 at 10:42:43PM +0100, Artūras Lapinskas wrote:
>Hello,
>
>I am having some hard time understanding how postgresql handles null
>values. As much I understand null values are stored in b-tree as
>simple values (put as last or first depending on index). But it seems
>that there is something really specific about them as postgresql
>deliberately ignores obvious (I think...) optimizations concerning
>index order after using one of them in a query. As a simple example
>look at table below:
>
>    arturas=# drop table if exists test;
>    DROP TABLE
>    arturas=# create table test (
>    arturas(#   a int not null,
>    arturas(#   b int,
>    arturas(#   c int not null
>    arturas(# );
>    CREATE TABLE
>
>After filling this table with random data (actual distribution of
>null's/real values seams not to matter):
>
>    arturas=# insert into test (a, b, c)
>    arturas-#   select
>    arturas-#     case when random() < 0.5 then 1 else 2 end
>    arturas-#     , case when random() < 0.5 then null else 1 end
>    arturas-#     , case when random() < 0.5 then 1 else 2 end
>    arturas-#   from generate_series(1, 1000000, 1) as gen;
>    INSERT 0 1000000
>
>And creating index:
>
>    arturas=# create index test_idx on test (a, b nulls first, c);
>    CREATE INDEX
>
>We get fast queries with `order by` on c:
>
>    arturas=# explain analyze verbose select * from test where a = 1 and b = 1 order by c limit 1;
>                                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
>     Limit  (cost=0.42..0.53 rows=1 width=12) (actual time=0.052..0.052 rows=1 loops=1)
>       Output: a, b, c
>       ->  Index Only Scan using test_idx on public.test  (cost=0.42..25890.42 rows=251433 width=12) (actual
time=0.051..0.051rows=1 loops=1) 
>             Output: a, b, c
>             Index Cond: ((test.a = 1) AND (test.b = 1))
>             Heap Fetches: 1
>     Total runtime: 0.084 ms
>    (7 rows)
>
>But really slow ones if we search for null values of b:
>
>    arturas=# explain analyze verbose select * from test where a = 1 and b is null order by c limit 1;
>                                                                     QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
>     Limit  (cost=15632.47..15632.47 rows=1 width=12) (actual time=138.127..138.127 rows=1 loops=1)
>       Output: a, b, c
>       ->  Sort  (cost=15632.47..16253.55 rows=248434 width=12) (actual time=138.127..138.127 rows=1 loops=1)
>             Output: a, b, c
>             Sort Key: test.c
>             Sort Method: top-N heapsort  Memory: 25kB
>             ->  Bitmap Heap Scan on public.test  (cost=6378.87..14390.30 rows=248434 width=12) (actual
time=47.083..88.986rows=249243 loops=1) 
>                   Output: a, b, c
>                   Recheck Cond: ((test.a = 1) AND (test.b IS NULL))
>                   ->  Bitmap Index Scan on test_idx  (cost=0.00..6316.77 rows=248434 width=0) (actual
time=46.015..46.015rows=249243 loops=1) 
>                         Index Cond: ((test.a = 1) AND (test.b IS NULL))
>     Total runtime: 138.200 ms
>    (12 rows)
>
>Can someone please give some insight on this problem :)
>
>P.S. I am using `select version()` => PostgreSQL 9.3.5 on
>x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.7.2-5) 4.7.2,
>64-bit, compiled from source with no default configuration changes.
>
>--
>Best Regard,
>Artūras Lapinskas


pgsql-performance by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: log_temp_files (integer), tuning work_mem
Next
From: Tom Lane
Date:
Subject: Re: Index order ignored after `is null` in query