Thread: BUG #6283: About the behavior of indexscan in case there are some NULL values.

BUG #6283: About the behavior of indexscan in case there are some NULL values.

From

"Naoya Anzai"

Date:

02 November 2011, 01:44:26

The following bug has been logged online:

Bug reference:      6283
Logged by:          Naoya Anzai
Email address:      anzai-naoya@mxu.nes.nec.co.jp
PostgreSQL version: 9.1.1
Operating system:   RHEL5.5
Description:        About the behavior of indexscan in case there are some
NULL values.
Details:

Hello,

In the newest PostgreSQL(9.1.1) or lower, Index-Scan's actual time may
increase unexpectedly.

I think that this is for scanning All NULL VALUES when performing an
indexscan
even if they does not need to be scanned.

I think that the cause is here.

[src/backend/access/nbtree/nbtutils.c(L963-L996) in PostgreSQL9.1.1]
--------
if (isNull)
{
     if (key->sk_flags & SK_BT_NULLS_FIRST)
     {
             /*
              * Since NULLs are sorted before non-NULLs, we know we have
              * reached the lower limit of the range of values for this
              * index attr.  On a backward scan, we can stop if this qual
              * is one of the "must match" subset.  On a forward scan,
              * however, we should keep going.
              */
             if ((key->sk_flags & SK_BT_REQBKWD) &&
                     ScanDirectionIsBackward(dir))
                     *continuescan = false;
     }
     else
     {
             /*
              * Since NULLs are sorted after non-NULLs, we know we have
              * reached the upper limit of the range of values for this
              * index attr.  On a forward scan, we can stop if this qual is
              * one of the "must match" subset.      On a backward scan,
              * however, we should keep going.
              */
             if ((key->sk_flags & SK_BT_REQFWD) &&
                     ScanDirectionIsForward(dir))
                     *continuescan = false;
     }
        /*
         * In any case, this indextuple doesn't match the qual.
         */
        return false;
}
---------
For example, with NULLS_LAST, GREATER THAN scan key('value > scankey' etc.),
and FORWARD SCAN conditions,
even if scan have reached a NULL value, continuescan is still true all the
time.

If it rewrites as follows, I think that this problem is solved, but how is
it?
--------

--- nbtutils.c  2011-11-02 14:10:55.000000000 +0900
+++ nbtutils.c.new      2011-11-02 14:11:38.000000000 +0900
@@ -971,8 +971,7 @@
                                 * is one of the "must match" subset.  On a
forward scan,
                                 * however, we should keep going.
                                 */
-                               if ((key->sk_flags & SK_BT_REQBKWD) &&
-                                       ScanDirectionIsBackward(dir))
+                               if (ScanDirectionIsBackward(dir))
                                        *continuescan = false;
                        }
                        else
@@ -984,8 +983,7 @@
                                 * one of the "must match" subset.      On a
backward scan,
                                 * however, we should keep going.
                                 */
-                               if ((key->sk_flags & SK_BT_REQFWD) &&
-                                       ScanDirectionIsForward(dir))
+                               if (ScanDirectionIsForward(dir))
                                        *continuescan = false;
                        }

---------

Regards,
Naoya Anzai

Re: BUG #6283: About the behavior of indexscan in case there are some NULL values.

From

安西　直也

Date:

02 November 2011, 06:48:10

>>
>> Hello,
>>
>> In the newest PostgreSQL(9.1.1) or lower, Index-Scan's actual time may
>> increase unexpectedly.
>>
>> I think that this is for scanning All NULL VALUES when performing an
>> indexscan
>> even if they does not need to be scanned.
>
>I think this was just fixed. Please check latest source code.

I have checked latest source code.
But, backward scan doesn't work correctly...

==========================================
[naoya@nesitcspg03 ~]$ psql
psql (9.2devel)
Type "help" for help.

naoya=# create table hoge(id integer,id2 integer);
CREATE TABLE
naoya=# insert into hoge select generate_series(1,10);
INSERT 0 10
naoya=# update hoge set id2=1 where id=5;
UPDATE 1
naoya=# update hoge set id2=10 where id=7;
UPDATE 1
naoya=# create index hoge_idx on hoge(id2);
CREATE INDEX
naoya=# analyze hoge;
ANALYZE
naoya=# set enable_bitmapscan to off;
SET
naoya=# set enable_seqscan to off;
SET
naoya=# select * from hoge;id | id2
----+----- 1 | 2 | 3 | 4 | 6 | 8 | 9 |10 | 5 |   1 7 |  10
(10 rows)

naoya=# explain analyze select * from hoge where id2>0;                                                 QUERY PLAN

---------------------------------------------------------------------------------------------------------------Index
Scanusing hoge_idx on hoge  (cost=0.00..8.29 rows=2 width=8) (actual time=0.010..0.012 rows=2 loops=1)  Index Cond:
(id2> 0)Total runtime: 0.065 ms 
(3 rows)

naoya=# explain analyze select * from hoge where id2>0 order by id2 desc;
      QUERY PLAN 


------------------------------------------------------------------------------------------------------------------------Index
ScanBackward using hoge_idx on hoge  (cost=0.00..8.29 rows=2 width=8) (actual time=0.005..0.005 rows=0 loops=1)  Index
Cond:(id2 > 0)Total runtime: 0.035 ms 
(3 rows)

naoya=# select * from hoge where id2>0;id | id2
----+----- 5 |   1 7 |  10
(2 rows)

naoya=# select * from hoge where id2>0 order by id2 desc;id | id2
----+-----
(0 rows)

==========================================

Regards.

---
Naoya Anzai
---

Re: BUG #6283: About the behavior of indexscan in case there are some NULL values.

From

Tom Lane

Date:

02 November 2011, 13:38:21

安西　直也 <anzai-naoya@mxu.nes.nec.co.jp> writes:
> I have checked latest source code.
> But, backward scan doesn't work correctly...

[ pokes at that... ]  Hmm, the patches I applied a couple days ago
assumed that we are stepping forward or back from a place where the
WHERE clauses are satisfied.  But in this example, the system just
applies _bt_endpoint to descend to the right-hand end of the index,
since there is no upper-bound qual with which to do anything different.
So we start from a place where the clauses aren't satisfied.  That also
means that we haven't really fixed the original performance complaint:
there could be lots of nulls to be stepped over before we reach the
first matching row.

I think that the right fix for this is probably to make
_bt_preprocess_keys explicitly generate the "id is not null" qual that's
implied by "id > 0", so that it will have what amounts to a range
condition on the index contents (since for NULLS LAST, "id is not null"
amounts to "id is less than null", as it were).  Then, instead of applying
_bt_endpoint, it will use the less-than key to descend the btree to the
last non-null entry, and we'll be good for both correctness and
performance.

I don't see any big problem in doing this in HEAD, but it's getting past
what seems like a sane back-patch.  So probably we should revert the
back-branch versions of the prior patch, and just say that the
performance problem is only going to be addressed in HEAD.
        regards, tom lane