Thread: BUG #6278: Index scans on '>' condition on field with many NULLS

BUG #6278: Index scans on '>' condition on field with many NULLS

From
"Maksym Boguk"
Date:
The following bug has been logged online:

Bug reference:      6278
Logged by:          Maksym Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.0
Operating system:   Linux
Description:        Index scans on '>' condition on field with many NULLS
Details:

I not sure it is missing feature or actual bug.

However very selective index scan on '>' condition can work pretty
inefficient on column with many nulls.
(in the same time '<' work well).

Seems index scan on '>' condition going through all nulls in index.

Test case (tested on 8.4 and 9.0 with same effect):

postgres=#  CREATE table test as select (case when random()>0.1 then NULL
else random() end) as value from generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX test_value_key on test(value);
CREATE INDEX
postgres=# SELECT count(*) from test;
  count
----------
 10000000
(1 row)

postgres=# VACUUM ANALYZE test;
VACUUM

postgres=# EXPLAIN ANALYZE select * from test where value>0.9999;
                                                        QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------------------
 Index Scan using test_value_key on test  (cost=0.00..13.78 rows=105
width=8) (actual time=0.010..155.318 rows=91 loops=1)
   Index Cond: (value > 0.9999::double precision)
 Total runtime: 155.346 ms
(3 rows)

Oops... 160ms to return 90 rows from memory.

In the same time 100 rows from other index side:

postgres=# EXPLAIN ANALYZE select * from test where value<0.0001;
                                                        QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------
 Index Scan using test_value_key on test  (cost=0.00..15.69 rows=120
width=8) (actual time=0.006..0.158 rows=103 loops=1)
   Index Cond: (value < 0.0001::double precision)
 Total runtime: 0.175 ms
(3 rows)

That is good result (1000 faster then other way around).

For sure that can be fixed via create index with NOT NULL predicated. But
may be that problem worth small investigation.

Seems index scan cannot stop after finding first NULL during scan on '>'
condition, and doing scan through all 90% nulls in table.

Re: BUG #6278: Index scans on '>' condition on field with many NULLS

From
Robert Haas
Date:
On Sun, Oct 30, 2011 at 11:39 PM, Maksym Boguk <maxim.boguk@gmail.com> wrot=
e:
> However very selective index scan on '>' condition can work pretty
> inefficient on column with many nulls.
> (in the same time '<' work well).
>
> Seems index scan on '>' condition going through all nulls in index.
>
> Test case (tested on 8.4 and 9.0 with same effect):
>
> postgres=3D# =A0CREATE table test as select (case when random()>0.1 then =
NULL
> else random() end) as value from generate_series(1,10000000);
> SELECT 10000000
> postgres=3D# CREATE INDEX test_value_key on test(value);
> CREATE INDEX
> postgres=3D# SELECT count(*) from test;
> =A0count
> ----------
> =A010000000
> (1 row)
>
> postgres=3D# VACUUM ANALYZE test;
> VACUUM
>
> postgres=3D# EXPLAIN ANALYZE select * from test where value>0.9999;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY PLAN
> -------------------------------------------------------------------------=
---
> -----------------------------------------------
> =A0Index Scan using test_value_key on test =A0(cost=3D0.00..13.78 rows=3D=
105
> width=3D8) (actual time=3D0.010..155.318 rows=3D91 loops=3D1)
> =A0 Index Cond: (value > 0.9999::double precision)
> =A0Total runtime: 155.346 ms
> (3 rows)
>
> Oops... 160ms to return 90 rows from memory.
>
> In the same time 100 rows from other index side:
>
> postgres=3D# EXPLAIN ANALYZE select * from test where value<0.0001;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY PLAN
> -------------------------------------------------------------------------=
---
> ----------------------------------------------
> =A0Index Scan using test_value_key on test =A0(cost=3D0.00..15.69 rows=3D=
120
> width=3D8) (actual time=3D0.006..0.158 rows=3D103 loops=3D1)
> =A0 Index Cond: (value < 0.0001::double precision)
> =A0Total runtime: 0.175 ms
> (3 rows)
>
> That is good result (1000 faster then other way around).
>
> For sure that can be fixed via create index with NOT NULL predicated. But
> may be that problem worth small investigation.
>
> Seems index scan cannot stop after finding first NULL during scan on '>'
> condition, and doing scan through all 90% nulls in table.

I can reproduce this.  I'm not sure whether it's a bug either, but it
sure seems less than ideal.  I suppose the problem is that we are
generating an index scan that starts at 0.9999 and runs through the
end of the index, rather than stopping when it hits the first NULL.
Not sure how much work it would be to make that happen, but I guess
we'd need a second branch to the index condition to stop the scan,
just as we already do for:

EXPLAIN (ANALYZE, BUFFERS) select * from test where value>0.9993 and
value <0.9999;

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6278: Index scans on '>' condition on field with many NULLS

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Oct 30, 2011 at 11:39 PM, Maksym Boguk <maxim.boguk@gmail.com> wrote:
>> Seems index scan cannot stop after finding first NULL during scan on '>'
>> condition, and doing scan through all 90% nulls in table.

> I can reproduce this.  I'm not sure whether it's a bug either, but it
> sure seems less than ideal.  I suppose the problem is that we are
> generating an index scan that starts at 0.9999 and runs through the
> end of the index, rather than stopping when it hits the first NULL.

I poked at this a bit last night.  The reason it's happening is that the
">" key is only marked SK_BT_REQBKWD, not SK_BT_REQFWD, so _bt_checkkeys
doesn't think it can stop when it hits the NULLs.  Right at the moment
it seems like we could mark that key with both flags, which leads to the
conclusion that two flags are unnecessary and we could get by with only
one direction-independent flag.  Which, if memory serves, is how it used
to be ... until I split the flag into two to fix some bug or other.  But
the regression tests still pass if you make _bt_mark_scankey_required
mark any required key with both flags (which is the zeroth-order version
of recombining them).  So either my analysis was wrong at the time,
or some later change has eliminated the need for two flags, or the
regression tests aren't covering the problematic case.  Will investigate
further once I've absorbed some caffeine.

            regards, tom lane

Re: BUG #6278: Index scans on '>' condition on field with many NULLS

From
Robert Haas
Date:
On Mon, Oct 31, 2011 at 10:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Oct 30, 2011 at 11:39 PM, Maksym Boguk <maxim.boguk@gmail.com> w=
rote:
>>> Seems index scan cannot stop after finding first NULL during scan on '>'
>>> condition, and doing scan through all 90% nulls in table.
>
>> I can reproduce this. =A0I'm not sure whether it's a bug either, but it
>> sure seems less than ideal. =A0I suppose the problem is that we are
>> generating an index scan that starts at 0.9999 and runs through the
>> end of the index, rather than stopping when it hits the first NULL.
>
> I poked at this a bit last night. =A0The reason it's happening is that the
> ">" key is only marked SK_BT_REQBKWD, not SK_BT_REQFWD, so _bt_checkkeys
> doesn't think it can stop when it hits the NULLs. =A0Right at the moment
> it seems like we could mark that key with both flags, which leads to the
> conclusion that two flags are unnecessary and we could get by with only
> one direction-independent flag. =A0Which, if memory serves, is how it used
> to be ... until I split the flag into two to fix some bug or other. =A0But
> the regression tests still pass if you make _bt_mark_scankey_required
> mark any required key with both flags (which is the zeroth-order version
> of recombining them). =A0So either my analysis was wrong at the time,
> or some later change has eliminated the need for two flags, or the
> regression tests aren't covering the problematic case. =A0Will investigate
> further once I've absorbed some caffeine.

I know almost nothing about this code, but it seems both flags were
introduced in commit 7ccaf13a06b8e1f70b26ab049fdb4f8c8dece3f8.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6278: Index scans on '>' condition on field with many NULLS

From
Tom Lane
Date:
I wrote:
> I poked at this a bit last night.  The reason it's happening is that the
> ">" key is only marked SK_BT_REQBKWD, not SK_BT_REQFWD, so _bt_checkkeys
> doesn't think it can stop when it hits the NULLs.  Right at the moment
> it seems like we could mark that key with both flags, which leads to the
> conclusion that two flags are unnecessary and we could get by with only
> one direction-independent flag.

OK, I swapped some of this knowledge back in.  The above idea would work
in simple cases with non-redundant scan keys, but in general not.
Consider an indexscan on x with

    WHERE x > 4

We'll start scanning at the x>4 boundary point and run forward.  The
scankey condition can only fail once we reach NULL index entries, at
which point we can stop, so in this case marking this key SK_BT_REQFWD
would be OK.  However, consider something like

    WHERE x > 4 AND x > 10

where _bt_preprocess_keys is unable to determine which key is more
restrictive.  (This could only happen if the comparison constants are of
different types and we don't have a suitable cross-type comparison
operator in the opfamily.  None of the standard opfamilies have such
omissions, which is why there's no regression test covering this.)
In such a case, _bt_first arbitrarily picks one of the keys to do the
initial positioning with.  If it picks x>4, then the x>10 scankey will
fail until we reach index entries > 10 ... but we can't abandon the
scan just because x>10 is failing.  This is why we have to have
direction-sensitive required-key flags.  (If we were to start backing
up, failure of either of the redundant keys would indeed be
justification for stopping.)

We could possibly do something with marking only non-redundant >/>=
scankeys as SK_BT_REQFWD, and conversely for </<= keys.  But this
looks like it'd be a pain to manage in _bt_preprocess_keys, and given
the difficulty of testing cases where it matters, I'm not excited about
going that direction.

What I'm currently thinking is that we could hack _bt_checkkeys to
stop the scan when it reaches NULLs if the current scankey is not an
ISNULL test and is marked *either* SK_BT_REQFWD or SK_BT_REQBKWD.
In such a case we know that the scankey cannot be satisfied by a NULL,
so it will fail until we reach the next range of this index column,
ie move to the next value of the next higher-order column (if any).
And that value can't pass the scankeys, because it must have an equality
constraint, else this key wouldn't be marked required.

BTW, I had thought that Maksym could work around this with something
like
    WHERE value > 0.9999 and value < 1.1
or
    WHERE value > 0.9999 and value is not null
but it turns out that neither of those work.  The second key will be
marked SK_BT_REQFWD, but because _bt_checkkeys stops testing as soon
as any key has failed, it doesn't reach the second key and doesn't
realize that there's a failing SK_BT_REQFWD key available.  That's
a little bit annoying, but the only clear fix would be to keep testing
keys after we already know the index entry has failed, and I think
that's probably going to be a net loss.  If we fix the NULL case to
accept either flag, then the only case where this will matter is
redundant scan keys, and that's not a case that I think we should
optimize at the cost of pessimizing normal cases.

A patch along these lines should be pretty localized.  Has anyone
got an opinion on whether to back-patch or not?  This seems like a
performance bug, but given the lack of prior complaints, maybe we
shouldn't take any risks for it.

            regards, tom lane

Re: BUG #6278: Index scans on '>' condition on field with many NULLS

From
Robert Haas
Date:
On Mon, Oct 31, 2011 at 12:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A patch along these lines should be pretty localized. =A0Has anyone
> got an opinion on whether to back-patch or not? =A0This seems like a
> performance bug, but given the lack of prior complaints, maybe we
> shouldn't take any risks for it.

I think my vote would be to just fix it in master.

--=20
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #6278: Index scans on '>' condition on field with many NULLS

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 31, 2011 at 12:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A patch along these lines should be pretty localized.  Has anyone
>> got an opinion on whether to back-patch or not?  This seems like a
>> performance bug, but given the lack of prior complaints, maybe we
>> shouldn't take any risks for it.

> I think my vote would be to just fix it in master.

After researching the issue, my patch looks like the attached.  This
seems to apply cleanly back to about 8.3, so I'm inclined to back-patch
that far.  It's pretty simple, and I think the reason we (I) got it
wrong to begin with is just lack of thought about the NULLs case.

            regards, tom lane

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 134abe6d596413a1ad54c0c7f378248251295d4a..c00255567116e3b6f7bd8426eb32b2398c8d95d5 100644
*** a/src/backend/access/nbtree/nbtutils.c
--- b/src/backend/access/nbtree/nbtutils.c
*************** _bt_checkkeys(IndexScanDesc scan,
*** 1384,1398 ****
              }

              /*
!              * Tuple fails this qual.  If it's a required qual for the current
!              * scan direction, then we can conclude no further tuples will
!              * pass, either.
               */
!             if ((key->sk_flags & SK_BT_REQFWD) &&
!                 ScanDirectionIsForward(dir))
!                 *continuescan = false;
!             else if ((key->sk_flags & SK_BT_REQBKWD) &&
!                      ScanDirectionIsBackward(dir))
                  *continuescan = false;

              /*
--- 1394,1409 ----
              }

              /*
!              * Tuple fails this qual.  If it's a required qual, then we can
!              * conclude no further tuples will pass, either.  We can stop
!              * regardless of the scan direction, because we know that NULLs
!              * sort to one end or the other of the range of values.  If this
!              * tuple doesn't pass, then no future ones will either, until we
!              * reach the next set of values of the higher-order index attrs
!              * (if any) ... and those attrs must have equality quals, else
!              * this one wouldn't be marked required.
               */
!             if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD))
                  *continuescan = false;

              /*
*************** _bt_checkkeys(IndexScanDesc scan,
*** 1403,1434 ****

          if (isNull)
          {
!             if (key->sk_flags & SK_BT_NULLS_FIRST)
!             {
!                 /*
!                  * Since NULLs are sorted before non-NULLs, we know we have
!                  * reached the lower limit of the range of values for this
!                  * index attr.    On a backward scan, we can stop if this qual
!                  * is one of the "must match" subset.  On a forward scan,
!                  * however, we should keep going.
!                  */
!                 if ((key->sk_flags & SK_BT_REQBKWD) &&
!                     ScanDirectionIsBackward(dir))
!                     *continuescan = false;
!             }
!             else
!             {
!                 /*
!                  * Since NULLs are sorted after non-NULLs, we know we have
!                  * reached the upper limit of the range of values for this
!                  * index attr.    On a forward scan, we can stop if this qual is
!                  * one of the "must match" subset.    On a backward scan,
!                  * however, we should keep going.
!                  */
!                 if ((key->sk_flags & SK_BT_REQFWD) &&
!                     ScanDirectionIsForward(dir))
!                     *continuescan = false;
!             }

              /*
               * In any case, this indextuple doesn't match the qual.
--- 1414,1428 ----

          if (isNull)
          {
!             /*
!              * The index entry is NULL, so it must fail this qual (we assume
!              * all btree operators are strict).  Furthermore, we know that
!              * all remaining entries with the same higher-order index attr
!              * values must be NULLs too.  So, just as above, we can stop the
!              * scan regardless of direction, if the qual is required.
!              */
!             if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD))
!                 *continuescan = false;

              /*
               * In any case, this indextuple doesn't match the qual.
*************** _bt_check_rowcompare(ScanKey skey, Index
*** 1508,1539 ****

          if (isNull)
          {
!             if (subkey->sk_flags & SK_BT_NULLS_FIRST)
!             {
!                 /*
!                  * Since NULLs are sorted before non-NULLs, we know we have
!                  * reached the lower limit of the range of values for this
!                  * index attr. On a backward scan, we can stop if this qual is
!                  * one of the "must match" subset.    On a forward scan,
!                  * however, we should keep going.
!                  */
!                 if ((subkey->sk_flags & SK_BT_REQBKWD) &&
!                     ScanDirectionIsBackward(dir))
!                     *continuescan = false;
!             }
!             else
!             {
!                 /*
!                  * Since NULLs are sorted after non-NULLs, we know we have
!                  * reached the upper limit of the range of values for this
!                  * index attr. On a forward scan, we can stop if this qual is
!                  * one of the "must match" subset.    On a backward scan,
!                  * however, we should keep going.
!                  */
!                 if ((subkey->sk_flags & SK_BT_REQFWD) &&
!                     ScanDirectionIsForward(dir))
!                     *continuescan = false;
!             }

              /*
               * In any case, this indextuple doesn't match the qual.
--- 1502,1516 ----

          if (isNull)
          {
!             /*
!              * The index entry is NULL, so it must fail this qual (we assume
!              * all btree operators are strict).  Furthermore, we know that
!              * all remaining entries with the same higher-order index attr
!              * values must be NULLs too.  So, just as above, we can stop the
!              * scan regardless of direction, if the qual is required.
!              */
!             if (subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD))
!                 *continuescan = false;

              /*
               * In any case, this indextuple doesn't match the qual.