Nested loop does not preserve order. Why? - Mailing list pgsql-bugs

From Alexey Bashtanov
Subject Nested loop does not preserve order. Why?
Date
Msg-id 52F9E549.9060900@imap.cc
Whole thread Raw
Responses Re: Nested loop does not preserve order. Why?
List pgsql-bugs
Hello!

Let us have a table T(A, B), an index on T(B, A) and a much smaller
table V(B).

stat-dev1/ui_dev_16 12:27:07 > create table u as select a, a % 1000 b
from generate_series(1, 10000000) a;
SELECT 10000000
stat-dev1/ui_dev_16 12:27:28 > create table v as select distinct b from
u where b % 100 = 1;
SELECT 10
stat-dev1/ui_dev_16 12:27:41 > analyze t;
ANALYZE
stat-dev1/ui_dev_16 12:27:42 > analyze v;
ANALYZE
stat-dev1/ui_dev_16 12:27:45 > create index u_i on u(b,a);
CREATE INDEX

And we are looking for all rows of T that have B present in V and sorted
by B, A.
The most natural way to obtain them would be to take all records from V
ordered by B, and to perform an index-only scan on T for each B.
But not for postgresql :(

stat-dev1/ui_dev_16 12:28:57 > EXPLAIN ANALYZE select b, a from u where
b in (select b from v) order by b, a;
                                                          QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=879810.77..892310.77 rows=5000000 width=8) (actual
time=4728.831..4756.802 rows=100000 loops=1)
    Sort Key: u.b, u.a
    Sort Method: external merge  Disk: 1760kB
    ->  Hash Join  (cost=1.35..186749.35 rows=5000000 width=8) (actual
time=0.121..4568.170 rows=100000 loops=1)
          Hash Cond: (u.b = v.b)
          ->  Seq Scan on u  (cost=0.00..144248.00 rows=10000000
width=8) (actual time=0.027..2294.474 rows=10000000 loops=1)
          ->  Hash  (cost=1.23..1.23 rows=10 width=4) (actual
time=0.071..0.071 rows=10 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 1kB
                ->  HashAggregate  (cost=1.12..1.23 rows=10 width=4)
(actual time=0.062..0.063 rows=10 loops=1)
                      ->  Seq Scan on v  (cost=0.00..1.10 rows=10
width=4) (actual time=0.050..0.052 rows=10 loops=1)
  Total runtime: 4771.817 ms
(11 rows)

Replacing IN by EXISTS changes nothing. Putting an ORDER BY clause into
IN(...) unsurprisingly does not help.
The only way to obtain the desired plan I found is to rewrite the query:

stat-dev1/ui_dev_16 12:29:06 > set enable_bitmapscan to off;
SET
stat-dev1/ui_dev_16 12:29:11 > set enable_seqscan to off;
SET
stat-dev1/ui_dev_16 12:29:13 > EXPLAIN ANALYZE select b, a from u where
b = any((select array_agg(b) from v)::int[]) order by b, a;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
  Index Only Scan using u_i on u (cost=10000000001.58..10000315253.45
rows=99551 width=8) (actual time=0.078..103.239 rows=100000 loops=1)
    Index Cond: (b = ANY ($0))
    Heap Fetches: 100000
    InitPlan 1 (returns $0)
      ->  Aggregate  (cost=10000000001.13..10000000001.14 rows=1
width=4) (actual time=0.018..0.018 rows=1 loops=1)
            ->  Seq Scan on v (cost=10000000000.00..10000000001.10
rows=10 width=4) (actual time=0.006..0.009 rows=10 loops=1)
  Total runtime: 119.929 ms
(7 rows)

The plan I expect for the original query is something like that:

   Nested Loop Preserving Order (cost=1.85..317895.69 rows=100000 width=8)
     ->  HashAggregate  (cost=1.42..1.52 rows=10 width=4)
           ->  Sort  (cost=1.27..1.29 rows=10 width=4)
                 Sort Key: v.b
                 ->  Seq Scan on v  (cost=0.00..1.10 rows=10 width=4)
     ->  Index Only Scan using u_i on u  (cost=0.43..31689.42 rows=10000
width=8)
           Index Cond: (b = v.b)

But if we try to hint postgresql we will realize that nested loop does
not preserve order (see below). Why?

stat-dev1/ui_dev_16 12:42:32 > set enable_bitmapscan to off;
SET
Time: 0,895 ms
stat-dev1/ui_dev_16 12:42:33 > set enable_hashjoin to off;
SET
Time: 0,762 ms
stat-dev1/ui_dev_16 12:42:36 > EXPLAIN select b, a from u where b in
(select b from v order by b) order by b, a;
                                       QUERY PLAN
--------------------------------------------------------------------------------------
  Sort  (cost=326200.51..326450.51 rows=100000 width=8)
    Sort Key: u.b, u.a
    ->  Nested Loop  (cost=1.85..317895.69 rows=100000 width=8)
          ->  HashAggregate  (cost=1.42..1.52 rows=10 width=4)
                ->  Sort  (cost=1.27..1.29 rows=10 width=4)
                      Sort Key: v.b
                      ->  Seq Scan on v  (cost=0.00..1.10 rows=10 width=4)
          ->  Index Only Scan using u_i on u  (cost=0.43..31689.42
rows=10000 width=8)
                Index Cond: (b = v.b)
(9 rows)

pgsql-bugs by date:

Previous
From: Firas Khasawneh
Date:
Subject: Re: ODBC Driver not allowing updates into views
Next
From: RogerBerkelmans@arlcollect.com.au
Date:
Subject: BUG #9182: Data type (text) issue with MS SQL 2012