Nested loop does not preserve order. Why? - Mailing list pgsql-bugs
From | Alexey Bashtanov |
---|---|
Subject | Nested loop does not preserve order. Why? |
Date | |
Msg-id | 52F9E549.9060900@imap.cc Whole thread Raw |
Responses |
Re: Nested loop does not preserve order. Why?
|
List | pgsql-bugs |
Hello! Let us have a table T(A, B), an index on T(B, A) and a much smaller table V(B). stat-dev1/ui_dev_16 12:27:07 > create table u as select a, a % 1000 b from generate_series(1, 10000000) a; SELECT 10000000 stat-dev1/ui_dev_16 12:27:28 > create table v as select distinct b from u where b % 100 = 1; SELECT 10 stat-dev1/ui_dev_16 12:27:41 > analyze t; ANALYZE stat-dev1/ui_dev_16 12:27:42 > analyze v; ANALYZE stat-dev1/ui_dev_16 12:27:45 > create index u_i on u(b,a); CREATE INDEX And we are looking for all rows of T that have B present in V and sorted by B, A. The most natural way to obtain them would be to take all records from V ordered by B, and to perform an index-only scan on T for each B. But not for postgresql :( stat-dev1/ui_dev_16 12:28:57 > EXPLAIN ANALYZE select b, a from u where b in (select b from v) order by b, a; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Sort (cost=879810.77..892310.77 rows=5000000 width=8) (actual time=4728.831..4756.802 rows=100000 loops=1) Sort Key: u.b, u.a Sort Method: external merge Disk: 1760kB -> Hash Join (cost=1.35..186749.35 rows=5000000 width=8) (actual time=0.121..4568.170 rows=100000 loops=1) Hash Cond: (u.b = v.b) -> Seq Scan on u (cost=0.00..144248.00 rows=10000000 width=8) (actual time=0.027..2294.474 rows=10000000 loops=1) -> Hash (cost=1.23..1.23 rows=10 width=4) (actual time=0.071..0.071 rows=10 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> HashAggregate (cost=1.12..1.23 rows=10 width=4) (actual time=0.062..0.063 rows=10 loops=1) -> Seq Scan on v (cost=0.00..1.10 rows=10 width=4) (actual time=0.050..0.052 rows=10 loops=1) Total runtime: 4771.817 ms (11 rows) Replacing IN by EXISTS changes nothing. Putting an ORDER BY clause into IN(...) unsurprisingly does not help. The only way to obtain the desired plan I found is to rewrite the query: stat-dev1/ui_dev_16 12:29:06 > set enable_bitmapscan to off; SET stat-dev1/ui_dev_16 12:29:11 > set enable_seqscan to off; SET stat-dev1/ui_dev_16 12:29:13 > EXPLAIN ANALYZE select b, a from u where b = any((select array_agg(b) from v)::int[]) order by b, a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Index Only Scan using u_i on u (cost=10000000001.58..10000315253.45 rows=99551 width=8) (actual time=0.078..103.239 rows=100000 loops=1) Index Cond: (b = ANY ($0)) Heap Fetches: 100000 InitPlan 1 (returns $0) -> Aggregate (cost=10000000001.13..10000000001.14 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1) -> Seq Scan on v (cost=10000000000.00..10000000001.10 rows=10 width=4) (actual time=0.006..0.009 rows=10 loops=1) Total runtime: 119.929 ms (7 rows) The plan I expect for the original query is something like that: Nested Loop Preserving Order (cost=1.85..317895.69 rows=100000 width=8) -> HashAggregate (cost=1.42..1.52 rows=10 width=4) -> Sort (cost=1.27..1.29 rows=10 width=4) Sort Key: v.b -> Seq Scan on v (cost=0.00..1.10 rows=10 width=4) -> Index Only Scan using u_i on u (cost=0.43..31689.42 rows=10000 width=8) Index Cond: (b = v.b) But if we try to hint postgresql we will realize that nested loop does not preserve order (see below). Why? stat-dev1/ui_dev_16 12:42:32 > set enable_bitmapscan to off; SET Time: 0,895 ms stat-dev1/ui_dev_16 12:42:33 > set enable_hashjoin to off; SET Time: 0,762 ms stat-dev1/ui_dev_16 12:42:36 > EXPLAIN select b, a from u where b in (select b from v order by b) order by b, a; QUERY PLAN -------------------------------------------------------------------------------------- Sort (cost=326200.51..326450.51 rows=100000 width=8) Sort Key: u.b, u.a -> Nested Loop (cost=1.85..317895.69 rows=100000 width=8) -> HashAggregate (cost=1.42..1.52 rows=10 width=4) -> Sort (cost=1.27..1.29 rows=10 width=4) Sort Key: v.b -> Seq Scan on v (cost=0.00..1.10 rows=10 width=4) -> Index Only Scan using u_i on u (cost=0.43..31689.42 rows=10000 width=8) Index Cond: (b = v.b) (9 rows)
pgsql-bugs by date: