Home > mailing lists

UNION ALL with WHERE clause does not use Merge Append - Mailing list pgsql-hackers

From	Marti Raudsepp
Subject	UNION ALL with WHERE clause does not use Merge Append
Date	August 16, 2012 12:31:50
Msg-id	CABRT9RBbqEAZ37JMLq4cVTEJ5zTnzSFGBiXNZg-S6ctFSig+QQ@mail.gmail.com Whole thread Raw
Responses	Re: UNION ALL with WHERE clause does not use Merge Append
List	pgsql-hackers

Tree view

Hi list,

I have a table with (start_time, end_time) and I'd want to tally up
the number of concurrent connections at any point in time. The "Merge
Append" plan node introduced in 9.1 would be perfect for this purpose.
It seems to work out fine for the most trivial case, but just when I
add an WHERE clause to any of the UNION subqueries, this stops
working. Tested on 9.1.4 and 9.2beta3

Here's a simplified test case:

create table foo as select generate_series(1,1000000) i;
create index on foo(i);
vacuum analyze foo;

This works as expected:
EXPLAIN ANALYZE select i from foo
UNION ALL select i from foo
ORDER BY 1 LIMIT 100;
Limit  (cost=0.01..3.31 rows=100 width=4) (actual time=0.028..0.078
rows=100 loops=1)  ->  Result  (cost=0.01..65981.61 rows=2000000 width=4) (actual
time=0.026..0.064 rows=100 loops=1)        ->  Merge Append  (cost=0.01..65981.61 rows=2000000 width=4)
(actual time=0.026..0.053 rows=100 loops=1)              Sort Key: public.foo.i              ->  Index Only Scan using
foo_i_idxon foo
 
(cost=0.00..20490.80 rows=1000000 width=4) (actual time=0.017..0.021
rows=51 loops=1)                    Heap Fetches: 0              ->  Index Only Scan using foo_i_idx on foo
(cost=0.00..20490.80 rows=1000000 width=4) (actual time=0.007..0.012
rows=50 loops=1)                    Heap Fetches: 0Total runtime: 0.106 ms


But once I add even a basic WHERE clause, suddenly it decides that
sorting is the only way:
EXPLAIN ANALYZE select i from foo where i is not null
UNION ALL select i from foo where i is not null
ORDER BY 1 LIMIT 100;
Limit  (cost=127250.56..127250.81 rows=100 width=4) (actual
time=1070.799..1070.812 rows=100 loops=1)  ->  Sort  (cost=127250.56..132250.56 rows=2000000 width=4) (actual
time=1070.798..1070.804 rows=100 loops=1)        Sort Key: public.foo.i        Sort Method: top-N heapsort  Memory:
29kB       ->  Result  (cost=0.00..50812.00 rows=2000000 width=4)
 
(actual time=0.009..786.806 rows=2000000 loops=1)              ->  Append  (cost=0.00..50812.00 rows=2000000 width=4)
(actual time=0.007..512.201 rows=2000000 loops=1)                    ->  Seq Scan on foo  (cost=0.00..15406.00
rows=1000000 width=4) (actual time=0.007..144.872 rows=1000000
loops=1)                          Filter: (i IS NOT NULL)                    ->  Seq Scan on foo  (cost=0.00..15406.00
rows=1000000 width=4) (actual time=0.003..139.196 rows=1000000
loops=1)                          Filter: (i IS NOT NULL)Total runtime: 1070.847 ms


Works again when I stuff it in a subquery and put WHERE above the
UNION. But this loses flexibility -- I can't filter the subqueries
with different clauses any more:

EXPLAIN ANALYZE
select * from (   select i from foo UNION ALL   select i from foo
) subq where i is not null
ORDER BY 1 LIMIT 100;
Limit  (cost=0.01..3.56 rows=100 width=4) (actual time=0.033..0.088
rows=100 loops=1)  ->  Result  (cost=0.01..70981.61 rows=2000000 width=4) (actual
time=0.032..0.071 rows=100 loops=1)        ->  Merge Append  (cost=0.01..70981.61 rows=2000000 width=4)
(actual time=0.031..0.059 rows=100 loops=1)              Sort Key: public.foo.i              ->  Index Only Scan using
foo_i_idxon foo
 
(cost=0.00..22990.80 rows=1000000 width=4) (actual time=0.020..0.025
rows=51 loops=1)                    Index Cond: (i IS NOT NULL)                    Heap Fetches: 0              ->
IndexOnly Scan using foo_i_idx on foo
 
(cost=0.00..22990.80 rows=1000000 width=4) (actual time=0.010..0.014
rows=50 loops=1)                    Index Cond: (i IS NOT NULL)                    Heap Fetches: 0Total runtime: 0.115
ms


Is this just a planner shortcoming or a bug? Or is there some
justification for this behavior?

Regards,
Marti

pgsql-hackers by date:

From: Daniel Farina
Date: 16 August 2012, 10:02:51
Subject: Re: feature request: auto savepoint for interactive psql when in transaction.

From: Pavel Stehule
Date: 16 August 2012, 14:44:14
Subject: Re: ToDo: allow to get a number of processed rows by COPY statement

UNION ALL with WHERE clause does not use Merge Append - Mailing list pgsql-hackers

Previous

Next