Re: Extremely slow HashAggregate in simple UNION query - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Extremely slow HashAggregate in simple UNION query
Date
Msg-id CAMkU=1x+aT9kWCWKoskOdQPAdXZwLsv-cpKJ6B=rytCibfrpyA@mail.gmail.com
Whole thread Raw
In response to Extremely slow HashAggregate in simple UNION query  (Felix Geisendörfer <felix@felixge.de>)
Responses Re: Extremely slow HashAggregate in simple UNION query  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Extremely slow HashAggregate in simple UNION query  (Felix Geisendörfer <felix@felixge.de>)
List pgsql-performance
On Tue, Aug 20, 2019 at 11:12 AM Felix Geisendörfer <felix@felixge.de> wrote:
 ...
 
[1] My actual query had bad estimates for other reasons (GIN Index), but that's another story. The query above was of course deliberately designed to have bad estimates.

As noted elsewhere, v12 thwarts your attempts to deliberately design the bad estimates.  You can still get them, you just have to work a bit harder at it:

CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select generate_series($1,$2) $$ rows 1000 language sql;

EXPLAIN ANALYZE
SELECT * FROM j(1, 1) a, j(1, 1) b                        
UNION
SELECT * FROM j(1, 1) a, j(1, 1) b;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=80021.00..100021.00 rows=2000000 width=16) (actual time=11.332..13.241 rows=1 loops=1)
   Group Key: a.a, b.b
   ->  Append  (cost=0.50..70021.00 rows=2000000 width=16) (actual time=0.118..0.163 rows=2 loops=1)
         ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16) (actual time=0.117..0.118 rows=1 loops=1)
               ->  Function Scan on j a  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.087..0.088 rows=1 loops=1)
               ->  Function Scan on j b  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.027..0.027 rows=1 loops=1)
         ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16) (actual time=0.044..0.044 rows=1 loops=1)
               ->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.022..0.022 rows=1 loops=1)
               ->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.020..0.021 rows=1 loops=1)
 Planning Time: 0.085 ms
 Execution Time: 69.277 ms
(11 rows)

But the same advance in v12 which makes it harder to fool with your test case also opens the possibility of fixing your real case.

I've made an extension which has a function which always returns true, but lies about how often it is expected to return true. See the attachment.  With that, you can fine-tune the planner.

CREATE EXTENSION pg_selectivities ;

EXPLAIN ANALYZE
SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001)
UNION
SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001);
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=45021.40..45021.60 rows=20 width=16) (actual time=0.226..0.227 rows=1 loops=1)
   Group Key: a.a, b.b
   ->  Append  (cost=0.50..45021.30 rows=20 width=16) (actual time=0.105..0.220 rows=2 loops=1)
         ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual time=0.104..0.105 rows=1 loops=1)
               Join Filter: pg_always('1e-05'::double precision)
               ->  Function Scan on j a  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.066..0.066 rows=1 loops=1)
               ->  Function Scan on j b  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.035..0.035 rows=1 loops=1)
         ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual time=0.112..0.113 rows=1 loops=1)
               Join Filter: pg_always('1e-05'::double precision)
               ->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.077..0.077 rows=1 loops=1)
               ->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.034..0.034 rows=1 loops=1)
 Planning Time: 0.139 ms
 Execution Time: 0.281 ms

Cheers,

Jeff

Attachment

pgsql-performance by date:

Previous
From: Luís Roberto Weck
Date:
Subject: Re: Erratically behaving query needs optimization
Next
From: Pavel Stehule
Date:
Subject: Re: Extremely slow HashAggregate in simple UNION query