Re: missing estimation for coalesce function - Mailing list pgsql-hackers

From David Fetter
Subject Re: missing estimation for coalesce function
Date
Msg-id 20191128025646.GC8731@fetter.org
Whole thread Raw
In response to missing estimation for coalesce function  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: missing estimation for coalesce function  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Wed, Nov 27, 2019 at 08:47:56AM +0100, Pavel Stehule wrote:
> Hi
> 
> I have a report from my customer about migration his application from
> Oracle to Postgres.
> 
> The most significant issue was missing correct estimation for coalesce
> function. He had to rewrite coalesce(var, X) = X to "var IS NULL or var =
> X". Then the result was very satisfactory.
> 
> Example:
> 
> create table xxx(a int);
> insert into xxx select null from generate_series(1,10000);
> insert into xxx select 1 from generate_series(1,1000);
> insert into xxx select 0 from generate_series(1,1000);
> analyze xxx;
> 
> postgres=# explain analyze select * from xxx where coalesce(a, 0) = 0;
>                                              QUERY PLAN
> 
> ----------------------------------------------------------------------------------------------------
>  Seq Scan on xxx  (cost=0.00..194.00 rows=60 width=4) (actual
> time=0.041..4.276 rows=11000 loops=1)
>    Filter: (COALESCE(a, 0) = 0)
>    Rows Removed by Filter: 1000
>  Planning Time: 0.099 ms
>  Execution Time: 5.412 ms
> (5 rows)
> 
> postgres=# explain analyze select * from xxx where a is null or a = 0;
>                                               QUERY PLAN
> 
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on xxx  (cost=0.00..194.00 rows=10167 width=4) (actual
> time=0.052..5.891 rows=11000 loops=1)
>    Filter: ((a IS NULL) OR (a = 0))
>    Rows Removed by Filter: 1000
>  Planning Time: 0.136 ms
>  Execution Time: 7.522 ms
> (5 rows)
> 
> I think so pattern coalesce(var, X) = X is very common so can be very
> interesting to support it better.

Better support sounds great!

How specifically might this be better supported? On this relatively
short table, I see planning times considerably longer, I assume
because they need to take a function call into account, and execution
times longer but not all that much longer. I tried with 3 million
rows, and got the representative samples below:

shackle@[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT * FROM xxx WHERE COALESCE(a, 0)=0;
                                                        QUERY PLAN
 
 

══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Gather  (cost=1000.00..30391.00 rows=15000 width=4) (actual time=1.315..346.406 rows=999772 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on xxx  (cost=0.00..27891.00 rows=6250 width=4) (actual time=0.029..216.419 rows=333257
loops=3)
         Filter: (COALESCE(a, 0) = 0)
         Rows Removed by Filter: 666743
 Planning Time: 0.204 ms
 Execution Time: 389.307 ms
(8 rows)

Time: 391.394 ms

shackle@[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT * FROM xxx WHERE a IS NULL OR a = 0;
                                                 QUERY PLAN                                                  
═════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Seq Scan on xxx  (cost=0.00..49766.00 rows=995700 width=4) (actual time=0.043..524.401 rows=999772 loops=1)
   Filter: ((a IS NULL) OR (a = 0))
   Rows Removed by Filter: 2000228
 Planning Time: 0.106 ms
 Execution Time: 560.593 ms
(5 rows)

Time: 561.186 ms

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pgbench -i progress output on terminal
Next
From: Thomas Munro
Date:
Subject: Re: POC: Cleaning up orphaned files using undo logs