Re: [PERFORM] Optimization inner join - Mailing list pgsql-performance

From Gustavo Rezende Montesino
Subject Re: [PERFORM] Optimization inner join
Date
Msg-id 5880BDEA.9050804@trtsp.jus.br
Whole thread Raw
In response to Re: [PERFORM] Optimization inner join  (Clailson <clailson.dba@gmail.com>)
Responses Re: [PERFORM] Optimization inner join  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [PERFORM] Optimization inner join  ("Phillip Couto" <phillip@couto.in> )
List pgsql-performance
Hello,

Em 19/01/2017 11:04, Clailson escreveu:
Hi Phillip.



Not sure if it is all that common. Curious what if you put b.b IS NOT NULL in the WHERE statement?

It's the question. In the company I work with, one of my clients asked me: "Why PostgreSQL does not remove rows with null in column b (table b), before joining, since these rows have no corresponding in table a?" I gave the suggestion to put the IS NOT NULL in the WHERE statement, but HE can't modify the query in the application.

I did the tests with Oracle and it uses a predicate in the query plan, removing the lines where b.b is null.
In Oracle, it´s the same plan, with and without IS NOT NULL in the
WHERE statement.

Being the client in question, I would like to make a little remark: What we thought could be optimized here at first is on the row estimate of the index scan; which could take null_frac into account. To put things into perspective, our similar case in production has a table with 6 million lines where only 9.5k aren´t null for the join field, an the over-estimation is throwing away good plans (like ~150ms execution time) in favor of pretty bad ones (~80s execution time).

We´ve asked application people to put the where not null workaround, which works great, and are waiting on an answer, but I believe getting better estimates without that would be great if possible.


On 19/01/2017 09:34, Phillip Couto wrote:
NULL is still a value that may be paired with a NULL in a.a

Is that so? I would believe you would never get a match, as NULL <> NULL

On Jan 19, 2017, at 05:08, Clailson <clailson.dba@gmail.com> wrote:

Hi,

Is there something in the roadmap to optimize the inner join?

I've this situation
above. Table b has 400 rows with null in the column b.

explain analyze select * from a inner join b on (b.b = a.a);
"Merge Join  (cost=0.55..65.30 rows=599 width=16) (actual time=0.030..1.173 rows=599 loops=1)" 
"  Merge Cond: (a.a = b.b)" 
"  ->  Index Scan using a_pkey on a  (cost=0.28..35.27 rows=1000 width=8) (actual time=0.014..0.364 rows=1000 loops=1)" 
"  ->  Index Scan using in01 on b  (cost=0.28..33.27 rows=1000 width=8) (actual time=0.012..0.249 rows=600 loops=1)" 
"Total runtime: 1.248 ms" 

My question is: Why the planner isn't removing the null rows during the scan of table b?
-- 
Clailson Soares Dinízio de Almeida




Regards,

Gustavo R. Montesino
Tribunal Regional do Trabalho da 2a Região
Secretaria de Tecnologia da Informação e Comunicação
Coordenadoria de Infraestrutura de TIC
Seção de Administração de Banco de Dados
Av. Marquês de São Vicente, 121 - Bl. A - Sala 404
Telefone: (11) 3150-2082

pgsql-performance by date:

Previous
From: "Phillip Couto"
Date:
Subject: Re: [PERFORM] Optimization inner join
Next
From: Vitalii Tymchyshyn
Date:
Subject: Re: [PERFORM] Optimization inner join