Re: How to hint 2 coulms IS NOT DISTINCT FROM each other - Mailing list pgsql-general

From Merlin Moncure
Subject Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Date
Msg-id CAHyXU0whPYPA8LGw7cxSjN8rvFO=CurzaLwSZhfCOm7WMA+cfQ@mail.gmail.com
Whole thread Raw
In response to Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Kim Rose Carlsen <krc@hiper.dk>)
Responses Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Alban Hertroys <haramrae@gmail.com>)
List pgsql-general
On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen <krc@hiper.dk> wrote:
>>> It might raise another problem, that the nulls are generated through LEFT
>
>>> JOINS where no rows are defined. Then the 0 or -1 value need to be
>>> a computed value. Won't this throw off index lookups? (I might be
>>> more confused in this area).
>>
>>Not following this.
>
> The nulls are generated by something like this
>     SELECT c.circuit_id,
>                    cc.customer_id
>        FROM circuit AS c
> LEFT JOIN circuit_customer AS cc
>              ON c.circuit_id = cc.circuit_id
>
> To make a magic '0' customer we would be required to use
>   COALESCE(cc.customer_id, '0')
> I dont think the optimizer will do anything clever with the '0' we have
> computed from null.

It would if you explicitly indexed it as such;
CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0'));

> I could ofc. by default assign all unassigned circuits to '0' in
> circuit_customer. I'm not a fan though.

hm, why not?  null generally means 'unknown' and that's why it fails
any equality test.

>>BTW, if you want a fast plan over the current
>>data without consideration of aesthetics, try this:
>>
>>CREATE VIEW view_circuit_with_status AS (
>>    SELECT r.*,
>>          s.circuit_status,
>>          s.customer_id AS s_customer_id,
>>          p.line_speed,
>>          p.customer_id AS p_customer_id
>>     FROM view_circuit r
>>     JOIN view_circuit_product_main s
>>      ON r.circuit_id = s.circuit_id
>>      AND r.customer_id, s.customer_id
>>     JOIN view_circuit_product p
>>       ON r.circuit_id = p.circuit_id
>>      AND r.customer_id, s.customer_id
>>  UNION ALL SELECT r.*,
>>          s.circuit_status,
>>          s.customer_id AS s_customer_id,
>>          p.line_speed,
>>          p.customer_id AS p_customer_id
>>     FROM view_circuit r
>>     JOIN view_circuit_product_main s
>>       ON r.circuit_id = s.circuit_id
>>      AND r.customer_id IS NULL
>>      AND  s.customer_id IS NULL
>>     JOIN view_circuit_product p
>>       ON r.circuit_id = p.circuit_id>
>
> I will have to figure something out, but this specific case is still
> problematic
> since we would like to filter this view using different criteria's, like
> circuit_no,
> products or customers.

the above is logically equivalent to IS NOT DISTINCT FROM; you should
be able to query it as you would have done the original view.

> But with all these detours, I assume that a change to IS NOT DISTINCT FROM,
> is difficult or not wanted?

Well, not exactly.  In your case you are trying to treat null as a
specific value and pass it through join operations.

TBH, this is a pretty dubious approach: null is not supposed to be
equal to anything and any join vs null should come up empty --
logically at least.  INDF works around this of course but it's not a
recommended approach (my usage is generally restricted to, "has this
value changed since yesterday? etc").

I'm not an expert backend structures for indexing and optimization but
I know enough to suspect that optimizing INDF might cause
implementation headaches in various places, as do other irregular
syntactical approaches in SQL.  I think minimally optimizing INDF
would require converting it to an operator on par with '=' which is a
pretty large infrastructure change for an edge optimization case.
The fact that there are solid optimization strategies already on the
table (UNION ALL, expr index COALESCE()) does not help.

merlin


pgsql-general by date:

Previous
From: Kim Rose Carlsen
Date:
Subject: Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Next
From: "Hu, Patricia"
Date:
Subject: What is the best thing to do with PUBLIC schema in Postgresql database