Re: How to hint 2 coulms IS NOT DISTINCT FROM each other - Mailing list pgsql-general

From Merlin Moncure
Subject Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Date
Msg-id CAHyXU0xjCn9j=AGSwu1VM-Suf7rgicORTiKKeqn6gYi8w+v2NQ@mail.gmail.com
Whole thread Raw
In response to Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Kim Rose Carlsen <krc@hiper.dk>)
Responses Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
List pgsql-general
On Fri, Oct 28, 2016 at 2:54 PM, Kim Rose Carlsen <krc@hiper.dk> wrote:
>> > On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <krc@hiper.dk> wrote:
>
>> > Hi
>> >
>> > I was wondering if there is a way to hint that two columns in two
>> > different
>> > tables IS NOT DISTINCT FROM each other. So that the optimizer may assume
>> > if
>> > table_a.key = 'test' THEN table_b.key = 'test' .
>> >
>> > The equals operator already does this but it does not handle NULLS very
>> > well
>> > (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
>> > doesn't establish the same inference rules as equals.
>>
>> The whole idea behing Postgres' query planner is that you don't have
>> to use any hints. Late model versions of postgres handle nulls fine,
>> but nulls are never "equal" to anything else. I.e. where xxx is null
>> works with indexes. Where x=y does not, since null <> null.
>>
>> Suggestion for getting help, put a large-ish aka production sized
>> amount of data into your db, run your queries with explain analyze and
>> feed them to https://explain.depesz.com/ and post the links here along
>> with the slow queries. A lot of times the fix is non-obvious if you're
>> coming from another db with a different set of troubleshooting skills
>> for slow queries.
>
> The problem is how to reduce the problem into its core, without introducing
> all the unnecessary.
>
> Maybe simplifying the problem, also makes it impossible to say where I go
> wrong. It might be that I try to push too much logic into the SQL layer
> and Im adding too many layers of abstraction to accomplish what I want.
> So let me try and elaborate a little more.
>
> I have couple a tables describing resources (circuits) and allocation
> of resources to customers and products.
>
> First layer is a view called view_circuit. This view (left) join any table
> the circuit table reference through a foreign key (it gives exactly the same
> rows and columns as circuit table + some extra information like
> customer_id).
>
> Second layer is 2 views
> 1) a view describing if the circuit is active or inactive, lets call it
>    view_circuit_product_main
> 2) a view describing line_speed about the circuit, lets call it
>    view_circuit_product
>
> These views use aggregations (both GROUP BY and SELECT DISTINCT ON (...))
> if this has any relevance.
>
> Third layer
> Next step is to add a view that tells both (joins the two views together
> on circuit_id). lets call the new view view_circuit_with_status
>
> This view is defined as
>
> CREATE VIEW view_circuit_with_status AS (
>    SELECT r.*,
>           s.circuit_status,
>           s.customer_id AS s_customer_id,
>           p.line_speed,
>           p.customer_id AS p_customer_id
>      FROM view_circuit r
>      JOIN view_circuit_product_main s
>        ON r.circuit_id = s.circuit_id
>       AND r.customer_id IS NOT DISTINCT FROM s.customer_id
>      JOIN view_circuit_product p
>        ON r.circuit_id = p.circuit_id
>       AND r.customer_id IS NOT DISTINCT FROM s.customer_id
> );
>
> SELECT * FROM view_circuit_with_status WHERE customer_id = 1;
>
> Since customer_id is exposed through view_circuit the planner assumes
> view_circuit.customer_id = 1 and from there attempts to join
> view_circuit_product_main and view_circuit_product using circuit_id.
> This is not running optimal.
>
> However if we change our query to allow the inference rule to take place,
> the query is executed very fast.
>
> SELECT * FROM view_circuit_with_status WHERE customer_id = 1 AND
> s_customer_id = 1 AND p_customer_id = 1;
>
> If a circuit is not assigned to any customers customer_id is set to NULL.
> This is the reason I can't use = operator. If I do use = then I can't find
> circuit which are unassigned, but the query do run effective.
>
> I can see this still ends up being quite abstract, but the point is it would
> be quite beneficial if IS NOT DISTINCT used the same rules as = operator.
>
> I have attached the 2 query plans
>
> Bad plan: https://explain.depesz.com/s/SZN
> Good plan: https://explain.depesz.com/s/61Ro

try this :-D
create or replace function indf(anyelement, anyelement) returns anyelement as
$$
  select $1 = $2 or ($1 is null and $2 is null);
$$ language sql;

CREATE VIEW view_circuit_with_status AS (
   SELECT r.*,
          s.circuit_status,
          s.customer_id AS s_customer_id,
          p.line_speed,
          p.customer_id AS p_customer_id
     FROM view_circuit r
     JOIN view_circuit_product_main s
       ON r.circuit_id = s.circuit_id
      AND indf(r.customer_id, s.customer_id)
     JOIN view_circuit_product p
       ON r.circuit_id = p.circuit_id
      AND indf(r.customer_id, s.customer_id)

merlin


pgsql-general by date:

Previous
From: Kim Rose Carlsen
Date:
Subject: Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Next
From: "David G. Johnston"
Date:
Subject: Re: How to hint 2 coulms IS NOT DISTINCT FROM each other