Re: How to hint 2 coulms IS NOT DISTINCT FROM each other - Mailing list pgsql-general
From | Merlin Moncure |
---|---|
Subject | Re: How to hint 2 coulms IS NOT DISTINCT FROM each other |
Date | |
Msg-id | CAHyXU0xjCn9j=AGSwu1VM-Suf7rgicORTiKKeqn6gYi8w+v2NQ@mail.gmail.com Whole thread Raw |
In response to | Re: How to hint 2 coulms IS NOT DISTINCT FROM each other (Kim Rose Carlsen <krc@hiper.dk>) |
Responses |
Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
|
List | pgsql-general |
On Fri, Oct 28, 2016 at 2:54 PM, Kim Rose Carlsen <krc@hiper.dk> wrote: >> > On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <krc@hiper.dk> wrote: > >> > Hi >> > >> > I was wondering if there is a way to hint that two columns in two >> > different >> > tables IS NOT DISTINCT FROM each other. So that the optimizer may assume >> > if >> > table_a.key = 'test' THEN table_b.key = 'test' . >> > >> > The equals operator already does this but it does not handle NULLS very >> > well >> > (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and >> > doesn't establish the same inference rules as equals. >> >> The whole idea behing Postgres' query planner is that you don't have >> to use any hints. Late model versions of postgres handle nulls fine, >> but nulls are never "equal" to anything else. I.e. where xxx is null >> works with indexes. Where x=y does not, since null <> null. >> >> Suggestion for getting help, put a large-ish aka production sized >> amount of data into your db, run your queries with explain analyze and >> feed them to https://explain.depesz.com/ and post the links here along >> with the slow queries. A lot of times the fix is non-obvious if you're >> coming from another db with a different set of troubleshooting skills >> for slow queries. > > The problem is how to reduce the problem into its core, without introducing > all the unnecessary. > > Maybe simplifying the problem, also makes it impossible to say where I go > wrong. It might be that I try to push too much logic into the SQL layer > and Im adding too many layers of abstraction to accomplish what I want. > So let me try and elaborate a little more. > > I have couple a tables describing resources (circuits) and allocation > of resources to customers and products. > > First layer is a view called view_circuit. This view (left) join any table > the circuit table reference through a foreign key (it gives exactly the same > rows and columns as circuit table + some extra information like > customer_id). > > Second layer is 2 views > 1) a view describing if the circuit is active or inactive, lets call it > view_circuit_product_main > 2) a view describing line_speed about the circuit, lets call it > view_circuit_product > > These views use aggregations (both GROUP BY and SELECT DISTINCT ON (...)) > if this has any relevance. > > Third layer > Next step is to add a view that tells both (joins the two views together > on circuit_id). lets call the new view view_circuit_with_status > > This view is defined as > > CREATE VIEW view_circuit_with_status AS ( > SELECT r.*, > s.circuit_status, > s.customer_id AS s_customer_id, > p.line_speed, > p.customer_id AS p_customer_id > FROM view_circuit r > JOIN view_circuit_product_main s > ON r.circuit_id = s.circuit_id > AND r.customer_id IS NOT DISTINCT FROM s.customer_id > JOIN view_circuit_product p > ON r.circuit_id = p.circuit_id > AND r.customer_id IS NOT DISTINCT FROM s.customer_id > ); > > SELECT * FROM view_circuit_with_status WHERE customer_id = 1; > > Since customer_id is exposed through view_circuit the planner assumes > view_circuit.customer_id = 1 and from there attempts to join > view_circuit_product_main and view_circuit_product using circuit_id. > This is not running optimal. > > However if we change our query to allow the inference rule to take place, > the query is executed very fast. > > SELECT * FROM view_circuit_with_status WHERE customer_id = 1 AND > s_customer_id = 1 AND p_customer_id = 1; > > If a circuit is not assigned to any customers customer_id is set to NULL. > This is the reason I can't use = operator. If I do use = then I can't find > circuit which are unassigned, but the query do run effective. > > I can see this still ends up being quite abstract, but the point is it would > be quite beneficial if IS NOT DISTINCT used the same rules as = operator. > > I have attached the 2 query plans > > Bad plan: https://explain.depesz.com/s/SZN > Good plan: https://explain.depesz.com/s/61Ro try this :-D create or replace function indf(anyelement, anyelement) returns anyelement as $$ select $1 = $2 or ($1 is null and $2 is null); $$ language sql; CREATE VIEW view_circuit_with_status AS ( SELECT r.*, s.circuit_status, s.customer_id AS s_customer_id, p.line_speed, p.customer_id AS p_customer_id FROM view_circuit r JOIN view_circuit_product_main s ON r.circuit_id = s.circuit_id AND indf(r.customer_id, s.customer_id) JOIN view_circuit_product p ON r.circuit_id = p.circuit_id AND indf(r.customer_id, s.customer_id) merlin
pgsql-general by date: