Re: [GENERAL] Equivalence Classes when using IN - Mailing list pgsql-general

From Kim Rose Carlsen
Subject Re: [GENERAL] Equivalence Classes when using IN
Date
Msg-id VI1PR05MB1709102944A48BCC52D460F1C7740@VI1PR05MB1709.eurprd05.prod.outlook.com
Whole thread Raw
In response to Re: [GENERAL] Equivalence Classes when using IN  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [GENERAL] Equivalence Classes when using IN  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-general
>On 9 October 2017 at 08:01, Kim Rose Carlsen <krc@hiper.dk> wrote:


>> Is this because postgres never consider IN clause when building equivalence
>> class's?
>
>Only btree equality operators are considered at the moment.

After good night sleep and reading the previous discussion, I am no longer sure I have reduced my original problem to
theright example. If we continue from previous setup and add the following: 

           ALTER TABLE customer ADD COLUMN age INTEGER;
           UPDATE customer SET age = customer_id / 5;
           CREATE INDEX ON customer (age);           CREATE INDEX ON product (customer_id);

               SET enable_hashjoin = false;
               SET enable_mergejoin = false;
                
            EXPLAIN ANALYZE
             SELECT *
               FROM customer
               JOIN view_customer
                 ON customer.customer_id = view_customer.customer_id
              WHERE age < 20;
                                                                QUERY PLAN
                    

---------------------------------------------------------------------------------------------------------------------------------------------Nested
LoopLeft Join  (cost=139.00..10392.96 rows=668 width=16) (actual time=0.528..35.120 rows=200 loops=1)  Join Filter:
(c.customer_id= product.customer_id)  Rows Removed by Join Filter: 199900  ->  Nested Loop  (cost=0.28..199.21 rows=334
width=12)(actual time=0.075..1.146 rows=100 loops=1)        ->  Seq Scan on customer  (cost=0.00..21.51 rows=334
width=8)(actual time=0.067..0.282 rows=100 loops=1)              Filter: (age < 20)              Rows Removed by
Filter:901        ->  Index Only Scan using customer_pkey on customer c  (cost=0.28..0.53 rows=1 width=4) (actual
time=0.006..0.006rows=1 loops=100)              Index Cond: (customer_id = customer.customer_id)              Heap
Fetches:100  ->  Materialize  (cost=138.73..173.75 rows=2001 width=8) (actual time=0.005..0.130 rows=2001 loops=100)
   ->  Sort  (cost=138.73..143.73 rows=2001 width=8) (actual time=0.448..0.588 rows=2001 loops=1)              Sort
Key:product.customer_id, product.product_id              Sort Method: quicksort  Memory: 142kB              ->  Seq
Scanon product  (cost=0.00..29.01 rows=2001 width=8) (actual time=0.006..0.215 rows=2001 loops=1)Planning time: 0.214
msExecutiontime: 35.284 ms 


The planner prefer to use hash and merge joins which is ok, when many rows are to be joined, I don't think any
conditioncan be merged to make these case faster. I have disabled merge and hash joins to get to a nested loop join
instead,in this case it would be much better if customer_id can be pulled inside the loop, so it can look at only the
relevantrows and not all rows for each loop. I somehow inferred that this would be the same as selecting from the view
usingIN clause, now I'm not so sure anymore. 

I can see there is a trade off between planner time and how exotic the case is. If you want to be able to hide
abstractionthrough views I guess the nature becomes more OLAP oriented than OLTP.  

Best Regards
Kim Carlsen

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: David Rowley
Date:
Subject: Re: [GENERAL] OR-clause support for indexes
Next
From: David Rowley
Date:
Subject: Re: [GENERAL] Equivalence Classes when using IN