Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery ) - Mailing list pgsql-performance

From Amarendra Konda
Subject Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
Date
Msg-id CAJNAD0=NxGe+JfT6d_Wa4LXKhTcUckPyvDibSBMH31FGYRn8-w@mail.gmail.com
Whole thread Raw
In response to Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-performance
Hi Adrian,

Thanks for the reply.  And i have kept latest execution plans, for various SQL statements ( inner join, sub queries and placing values instead of sub query) . 
As suggested, tried with INNER JOIN, however result was similar to subquery. 

Is there any way we can tell the optimiser to process less number of rows based on the LIMIT value ? ( i.e. may be SQL re-write) ?
 

INNER SQL

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650';
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual time=0.018..0.019 rows=2 loops=1)
   Output: process_instance_id
   Index Cond: (pi.user_id = '137074931866340'::bigint)
   Filter: (pi.app_id = '126502930200650'::bigint)
   Buffers: shared hit=5
 Planning time: 0.119 ms
 Execution time: 0.041 ms


Full query - Sub query

 EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS pa_process_activity_id  FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (SELECT pi.process_instance_id AS pi_process_instance_id FROM process_instance pi WHERE pi.user_id = '137074931866340' AND pi.app_id = '126502930200650') ORDER BY pa.process_instance_id, pa.created limit 50;
                                                                                                                                                                                                         
      QUERY PLAN                                                                                                                                                                                          
                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
 Limit  (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.386..744.415 rows=50 loops=1)
   Output: pa.process_activity_id, pa.process_instance_id, pa.created
   Buffers: shared hit=3760 read=39316
   ->  Sort  (cost=1072.91..1072.99 rows=31 width=24) (actual time=744.384..744.396 rows=50 loops=1)
         Output: pa.process_activity_id, pa.process_instance_id, pa.created
         Sort Key: pa.process_instance_id, pa.created
         Sort Method: top-N heapsort  Memory: 28kB
         Buffers: shared hit=3760 read=39316
         ->  Nested Loop  (cost=1.14..1072.14 rows=31 width=24) (actual time=0.044..727.297 rows=47011 loops=1)
               Output: pa.process_activity_id, pa.process_instance_id, pa.created
               Buffers: shared hit=3754 read=39316
               ->  Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual time=0.009..0.015 rows=2 loops=1)
                     Output: pi.process_instance_id
                     Index Cond: (pi.user_id = '137074931866340'::bigint)
                     Filter: (pi.app_id = '126502930200650'::bigint)
                     Buffers: shared hit=5
               ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.029..349.000 rows=23506 loops=2)
                     Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
                     Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
                     Buffers: shared hit=3749 read=39316
 Planning time: 2.547 ms
 Execution time: 744.499 ms
(22 rows)

Full query - INNER JOIN

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT pa.process_activity_id AS pa_process_activity_id  FROM process_activity pa INNER JOIN process_instance pi ON pi.process_instance_id = pa.process_instance_id AND pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pi.user_id = '137074931866340' AND pi.app_id = '126502930200650' ORDER BY pa.process_instance_id, pa.created limit 50;
                                                                                                                                                                                                         
      QUERY PLAN                                                                                                                                                                                          
                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------
 Limit  (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.834 rows=50 loops=1)
   Output: pa.process_activity_id, pa.process_instance_id, pa.created
   Buffers: shared hit=43070
   ->  Sort  (cost=1072.91..1072.99 rows=31 width=24) (actual time=87.803..87.815 rows=50 loops=1)
         Output: pa.process_activity_id, pa.process_instance_id, pa.created
         Sort Key: pa.process_instance_id, pa.created
         Sort Method: top-N heapsort  Memory: 28kB
         Buffers: shared hit=43070
         ->  Nested Loop  (cost=1.14..1072.14 rows=31 width=24) (actual time=0.030..73.847 rows=47011 loops=1)
               Output: pa.process_activity_id, pa.process_instance_id, pa.created
               Buffers: shared hit=43070
               ->  Index Scan using fki_conv_konotor_user_user_id on public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual time=0.015..0.018 rows=2 loops=1)
                     Output: pi.process_instance_id
                     Index Cond: (pi.user_id = '137074931866340'::bigint)
                     Filter: (pi.app_id = '126502930200650'::bigint)
                     Buffers: shared hit=5
               ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..1055.22 rows=1427 width=24) (actual time=0.011..21.447 rows=23506 loops=2)
                     Output: pa.process_activity_id, pa.process_activity_type, pa.voice_url, pa.process_activity_user_id, pa.app_id, pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, pa.label_category_id, pa.label_id, pa.csat_respons
e_id, pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment, pa.internal_meta, pa.interaction_id, pa.do_not_translate, pa.should_tr
anslate, pa.in_reply_to
                     Index Cond: ((pa.process_instance_id = pi.process_instance_id) AND (pa.app_id = '126502930200650'::bigint) AND (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
                     Buffers: shared hit=43065
 Planning time: 0.428 ms
 Execution time: 87.905 ms


FULL Query - INNER SQL replaced with result

EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT pa.process_activity_id AS m_process_activity_id FROM process_activity pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 00:00:00' AND pa.process_instance_id in (137074941043913,164357609323111)  ORDER BY pa.process_instance_id,pa.created limit 50;
                                                                                                   QUERY PLAN                                                                                            
       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------
 Limit  (cost=0.70..37.65 rows=50 width=24) (actual time=0.016..0.095 rows=50 loops=1)
   Output: process_activity_id, process_instance_id, created
   Buffers: shared hit=55
   ->  Index Scan using process_activity_process_instance_id_app_id_created_idx on public.process_activity pa  (cost=0.70..2100.39 rows=2841 width=24) (actual time=0.015..0.077 rows=50 loops=1)
         Output: process_activity_id, process_instance_id, created
         Index Cond: ((pa.process_instance_id = ANY ('{137074941043913,164357609323111}'::bigint[])) AND (pa.app_id = '126502930200650'::bigint) AND (m.created > '1970-01-01 00:00:00'::timestamp without time
zone))
         Buffers: shared hit=55
 Planning time: 1.710 ms
 Execution time: 0.147 ms


Regards, Amar




On Thu, May 7, 2020 at 8:10 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/7/20 4:19 AM, Amarendra Konda wrote:
> Hi,
>
> PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled
> by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>
> We have noticed huge difference interms of execution plan ( response
> time) , When we pass the direct values  Vs  inner query to IN clause.
>
> High level details of the use case are as follows
>
>   * As part of the SQL there are 2 tables named Process_instance
>     (master) and Process_activity ( child)
>   * Wanted to fetch TOP 50 rows from  Process_activity table for the
>     given values of the Process_instance.
>   * When we used Inner Join / Inner query ( query1)  between parent
>     table and child table , LIMIT is not really taking in to account.
>     Instead it is fetching more rows and columns that required, and
>     finally limiting the result

It is doing what you told it to do which is SELECT all
process_instance_i's for user_id='317079413683604' and app_id =
'427380312000560' and then filtering further. I am going to guess that
if you run the inner query alone you will find it returns ~23496 rows.
You might have better results if you an actual join between
process_activity and process_instance. Something like below(obviously
not tested):

SELECT
     pa.process_activity_id
FROM
     process_activity pa
JOIN
     process_instance pi
ON
     pa.process_instance_id = pi.process_instance_id
WHERE
     pa.app_id = '427380312000560'
     AND
          pa.created > '1970-01-01 00:00:00'
     AND
          pi.user_id = '317079413683604'
ORDER BY
     pa.process_instance_id,
     pa.created
LIMIT 50;

The second query is not equivalent as you are not filtering on user_id
and you are filtering on only three process_instance_id's.


>   *
>
>
> *Query1*
>
> web_1=> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> pa.process_activity_id  FROM process_activity pa WHERE pa.app_id =
> '427380312000560' AND pa.created > '1970-01-01 00:00:00' AND
> pa.process_instance_id in *_(SELECT pi.process_instance_id FROM
> process_instance pi WHERE pi.user_id = '317079413683604' AND pi.app_id =
> '427380312000560')_* ORDER BY pa.process_instance_id,pa.created limit 50;
>                                                                         
>                                                                         
>                                                                  QUERY PLAN
>                                                 
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=1071.47..1071.55 rows=31 width=24) (actual
> time=85.958..85.991 rows=50 loops=1)
>     Output: pa.process_activity_id, pa.process_instance_id, pa.created
>     Buffers: shared hit=43065
>     ->  Sort  (cost=1071.47..1071.55 rows=31 width=24) (actual
> time=85.956..85.971 rows=50 loops=1)
>           Output: pa.process_activity_id, pa.process_instance_id, pa.created
>           Sort Key: pa.process_instance_id, pa.created
>           Sort Method: top-N heapsort  Memory: 28kB
>           Buffers: shared hit=43065
>           ->  Nested Loop  (cost=1.14..1070.70 rows=31 width=24) (actual
> time=0.031..72.183 rows=46992 loops=1)
>                 Output: pa.process_activity_id, pa.process_instance_id,
> pa.created
>                 Buffers: shared hit=43065
>                 ->  Index Scan using fki_conv_konotor_user_user_id on
> public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual
> time=0.010..0.013 rows=2 loops=1)
>                       Output: pi.process_instance_id
>                       Index Cond: (pi.user_id = '317079413683604'::bigint)
>                       Filter: (pi.app_id = '427380312000560'::bigint)
>                       Buffers: shared hit=5
>                 ->  Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa  (cost=0.70..1053.80 rows=1425 width=24)
> (actual time=0.015..20.702 rows=*23496* loops=2)
> * Output: pa.process_activity_id, pa.process_activity_type,
> pa.voice_url, pa.process_activity_user_id, pa.app_id,
> pa.process_instance_id, pa.alias, pa.read_by_user, pa.source,
> pa.label_category_id, pa.label_id, pa.csat_response_id,
> pa.process_activity_fragments, pa.created, pa.updated, pa.rule_id, pa.market
> ing_reply_id, pa.delivered_at, pa.reply_fragments, pa.status_fragment,
> pa.internal_meta, pa.interaction_id, pa.do_not_translate,
> pa.should_translate, pa.in_reply_to*
>                       Index Cond: ((pa.process_instance_id =
> pi.process_instance_id) AND (pa.app_id = '427380312000560'::bigint) AND
> (pa.created > '1970-01-01 00:00:00'::timestamp without time zone))
>                       Buffers: shared hit=43060
>   Planning time: 0.499 ms
>   Execution time: 86.040 ms
> (22 rows)
>
> *_Query 2_*
>
> web_1=>  EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT
> pa.process_activity_id AS m_process_activity_id FROM process_activity m
> WHERE pa.app_id = '427380312000560' AND pa.created > '1970-01-01
> 00:00:00' AND pa.process_instance_id in
> (*240117466018927,325820556706970,433008275197305*) ORDER BY
> pa.process_instance_id,pa.created limit 50;
>                                                                         
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Limit  (cost=0.70..37.66 rows=50 width=24) (actual time=0.023..0.094
> rows=50 loops=1)
>     Output: process_activity_id, process_instance_id, created
>     Buffers: shared hit=50
>     ->  Index Scan using
> process_activity_process_instance_id_app_id_created_idx on
> public.process_activity pa  (cost=0.70..3124.97 rows=4226 width=24)
> (actual time=0.022..0.079 *rows=50* loops=1)
>           Output: process_activity_id, process_instance_id, created
>           Index Cond: ((pa.process_instance_id = ANY
> ('{140117466018927,225820556706970,233008275197305}'::bigint[])) AND
> (pa.app_id = '427380312000560'::bigint) AND (pa.created > '1970-01-01
> 00:00:00'::timestamp without time zone))
>           Buffers: shared hit=50
>   Planning time: 0.167 ms
>   Execution time: 0.137 ms
> (9 rows)
>
>
> Can someone explain
>
>   * Why It is fetching more columns and more rows, incase of inner query ?
>   * Is there any option to really limit values with INNER JOIN, INNER
>     query ? If yes, can you please share information on this ?
>
> Thanks in advance for your time and suggestions.
>
> Regards, Amar


--
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-performance by date:

Previous
From: samhitha g
Date:
Subject: pg_attribute, pg_class, pg_depend grow huge in count and size withmultiple tenants.
Next
From: Amarendra Konda
Date:
Subject: Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )