Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery ) - Mailing list pgsql-performance

From Adrian Klaver
Subject Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
Date
Msg-id 9d796f65-1c94-eb69-4828-04fe75d62885@aklaver.com
Whole thread Raw
In response to Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )  (Amarendra Konda <amar.vijaya@gmail.com>)
List pgsql-performance
On 5/7/20 10:49 AM, Amarendra Konda wrote:
> Hi David,
> 
> Thanks for the reply.This has optimized number of rows.

Yeah, but your execution time has increased an order of magnitude. Not 
sure if that is what you want.

> 
> Can you please explain, why it is getting more columns in output, even 
> though we have asked for only one column ?
> 
> 
>   EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)  SELECT 
> pa.process_activity_id AS pa_process_activity_id  FROM process_activity 
> pa WHERE pa.app_id = '126502930200650' AND pa.created > '1970-01-01 
> 00:00:00'  AND EXISTS ( SELECT 1 FROM process_instance pi where 
> pi.app_id = pa.app_id  AND pi.user_id = '137074931866340') ORDER BY 
> pa.process_instance_id,m.created limit 50;
> 
>     QUERY PLAN
> 
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> -----------------
>   Limit  (cost=1.14..37.39 rows=50 width=24) (actual 
> time=821.283..891.629 rows=50 loops=1)
>     Output: pa.process_activity_id, pa.process_instance_id, pa.created
>     Buffers: shared hit=274950
>     ->  Nested Loop Semi Join  (cost=1.14..266660108.78 rows=367790473 
> width=24) (actual time=821.282..891.607 rows=50 loops=1)
>           Output: pa.process_activity_id, pa.process_instance_id, pa.created
>           Buffers: shared hit=274950
>           ->  Index Scan using 
> process_activity_process_instance_id_app_id_created_idx on 
> public.process_activity pa  (cost=0.70..262062725.21 rows=367790473 
> width=32) (actual time=821.253..891.517 rows=50 loops=1)
> * Output: pa.process_activity_id, pa.process_activity_type, 
> pa.voice_url, pa.process_activity_user_id, pa.app_id, 
> pa.process_instance_id, pa.alias, pa.read_by_user, pa.source, 
> pa.label_category_id, pa.label_id, pa.csat_response_id,
> m.process_activity_fragments, pa.created, pa.updated, pa.rule_id, 
> pa.marketing_reply_id, pa.delivered_at, pa.reply_fragments, 
> pa.status_fragment, pa.internal_meta, pa.interaction_id, 
> pa.do_not_translate, pa.should_translat
> e, pa.in_reply_to*
>                 Index Cond: ((m.app_id = '126502930200650'::bigint) AND 
> (m.created > '1970-01-01 00:00:00'::timestamp without time zone))
>                 Buffers: shared hit=274946
>           ->  Materialize  (cost=0.43..2.66 rows=1 width=8) (actual 
> time=0.001..0.001 rows=1 loops=50)
>                 Output: pi.app_id
>                 Buffers: shared hit=4
>                 ->  Index Scan using fki_conv_konotor_user_user_id on 
> public.process_instance pi  (cost=0.43..2.66 rows=1 width=8) (actual 
> time=0.020..0.020 rows=1 loops=1)
>                       Output: pi.app_id
>                       Index Cond: (pi.user_id = '137074931866340'::bigint)
>                       Filter: (pi.app_id = '126502930200650'::bigint)
>                       Buffers: shared hit=4
>   Planning time: 0.297 ms
>   Execution time: 891.686 ms
> (20 rows)
> 
> On Thu, May 7, 2020 at 9:17 PM David G. Johnston 
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
> 
>     On Thu, May 7, 2020 at 7:40 AM Adrian Klaver
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>         On 5/7/20 4:19 AM, Amarendra Konda wrote:
>          > Hi,
>          >
>          > PostgreSQL version : PostgreSQL 9.6.2 on x86_64-pc-linux-gnu,
>         compiled
>          > by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
>          >
>          > We have noticed huge difference interms of execution plan (
>         response
>          > time) , When we pass the direct values  Vs  inner query to IN
>         clause.
>          >
>          > High level details of the use case are as follows
>          >
>          >   * As part of the SQL there are 2 tables named Process_instance
>          >     (master) and Process_activity ( child)
>          >   * Wanted to fetch TOP 50 rows from  Process_activity table
>         for the
>          >     given values of the Process_instance.
>          >   * When we used Inner Join / Inner query ( query1)  between
>         parent
>          >     table and child table , LIMIT is not really taking in to
>         account.
>          >     Instead it is fetching more rows and columns that
>         required, and
>          >     finally limiting the result
> 
>         It is doing what you told it to do which is SELECT all
>         process_instance_i's for user_id='317079413683604' and app_id =
>         '427380312000560' and then filtering further. I am going to
>         guess that
>         if you run the inner query alone you will find it returns ~23496
>         rows.
>         You might have better results if you an actual join between
>         process_activity and process_instance. Something like
>         below(obviously
>         not tested):
> 
> 
>     What the OP seems to want is a semi-join:
> 
>     (not tested)
> 
>     SELECT pa.process_activity_id
>     FROM process_activity pa WHERE pa.app_id = '427380312000560' AND
>     pa.created > '1970-01-01 00:00:00'
>     ANDEXISTS (
>        SELECT 1 FROM process_instance pi WHERE pi.app_id = pa.app_id AND
>     pi.user_id = '317079413683604'
>     )
>     ORDER BY
>     pa.process_instance_id,
>     pa.created limit 50;
> 
>     I'm unsure exactly how this will impact the plan choice but it
>     should be an improvement, and in any case more correctly defines
>     what it is you are looking for.
> 
>     David J.
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



pgsql-performance by date:

Previous
From: Amarendra Konda
Date:
Subject: Re: Explain plan changes - IN CLAUSE ( Passing direct values Vs INNERQuery )
Next
From: Virendra Kumar
Date:
Subject: Re: Explain plan changes - IN CLAUSE ( Passing direct values VsINNER Query )